Lesson 11

Using Design Documents to Create Views in CouchDB

Querying data in CouchDB

PRO

Lesson Outline

Using Design Documents to Create Views in CouchDB

At this point, we have created our CouchDB database and we have added two documents to it: one notice document and one chat document. If you haven't completed the lesson on adding data using Fauxton, you should make sure that the following two documents have been added to your CouchDB database:

{
  "type": "notice",
  "title": "the title",
  "message": "the message",
  "author": "Josh",
  "dateCreated": "2021-04-27T12:24:33.6992",
  "dateUpdated": "2021-04-27T12:24:33.6992"
}
{
  "type": "chat",
  "message": "the message",
  "author": "Josh",
  "dateCreated": "2021-04-27T12:24:33.6992"
}

NOTE: Make sure you keep the auto generated _id field in Fauxton.

Over time we will be adding lots of chat and notice documents to the database.

Getting documents into the database from the application is easy enough. We haven't covered that specifically yet, but if you want to add a document to the database the basic idea is that you send the document to the database and it gets added... this is probably what you would expect and nothing is going to surprise you when we get to that.

Getting data from the database is a bit different, though. We rarely want to just pull all the data from the database at once, we want to work with a specific set of data. Maybe we want to grab all the notices, but not the chats. Or maybe we want to only grab certain notices, maybe just ones that were authored by joshmorony.

In this lesson, we are going to cover how to query data in CouchDB and how it is different to a more traditional relational style approach.

Querying Data

This is probably one of the bigger conceptual pain points for CouchDB (and document based databases in general) – if people have data they generally want to perform operations like:

  • Get all comments that belong to this post
  • Return a count of the total number of comments for this post
  • Get all products from this user's shopping cart

These types of requests will allow an application to do what it needs to do, but people may also want to retrieve information for the purpose of reports or analytics, like:

  • Get all users who have signed up in the past week
  • Get users who have not posted in the last 60 days
  • Get the total number of posts

The way in which one would run these sorts of queries against a relational database is pretty well established, e.g:

SELECT * FROM users WHERE signup_date > CURDATE() - INTERVAL 7 DAY

or perhaps you may also need to join some tables together using foreign keys. It's reasonably easy to look at that query and see what the query would be doing. It almost reads perfectly as a direct translation into English "Select all users where the signup date is greater than the current date minus 7 days".

Even some NoSQL databases (PouchDB included, which we will discuss in just a little bit) provide their own natural language like query structures, MongoDB, for example, provides operators like $gt, $or, and so on:

db.inventory.find({
  status: "A",
  qty: { $lt: 30 },
});

The above query would translate to:

"Find all items in the inventory that have a status of A and a quantity of less than 30".

It's not as immediately obvious as the SQL query, but it's the same basic idea.

Relational databases have been the de facto database for a long time, so this stuff is ingrained into a lot of developers minds, and transitioning to a NoSQL database that uses a similar approach might seem less intimidating.

Querying data with CouchDB is a bit of a paradigm shift, and will take some getting used to, but if you could wipe all knowledge of databases from your mind I think that CouchDB's method of retrieving data would actually be quite intuitive to learn (especially if you are already familiar with programming in Javascript). We will cover this in depth in just a moment, but the basic idea is that we use a MapReduce process to create views - if you are familiar with mapping and reducing arrays in JavaScript then this will likely slot into your existing mental models quite nicely.

Retrieving Data with Mango

If the concept of querying data with MapReduce is really not your cup of tea, you can use PouchDB's Mango query language which is very similar to the example MongoDB query I showed you. We won't be covering it in this module, but you can view the documentation here. The basic idea is that you build a "secondary index" and then you can perform queries against that index. By default, everything is just indexed by its _id field in CouchDB and we can use that field to retrieve documents. If you want to use Mango queries you will first build another "secondary" index on another field like this:

db.createIndex({
  index: {fields: ['name']}
});

and once an index has been created you now have the means to execute a query against this index with Mango queries:

db.find({
  selector: {
    name: {$eq: 'mario'}
  }
});

If you are unfamiliar with the term "index" in relation to databases, you can think of it as an efficient data structure for looking up records/documents. In the above example, if I want to find documents where the name field is mario I would have to scan every single document in the database and check the name field. This is not an efficient way to perform this task (especially if we have millions of records).

Fortunately, we have fancy data structures that can make lookup tasks like this a lot more efficient. CouchDB uses a structure called a B-tree where we have "leaf" nodes that connect to other nodes via "branches". Here is a diagram from the CouchDB documentation:

B-tree

The important aspect of these data structures is that a logical structure is maintained that makes the lookups more efficient. In this example, we can see that the root nodes are 7, 11, and 15. If we want to find nodes with a value of 4 we only need to traverse the branches from the first node of 7, as this branch will contain all nodes that have a value that is below 7. This allows us to just ignore huge portions of the database when looking up values, because we can be confident that those values do not exist in those parts of the database.

The downside to indexes is that they need to be maintained and written to the disk. If a new node is added then the index needs to be updated to reflect that. You probably don't want to just go around creating indexes for every possible field or combination of fields, but if you are frequently executing particular queries against particular fields then it can make a lot of sense to create an index. Aside from the efficiency aspects of indexes, it is also necessary to create them in order to enable querying data with Mango.

All of this is just to give you context as to what is going on with indexes, you don't actually need to know anything about B-trees or other data structures for creating indexes. These are all handled internally.

PRO

Thanks for checking out the preview of this lesson!

The full version of this lesson is only available to pro members. If you would like full access to this module and all of the other pro modules on Elite Ionic you can become a pro member (or log in if you are already a member).