Lesson 3

A Brief Introduction to NoSQL

An introduction to the basics of NoSQL databases

In this module we are going to be using a CouchDB database to store and retrieve data. CouchDB is a NoSQL database, so we are going to first cover what exactly a "NoSQL" database is.

The name itself is often interpreted as meaning either "No SQL", as in, it does not implement a structured query language, or "Not Only SQL" which implies that it is not limited to a structured query language (the former being closer to the original intent of the name). I'm not really partial to either of these terms, as I think something like "Not Relational" better describes what the term implies today.

A NoSQL database is a database that does not use a relational structure. Relational databases are what developers who have worked with databases in the not-too-distant past would typically be familiar with. These are databases like MySQL, PostgreSQL, and MSSQL that store data in tables that are related to each other.

Before we get into talking about NoSQL databases, let's first cover what a typical relational database looks like.

Relational Databases

In a relational database, data is stored in a set of tables that are made up of rows and columns. These tables can be queried for information using Structured Query Language (SQL), which consists of keywords like SELECT, FROM, WHERE, and JOIN.

Tables in a relational database have a pre-defined "schema". A schema defines the structure of the tables, and the type of data that will be stored in them. In MySQL, for example, you might define a "schema" for a table like this:

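-- Define the structure of the Cars table: each column's name and data type
CREATE TABLE Cars (
    id INT AUTO_INCREMENT PRIMARY KEY,  -- generated automatically for each new row
    make VARCHAR(30),                   -- text, up to 30 characters
    model VARCHAR(30),                  -- text, up to 30 characters
    year INT,                           -- whole number
    purchased DATETIME                  -- date and time, e.g. '2015-03-01 10:30:00'
);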

We're not storing any data here; we're telling MySQL how we want to store our data. When inserting data into this table (a row in the table), each value we supply must match its column: the make and model as strings of up to 30 characters, the year as an integer, and the date the car was purchased in the DATETIME format. After adding some data to the Cars table it might look like this:

[Figure: MySQL Example, the Cars table with some sample rows]
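As a rough mental model only (this is not how MySQL enforces types internally), the following hypothetical JavaScript sketch treats the Cars schema as a set of per-column checks. The names `carsSchema` and `validateRow` are made up for illustration. The sketch rejects a value of the wrong type, or a column the schema doesn't define:

```javascript
// Hypothetical, simplified model of how a schema constrains rows.
// Each column in the Cars definition gets a check for its data type.
// (id is omitted because AUTO_INCREMENT generates it.)
const carsSchema = {
  make: (v) => typeof v === "string" && v.length <= 30,      // VARCHAR(30)
  model: (v) => typeof v === "string" && v.length <= 30,     // VARCHAR(30)
  year: (v) => Number.isInteger(v),                          // INT
  purchased: (v) => v instanceof Date && !Number.isNaN(v.getTime()), // DATETIME
};

// Returns a list of problems; an empty list means the row fits the schema.
function validateRow(schema, row) {
  const errors = [];
  for (const [column, value] of Object.entries(row)) {
    const check = schema[column];
    if (!check) errors.push(`unknown column: ${column}`);
    else if (!check(value)) errors.push(`bad value for ${column}`);
  }
  return errors;
}

validateRow(carsSchema, {
  make: "Toyota", model: "Corolla", year: 2010, purchased: new Date("2015-03-01"),
}); // []

// A field the schema never declared is rejected until the schema changes.
validateRow(carsSchema, { make: "Toyota", model: "Corolla", year: 2010, engine: "V6" });
// ["unknown column: engine"]
```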

If we then also wanted to store some additional information, like the car's engine type, we would first need to modify our schema to include it (we defined a set of rules initially, and we can't break those rules). However, few relational databases have just a single table. Much of their power comes from the fact that data in one table can be "related" to data in another table.

We might also have a Customers table with information about the customers, and those customers may own one or more of the cars. This structure might look something like this:

[Figure: Customers Database Example]

We can tie information together by referencing a row's id from another table. This lets us perform queries that join data from different tables to get the results we want. Given the example above, we could easily join the Cars and Customers tables to show that Josh has purchased 2 cars, and Dave has purchased 1. This is one of the great strengths of traditional relational databases: they are designed specifically to allow for complex queries like:

Find all Customers in Australia who have purchased a Toyota

I might not have known that I would want to run this particular query when I created the database, but if the database is designed well I could run just about any query I wanted.
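To make the join idea concrete, here is a minimal JavaScript sketch of what the database does conceptually when answering these two questions. The data is hypothetical, and since the schema above doesn't say how cars link to customers, it assumes each car stores its owner's id in a `customerId` column. A real database would use indexes and a query planner rather than these plain loops.

```javascript
// Hypothetical sample data; customerId is an assumed link to Customers.id.
const customers = [
  { id: 1, name: "Josh", country: "Australia" },
  { id: 2, name: "Dave", country: "New Zealand" },
];

const cars = [
  { id: 1, make: "Toyota", model: "Corolla", customerId: 1 },
  { id: 2, make: "Ford", model: "Falcon", customerId: 1 },
  { id: 3, make: "Toyota", model: "Hilux", customerId: 2 },
];

// Roughly what this SQL would do:
//   SELECT Customers.name, COUNT(Cars.id) FROM Customers
//   LEFT JOIN Cars ON Cars.customerId = Customers.id
//   GROUP BY Customers.name;
function carsPerCustomer(customers, cars) {
  const counts = {};
  for (const customer of customers) {
    counts[customer.name] = cars.filter((car) => car.customerId === customer.id).length;
  }
  return counts;
}

// "Find all Customers in Australia who have purchased a Toyota":
//   SELECT DISTINCT Customers.name FROM Customers
//   JOIN Cars ON Cars.customerId = Customers.id
//   WHERE Customers.country = 'Australia' AND Cars.make = 'Toyota';
function australianToyotaOwners(customers, cars) {
  const owners = new Set();
  for (const car of cars) {
    if (car.make !== "Toyota") continue;
    const owner = customers.find((c) => c.id === car.customerId);
    if (owner && owner.country === "Australia") owners.add(owner.name);
  }
  return [...owners];
}

carsPerCustomer(customers, cars);        // { Josh: 2, Dave: 1 }
australianToyotaOwners(customers, cars); // ["Josh"]
```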

Relational databases are certainly not old, useless tech; they are a powerful tool and often the perfect fit for a job. NoSQL databases are not relational database killers. They are just a different tool that may suit some situations better than a relational database would.

NoSQL Databases

We've covered what a relational database is, so what's the difference between a relational database and a NoSQL database?

Unlike a relational database, a NoSQL database typically has no predefined schema and does not store data in related tables. NoSQL is not one specific thing, but in general a NoSQL database is not relational and does not use a structured query language (although even this is not strictly true of all NoSQL databases). That's not to say you can't run queries against NoSQL databases. You can, just likely not in the way you would be used to if you come from a SQL background.

Since NoSQL databases do not have a schema, there's not really any work involved in setting up the database initially: you simply add your data. This makes them flexible, in that if you want to change the way you store data down the track, it is relatively easy to do.
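As a rough illustration, assuming plain JavaScript objects as stand-ins for documents (the names `carDocs` and `engineOf` and all the data are hypothetical), storing an engine type, which would have required a schema change in the Cars table, is just a matter of including the field:

```javascript
// Hypothetical "collection" of car documents. Nothing declares their shape
// up front, so documents in the same collection can have different fields.
const carDocs = [
  { _id: "car1", make: "Toyota", model: "Corolla", year: 2010 },
  { _id: "car2", make: "Ford", model: "Falcon", year: 2004 },
];

// No schema change or migration: the new document simply has an extra field.
carDocs.push({ _id: "car3", make: "Mazda", model: "MX-5", year: 2018, engine: "2.0L petrol" });

// The trade-off: code reading the data must cope with fields that may be missing.
function engineOf(car) {
  return car.engine ?? "unknown";
}

carDocs.map(engineOf); // ["unknown", "unknown", "2.0L petrol"]
```

The flexibility moves some responsibility from the database to your application code, which is one reason the structure of your data still deserves thought.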

This does not mean that a NoSQL database is just a bucket you can dump your data into with no thought given to its structure. How you store your data will depend on the type of NoSQL database you are using, but a well-thought-out structure can greatly improve performance and the ease with which you can retrieve data and perform queries against your NoSQL database.

Later in this module we will consider the best data structure for us to use in the application we are building.

Types of NoSQL Databases

As I mentioned, a NoSQL database is pretty much anything that isn't a relational database... which means there are a lot of different options, including:

  • Key-Value (Redis, Amazon DynamoDB, memcached)
  • Columnar / BigTable (HBase, Cassandra, Amazon SimpleDB)
  • Document (CouchDB, MongoDB, Riak)
  • Graph (Neo4j, AllegroGraph, InfoGrid)

Two commonly used NoSQL database structures are Key-Value and Document (CouchDB is a document-based NoSQL database). Key-Value and document-based databases are very similar in nature, but there are differences.

A Key-Value database is a simple way to store values indexed by a key. You would already be familiar with this concept if you have used the browser's local storage or Ionic's own Storage API. Using the browser's local storage as an example, you would set a value on a key like this:

localStorage.setItem('name', 'Josh');

and then you could later retrieve the value by using the key like this:

localStorage.getItem('name');
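Node.js has no localStorage, so as a minimal sketch of the same key-value idea, here is a hypothetical `KeyValueStore` class (not a real library) backed by a `Map`. It mimics two behaviours of the browser's Web Storage API: values are stored as strings, and `getItem` returns `null` for a key that was never set.

```javascript
// Hypothetical in-memory key-value store, loosely modelled on localStorage.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // Like localStorage, convert both key and value to strings before storing.
  setItem(key, value) {
    this.data.set(String(key), String(value));
  }

  // Look a value up by its key; unknown keys give back null.
  getItem(key) {
    const k = String(key);
    return this.data.has(k) ? this.data.get(k) : null;
  }

  removeItem(key) {
    this.data.delete(String(key));
  }
}

const storage = new KeyValueStore();
storage.setItem("name", "Josh");
storage.getItem("name");   // "Josh"
storage.getItem("age");    // null
storage.setItem("visits", 3);
storage.getItem("visits"); // "3" (a string, not a number)
```

The store knows nothing about what a value means; it only hands back whatever was saved under a key.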

A document database is similar, but stores "documents" instead. I think this terminology is quite confusing: if the term "document database" is new to you, you would probably be thinking of documents like a Word document or a PDF. That isn't the case, though. A document in this context is simply a JSON object like this:

{
    "_id": 1,
    "name": "Josh",
    "country": "Australia",
    "interests": ["Ionic", "Writing", "Gardening"]
}

These are referred to as "documents" because a document generally contains all the information you need in one place. If I look at an invoice, I can see all the data I need right there: biller, items, subtotal, tax, and so on. In a document-based NoSQL database this might look like this:

{
    "_id": 392,
    "biller": "Bob's Building Supplies",
    "items": ["bricks", "paint", "wrench"],
    "subtotal": 230.32,
    "tax": 0
}

The term document is just used to mean that we have multiple bits of information contained within the same structure that describe something. In a relational database, this information might more commonly be split up across multiple tables. To make things more confusing, we might also sometimes take a similar approach with a document-based NoSQL database and break this information up across multiple documents.

You could also store a similar JSON object to the one above as a value in a Key-value database, and then retrieve that entire JSON object later. You would just give it a key and then the value would be a string representation of the entire JSON object. So, what's the difference between the two approaches?

The difference is that with a document database the database knows about the structure of that JSON object, and provides a means to query against it, rather than just directly accessing a specific key value. If the above was just a chunk of data we needed to retrieve we could easily store it in a Key-value database, but if we wanted to run a query to return objects where the subtotal was greater than 50, then it would be better to use a document database that will provide us with some means to efficiently query the data.
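Here is a minimal sketch of that difference, using hypothetical invoice data (only `_id` 392 comes from the example above). A `Map` of JSON strings stands in for a key-value store, and a sorted index on `subtotal` stands in for what a document database can build because it understands each document's fields. The index is only loosely inspired by CouchDB views and is not CouchDB's API; all function names are made up.

```javascript
// Hypothetical invoices; only _id 392 comes from the lesson.
const invoiceList = [
  { _id: 391, biller: "Acme Timber", items: ["planks"], subtotal: 42.5, tax: 0 },
  { _id: 392, biller: "Bob's Building Supplies", items: ["bricks", "paint", "wrench"], subtotal: 230.32, tax: 0 },
  { _id: 393, biller: "Paint Place", items: ["paint"], subtotal: 75, tax: 0 },
];

// Key-value approach: each invoice is an opaque JSON string stored under a key.
const kvStore = new Map();
for (const inv of invoiceList) kvStore.set(`invoice:${inv._id}`, JSON.stringify(inv));

// The store can't see inside the strings, so "subtotal > 50" means fetching
// and parsing every value in application code (a full scan).
function kvSubtotalOver(store, min) {
  const ids = [];
  for (const value of store.values()) {
    const doc = JSON.parse(value);
    if (doc.subtotal > min) ids.push(doc._id);
  }
  return ids;
}

// Document-database approach (simplified): keep an index of
// { key, id } rows sorted by key, built from a function of each document.
function buildIndex(docs, keyOf) {
  return docs
    .map((doc) => ({ key: keyOf(doc), id: doc._id }))
    .sort((a, b) => a.key - b.key);
}

// Binary search for the first row with key > min; every later row matches too.
function indexKeyOver(index, min) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key <= min) lo = mid + 1;
    else hi = mid;
  }
  return index.slice(lo).map((row) => row.id);
}

const bySubtotal = buildIndex(invoiceList, (doc) => doc.subtotal);
kvSubtotalOver(kvStore, 50);  // [392, 393], after parsing all three values
indexKeyOver(bySubtotal, 50); // [393, 392], ordered by subtotal
```

With three invoices the difference is invisible, but the full scan grows with every invoice stored, while the index lets the database jump straight to the matching range.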

If you need a way to save and retrieve simple data, then a Key-value store might serve you well. But if you want to store more complex data that you are going to query against (like the data we used before in the cars example), then a document database may be better suited.

...and then you have all the other types of NoSQL databases. If you're building a social network you might be better served by using a Graph based NoSQL database, which is well suited to large sets of data with complex relations between that data.
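As a hedged illustration of why highly connected data suits a graph model, here is a hypothetical social graph stored as an adjacency list, with a breadth-first search for everyone within a given number of connections (for example, friends of friends). The names `connections` and `withinHops` are made up; this is plain JavaScript, not any graph database's query language.

```javascript
// Hypothetical social graph: each person maps to their direct connections.
const connections = new Map([
  ["josh", ["dave", "sam"]],
  ["dave", ["josh", "alex"]],
  ["sam", ["josh", "alex", "kim"]],
  ["alex", ["dave", "sam"]],
  ["kim", ["sam"]],
]);

// Everyone within `maxHops` links of `start` (excluding start), found by
// breadth-first search: expand one "ring" of neighbours per hop.
function withinHops(graph, start, maxHops) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next = [];
    for (const person of frontier) {
      for (const other of graph.get(person) ?? []) {
        if (!seen.has(other)) {
          seen.add(other);
          next.push(other);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen].sort();
}

withinHops(connections, "josh", 1); // ["dave", "sam"]
withinHops(connections, "josh", 2); // ["alex", "dave", "kim", "sam"]
```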

NoSQL databases are like a whole new universe, and it can be difficult to re-learn how to work with a new type of database when relational databases may already feel familiar.

So, why would we even want to work with them in the first place? Let's talk about some benefits...

Benefits of NoSQL Databases

I think NoSQL databases, document databases especially, fit nicely into the mobile web world since they often rely on JSON and JavaScript syntax. Working with a NoSQL database can often just be "nicer" depending on the circumstances, but that is a very subjective assessment, and there are more concrete performance considerations as well.

One of the main performance benefits that NoSQL databases have over traditional relational databases is that they scale out instead of up. In general, this means that as your NoSQL database grows, you can add additional servers that each hold a partition of your data (even thousands of them). With relational databases, on the other hand, more power is often added to a single server (this is probably a gross oversimplification, but it gets the general point across), or a more complicated setup is needed to spread the load over multiple databases. NoSQL databases are much easier and cheaper to scale in this manner.
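One common way to spread data across servers is hash partitioning (sharding): hash each key, and let the hash decide which server stores it. This is a minimal sketch under simple assumptions: the server names are hypothetical, and the hash is 32-bit FNV-1a, chosen for brevity rather than because any particular database uses it.

```javascript
// 32-bit FNV-1a hash of a string. It iterates UTF-16 code units, which
// matches the usual byte-wise FNV-1a for ASCII keys.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // as an unsigned 32-bit integer
}

// Pick the server responsible for a key.
function nodeFor(key, nodes) {
  return nodes[fnv1a(key) % nodes.length];
}

const servers = ["db-1", "db-2", "db-3"]; // hypothetical server names
nodeFor("customer:josh", servers); // always the same server for this key
```

With plain modulo, adding a server changes the assignment of most keys, which would mean moving a lot of data. Databases such as Cassandra use consistent hashing instead, so that only a fraction of the keys move when a node is added.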

On the other hand, this same property is also one of the major downsides of NoSQL: it sacrifices consistency for performance. NoSQL databases are able to achieve this performance increase by not adhering strictly to the ACID (Atomicity, Consistency, Isolation, Durability) principles that relational databases follow. Relational databases are designed to execute their reads and writes without interfering with one another, so once data has been written, every subsequent read will see it.

Instead, NoSQL databases usually take the BASE (Basically Available, Soft state, Eventually consistent) approach, which means that the database will become accurate eventually (as data is consolidated among potentially thousands of different nodes), but inaccurate data could be read before that point in time.
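To make eventual consistency concrete, here is a hypothetical, heavily simplified model of a "likes" counter kept on two nodes. The `ReplicatedCounter` class is made up for illustration, and replication is triggered manually by `replicate()`, whereas real databases replicate in the background.

```javascript
// Hypothetical model: one counter stored on two nodes. A write is accepted
// by node A straight away and copied to node B later.
class ReplicatedCounter {
  constructor() {
    this.nodeA = 0;    // node that accepts writes
    this.nodeB = 0;    // node that receives them later
    this.unsynced = 0; // writes accepted by A but not yet copied to B
  }

  like() {
    this.nodeA += 1;    // acknowledged immediately (basically available)
    this.unsynced += 1; // replication is deferred (soft state)
  }

  // Copy outstanding writes to node B; afterwards both nodes agree.
  replicate() {
    this.nodeB += this.unsynced;
    this.unsynced = 0;
  }
}

const post = new ReplicatedCounter();
for (let i = 0; i < 23; i++) post.like();
post.replicate();
post.like();      // the 24th like reaches node A only
post.nodeA;       // 24
post.nodeB;       // 23 (stale for now)
post.replicate(); // eventually consistent: node B catches up
post.nodeB;       // 24
```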

NoSQL could be good for things like blogs and comments where consistency does not necessarily matter, but perhaps not for things like financial transactions where it is crucial that the data is always in a consistent state.
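For contrast, here is a minimal sketch of atomicity, the "A" in ACID, using a hypothetical money transfer. The `runTransaction`, `debit`, and `credit` helpers are made up for illustration: every step runs against a working copy, and the copy only replaces the real balances if every step succeeds.

```javascript
// Run all steps against a copy; only return (commit) it if none of them throw.
function runTransaction(state, steps) {
  const working = { ...state }; // a shallow copy is enough for flat balances
  for (const step of steps) step(working); // any throw aborts before commit
  return working;
}

function debit(account, amount) {
  return (s) => {
    if (s[account] < amount) throw new Error(`insufficient funds in ${account}`);
    s[account] -= amount;
  };
}

function credit(account, amount) {
  return (s) => {
    s[account] += amount;
  };
}

let balances = { alice: 100, bob: 20 };
balances = runTransaction(balances, [debit("alice", 30), credit("bob", 30)]);
// balances is now { alice: 70, bob: 50 }

try {
  // The credit succeeds on the working copy, then the debit fails...
  balances = runTransaction(balances, [credit("alice", 500), debit("bob", 500)]);
} catch {
  // ...so nothing is committed: balances is still { alice: 70, bob: 50 }
}
```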

If you are building a social network application, would you care that data had not yet been replicated to a specific node so some user somewhere temporarily saw that a post had 23 likes instead of 24?

Or what about a YouTube video that showed it had 2,394,492 views when it really had 2,399,382 views? Probably not. The specific application is an important consideration when deciding on a database.

Summary

As you have probably been able to figure out from this lesson, NoSQL is a huge topic and a total paradigm shift from relational style databases that many people would have started with. If you're completely new to databases then, in a way, that's going to work in your favour since you won't have any habits from having a relational way of thinking ingrained into your mind.