Published on 25.08.2015
You probably know relational databases, and you are most likely very used to them by now. Don’t get too devoted though – life’s too short not to try some other things from time to time. There are plenty of database systems out there that are not relational, and along with the rise of cloud computing they’re getting a lot of attention.
This is the first part of a series about different database systems: how they work and what their strengths and weaknesses are. In this article you’ll learn a bit about the basics, namely which types of databases you can make use of.
There are a lot of different database types, and each of them was designed to accomplish a specific task. Here’s a short overview of some major types:
Hierarchical databases are a very old concept in IT. They come from the age of writing code on punch cards, and they are even older than ABBA. The first hierarchical database, developed by IBM in the sixties, is still being used. You also have one right in front of you: file systems are organized as a hierarchical database. And for our Windows users out there: the registry too! In a simple model of a hierarchical database, the data forms a tree in which every node except the root has exactly one parent. Keep in mind that in this system, in order to access a particular node, you need to know the exact path to it.
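The path requirement can be sketched in a few lines of Python. The tree, the node names and the `lookup` helper are made up for illustration; a file system follows the same principle, but certainly not this code.

```python
# A tiny hierarchical "database": every node has exactly one parent,
# so the whole structure is a tree of nested dictionaries.
# All names are hypothetical and only serve as an illustration.
tree = {
    "home": {
        "john": {
            "documents": {"cv.txt": "Curriculum vitae of John"},
        },
    },
    "etc": {"hosts": "127.0.0.1 localhost"},
}


def lookup(root, path):
    """Walk from the root to a node by following the exact path."""
    node = root
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if the path is wrong
    return node


print(lookup(tree, "/home/john/documents/cv.txt"))
```

Knowing only the name `cv.txt` is not enough here: without the full path the lookup fails, which is exactly the limitation described above.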
Relational databases organize the data in one or more tables with rows and columns. A table represents an entity, and each of its rows is identified by a primary key. The rows are the records and the columns describe the attributes of each record. Tables can have relationships to other tables: to link one row to another, the row stores the unique id of the other row (a foreign key). A single table can also represent a relationship between tables.
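To make primary and foreign keys a bit more tangible, here is a minimal sketch using Python’s built-in `sqlite3` module. The `author` and `article` tables and their columns are assumptions chosen for this example, not a schema from any real application.

```python
import sqlite3

# In-memory database; table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
    CREATE TABLE author (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE article (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
""")
conn.execute("INSERT INTO author (id, name) VALUES (1, 'John Doe')")
conn.execute("INSERT INTO article (title, author_id) VALUES ('Database types', 1)")

# The foreign key lets us join each article with the row of its author.
rows = conn.execute("""
    SELECT article.title, author.name
    FROM article
    JOIN author ON article.author_id = author.id
""").fetchall()
print(rows)
```

The `article` row stores only the id of its author; the join reassembles the full picture at query time.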
Graph databases consist of nodes, properties and edges. Nodes represent entities like Author, Article, etc. Edges represent the relationships between two nodes, or between a node and its properties. Properties are the pieces of information that belong to a node. A simple example for a graph database is a social network. There are a lot of people in the network who know, like or hate each other. Each person can be represented by a node, and the edges between two persons represent their relationship.
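The social network example could be modelled roughly like this in plain Python. The people, the relationship names and the `related` helper are invented for illustration; a real graph database such as Neo4j has its own storage and query language.

```python
# Nodes carry properties; edges are (source, relationship, target) triples.
# Every name in this sketch is hypothetical.
nodes = {
    "alice": {"type": "Person", "name": "Alice"},
    "bob": {"type": "Person", "name": "Bob"},
    "carol": {"type": "Person", "name": "Carol"},
}
edges = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "carol"),
    ("bob", "hates", "carol"),
]


def related(source, relationship):
    """Names of all nodes that `source` points to via the given kind of edge."""
    return [
        nodes[target]["name"]
        for start, kind, target in edges
        if start == source and kind == relationship
    ]


print(related("alice", "knows"))
print(related("bob", "hates"))
```

Questions like "whom does Alice know?" become a walk along edges instead of a join between tables.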
Document-oriented databases store the data together with its structure in a document. The documents are usually JSON-like objects, and they can embed other documents.
A simple example: a message in your inbox. You never search your own inbox for messages that are not addressed to you, do you? So you can embed the messages in your user document. When a message comes in, the server has to find your user document and simply adds the message to it. Once you open your mailbox, the application only has to look into your user document and has all the e-mails right there, without having to read or join multiple tables.
A JSON representation of a document could look like this in the database:
    {
      "username": "John Doe",
      "messages": [
        {
          "subject": "Lunch at 1?",
          "message": "James would also come with us."
        },
        {
          "subject": "Kanban Meeting tomorrow at 10",
          "message": "Can we do that?"
        }
      ]
    }
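The inbox scenario can be sketched with plain Python dictionaries standing in for documents. The `users` collection, the `messages` field and the `deliver` helper are assumptions made for this example; an actual document database would offer its own API for the same idea.

```python
import json

# One user document per username; field names are assumptions for this sketch.
users = {
    "John Doe": {"username": "John Doe", "messages": []},
}


def deliver(username, subject, message):
    """Embed an incoming message directly in the recipient's user document."""
    users[username]["messages"].append({"subject": subject, "message": message})


deliver("John Doe", "Lunch at 1?", "James would also come with us.")
deliver("John Doe", "Kanban Meeting tomorrow at 10", "Can we do that?")

# Opening the mailbox is a single lookup, no joins involved.
inbox = users["John Doe"]["messages"]
print(json.dumps(users["John Doe"], indent=2))
```

The trade-off is that a message lives inside exactly one user document, which suits this access pattern but would be awkward if messages had to be queried across all users.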
The CAP theorem deals with three attributes: Availability, Consistency and Partition Tolerance. According to the CAP theorem, it is impossible for a distributed database to guarantee all three attributes at the same time. A database system can only fulfill two of them, which leads to three kinds of systems.
CA systems decide to guarantee Availability and Consistency, but they can’t work when one or more nodes are unreachable.
CP systems can live with disappearing nodes. But depending on how severe the partition is, they may go read-only. MongoDB, for example, can elect a new primary in the replica set, but during the election the replica set won’t accept write requests.
AP systems still work when partitioning occurs, but the data can be inaccurate. DNS and caches are examples of such systems.
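The difference between refusing requests and serving possibly inaccurate data can be shown with a toy simulation. The `TwoNodeStore` class below is purely hypothetical and heavily simplified; real databases use quorums, elections and conflict resolution instead of a single flag.

```python
class Unavailable(Exception):
    """Raised when a store refuses a request in order to stay consistent."""


class TwoNodeStore:
    """Toy key-value store replicated on two nodes (hypothetical, for illustration)."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.replicas = [{}, {}]    # one dictionary per node
        self.partitioned = False    # True: the nodes cannot reach each other

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:   # healthy network: update both copies
                replica[key] = value
        elif self.mode == "CP":
            # Give up availability to keep both copies identical.
            raise Unavailable("other node unreachable, write rejected")
        else:
            # AP: stay available and accept that the copies now differ.
            self.replicas[node][key] = value

    def read(self, node, key):
        return self.replicas[node].get(key)


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write(0, "color", "blue")
    store.partitioned = True        # the network splits

ap.write(0, "color", "red")
print(ap.read(0, "color"), ap.read(1, "color"))   # the two nodes now disagree

try:
    cp.write(0, "color", "red")
except Unavailable as error:
    print("CP store:", error)
```

In the AP store the second node keeps answering with the old value, while the CP store rejects the write but never shows two different answers.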
OK, fair point. You are probably thinking along the lines of “RDBMS are good and solid, and that hipster database stuff is still too unreliable”. But wait! Non-relational databases like MongoDB, CouchDB and Neo4j are used by a lot of large companies. Two prominent examples of companies that rely on non-relational databases are Amazon and Google.
Sometimes you want a database that causes few issues when your application is distributed over multiple locations. Let us assume you have an application that is used in your offices in Berlin and Sydney. With an RDBMS you need either a server in the “middle” or one in either Sydney or Berlin. What happens when the server dies or is unreachable? What about the latency during writes? Australia has a bad connectivity problem – trust me – I know people working there.
With Mongo, for example, you can build a replica set with a master in the middle and nodes in Berlin and Sydney. Your clients submit write requests to the master and read from the local database server. If one of the servers dies, you can still read and write. Once the server is revived, it gets the changes and you have your complete system again. Okay, the write lag persists, but you can read faster (since you don’t have to go over the internet).
With CouchDB you could even write to your “local” database, and the change would be propagated through your network. You can even make your application offline-ready by using an employee’s laptop as a database node and syncing the changes when they are back online. Couch will still do the synchronization over the internet, but your users won’t notice it, since it’s done in the background while they are still working.
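The "work offline, sync later" idea can be illustrated with a deliberately naive sketch. The `Node` class and the `replicate` function are invented for this example and are not CouchDB’s replication protocol; in particular, conflicting edits on both sides are ignored here.

```python
class Node:
    """A database node that records every write in an append-only change log."""

    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.changes = []   # list of (doc_id, doc) in write order

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.changes.append((doc_id, doc))


def replicate(source, target, checkpoint=0):
    """Copy all changes made after `checkpoint` to the target; return the new checkpoint."""
    for doc_id, doc in source.changes[checkpoint:]:
        target.docs[doc_id] = doc
    return len(source.changes)


laptop, office = Node("laptop"), Node("office")

# The employee works offline: writes only reach the local node.
laptop.put("report-1", {"title": "Visit in Sydney"})
laptop.put("report-2", {"title": "Visit in Berlin"})

# Back online: a background job pushes everything that is new.
checkpoint = replicate(laptop, office)
laptop.put("report-3", {"title": "Follow-up"})
checkpoint = replicate(laptop, office, checkpoint)

print(sorted(office.docs))
```

Remembering the checkpoint means each sync only transfers what changed since the last one, which keeps the background traffic small.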
Every solution has its drawbacks. But if you know them, you can simply pick the solution with the fewest drawbacks for your scenario.
During the series we will show you different approaches to databases and what their use cases are, so you can stop limiting yourself to just joining tables and enter the good side of databases.