You have probably wondered at some point what happens when your app becomes a big success: how can the database keep up with the increasing load? Or maybe you wonder why the heck I include Redis in a series about databases. So let me show you the warp drive for your data access. But before we hit warp 9, let me show you the basics of Redis.
All your RAM belongs to us
Redis is more of a datastore than a database, and persisting data is optional. The whole dataset is held in memory at all times, which is why Redis is so fast: reads and writes are heavily optimized and never have to wait for a disk. It offers a lot of commands that perform predefined operations directly on the server, and it lets you write Lua scripts to add more functionality. Commands are executed on a single thread, so Redis essentially uses one core, but the CPU is rarely the bottleneck. The datastore can be persisted in three ways:
- Use another database as a backend
- RDB (when triggered, a child process writes a snapshot of the current data set to disk)
- AOF (append-only file, a log of all writes)
Way 1 is pretty common. Redis is a very good key-value store that is fast and well suited as a cache, so why not use it to cache the data from your slower database? Sometimes it also takes the writes for objects that change heavily; these are then persisted to the database only from time to time, which protects the database server from load-related issues.
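The caching pattern described above can be sketched in a few lines. This is only a minimal sketch under assumptions: `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server, `load_user_from_db` and the `user:<id>` key scheme are made up for illustration, and the `get` / `set(key, value, ex=...)` calls mirror the ones a redis-py client offers.

```python
import json


class FakeRedis:
    """Hypothetical in-memory stand-in offering the two calls used below.

    With redis-py you would pass a real client object instead.
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ex=None):
        # A real server would expire the key after `ex` seconds; ignored here.
        self._data[key] = value


db_calls = []  # records how often the slow database was hit


def load_user_from_db(user_id):
    """Hypothetical slow database lookup."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id, ttl_seconds=60):
    """Read through the cache: try Redis first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    user = load_user_from_db(user_id)
    cache.set(key, json.dumps(user), ex=ttl_seconds)
    return user


cache = FakeRedis()
first = get_user(cache, 42)   # miss: goes to the database
second = get_user(cache, 42)  # hit: served from the cache
```

The expiry time is a design choice: a short one keeps the cache fresh, a long one keeps more load away from the database.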
Ways 2 and 3 help when you don’t want to fetch everything again after a failure. Restoring from RDB is faster, since there is only one compact file to deserialize, but the problem is that you will lose everything that happened since the last snapshot. Since the snapshot is written by a forked child process and the main process does no disk I/O for it, there is little performance loss as long as your machine has enough cores. AOF creates a log that can be synced to disk with every write or once per second. It increases the overhead on your Redis instance, but when syncing once per second, Redis is still very fast. To get good persistence for your datastore, you should use both. You might still lose up to about a second of writes, but depending on your use case, that might not even be a problem.
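To make the difference between the two approaches concrete, here is a toy model in plain Python. It is not how Redis implements persistence; `TinyStore` and its methods are hypothetical names, and the only point is to show what each recovery path gives you back after a crash.

```python
import copy


class TinyStore:
    """Toy key-value store (hypothetical) with both persistence styles."""

    def __init__(self):
        self.data = {}
        self.snapshot = {}  # RDB-like: the last point-in-time copy
        self.log = []       # AOF-like: every write, in order

    def set(self, key, value):
        self.data[key] = value
        self.log.append(("set", key, value))

    def save_snapshot(self):
        self.snapshot = copy.deepcopy(self.data)

    def recover_from_snapshot(self):
        # Fast: load one compact copy, but later writes are gone.
        return copy.deepcopy(self.snapshot)

    def recover_from_log(self):
        # Slower: replay every logged write, but nothing logged is lost.
        restored = {}
        for op, key, value in self.log:
            if op == "set":
                restored[key] = value
        return restored


store = TinyStore()
store.set("a", 1)
store.save_snapshot()
store.set("b", 2)  # written after the last snapshot

# Pretend the process crashed here and compare both recovery paths.
from_snapshot = store.recover_from_snapshot()
from_log = store.recover_from_log()
```

After the simulated crash, the snapshot only knows about `a`, while replaying the log also brings back `b`.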
What can I throw into it?
Redis has one big advantage: it can store more than just strings. In addition to strings, it supports lists of strings, sorted and unsorted sets of strings, hash tables and HyperLogLogs. Redis also offers a lot of high-level commands such as unions, intersections or matching that run on the database side.
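As an illustration of what such set commands do, here is a toy model in plain Python. `TinySets` is a hypothetical stand-in, not a Redis client; its method names follow the `sadd`, `sinter` and `sunion` calls of redis-py, and the `tag:` and `post:` key names are made up.

```python
class TinySets:
    """Toy model of Redis sets, kept in a plain dictionary."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        target = self._sets.setdefault(name, set())
        before = len(target)
        target.update(values)
        return len(target) - before  # number of newly added members

    def sinter(self, *names):
        sets = [self._sets.get(name, set()) for name in names]
        return set.intersection(*sets) if sets else set()

    def sunion(self, *names):
        return set().union(*(self._sets.get(name, set()) for name in names))


store = TinySets()
store.sadd("tag:redis", "post:1", "post:2", "post:3")
store.sadd("tag:caching", "post:2", "post:3", "post:4")

both = store.sinter("tag:redis", "tag:caching")   # posts carrying both tags
either = store.sunion("tag:redis", "tag:caching")  # posts carrying any tag
```

With a real server the intersection is computed inside Redis, so only the result travels over the network.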
What can I do with such speed?
Well, a simple example from Redis is autocomplete. You can simply save all the words of the English dictionary in Redis, which is about 300,000 entries. As soon as the user starts typing, you can do the matching over the whole dataset without any noticeable lag. The forum service Muut uses Redis to manage the forums of nearly 80,000 communities. Their usual response time from Redis is < 0.01 seconds. A big RDBMS can probably do that too, but the creators of Muut wanted to keep the free service running and keep costs low, so they implemented their database in Redis.
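The matching step of such an autocomplete can be pictured as a prefix lookup on a sorted word list, which is roughly what a sorted set with lexicographic range queries gives you in Redis. The sketch below is plain Python, not Redis itself; `complete` and the tiny word list are made up for illustration.

```python
import bisect


def complete(sorted_words, prefix, limit=10):
    """Return up to `limit` words from a sorted list that start with `prefix`."""
    # All words sharing a prefix sit next to each other in sorted order,
    # so one binary search finds where that block starts.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for index in range(start, len(sorted_words)):
        word = sorted_words[index]
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


words = sorted(["car", "card", "care", "cat", "dog", "do"])

suggestions = complete(words, "car")
limited = complete(words, "ca", limit=2)
nothing = complete(words, "x")
```

Because the lookup is a binary search followed by a short scan, it stays fast even for a dictionary-sized list.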
Redis allows you to send chained commands (pipelining), like “grab all that stuff, then do something with it”, without wasting time on a second round trip to your Redis server over the network, or on 1,000, or 10,000 of them. The developers suggest that you don’t chain too many commands at once, because the replies have to be queued in memory on the Redis instance. Redis even supports transactions. Daisy chaining also helps when you have a large collection of data but your request only needs to operate on a small subset of it: you can maintain tag collections that point to those subsets, which keeps Redis fast.
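The advice about not chaining too many commands at once can be sketched as batching. The example below assumes a client whose `pipeline(transaction=False)` and `execute()` calls behave like those of redis-py; `FakeServer` and `FakePipeline` are hypothetical stand-ins that only count round trips so the sketch runs without a server, and the batch size is an arbitrary choice.

```python
class FakePipeline:
    """Hypothetical stand-in: queues SETs and applies them on execute()."""

    def __init__(self, server):
        self._server = server
        self._queued = []

    def set(self, key, value):
        self._queued.append((key, value))
        return self

    def execute(self):
        self._server.round_trips += 1  # one network round trip per batch
        for key, value in self._queued:
            self._server.data[key] = value
        results = [True] * len(self._queued)
        self._queued = []
        return results


class FakeServer:
    """Hypothetical stand-in for a Redis client that counts round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def pipeline(self, transaction=False):
        return FakePipeline(self)


def write_in_batches(client, items, batch_size=1000):
    """Send many SETs with one round trip per batch instead of one per key."""
    pipe = client.pipeline(transaction=False)
    pending = 0
    for key, value in items:
        pipe.set(key, value)
        pending += 1
        if pending == batch_size:
            pipe.execute()
            pending = 0
    if pending:
        pipe.execute()  # flush the last, partially filled batch


server = FakeServer()
write_in_batches(server, [(f"k:{i}", i) for i in range(2500)], batch_size=1000)
```

Here 2,500 writes cost three round trips instead of 2,500, while no single batch grows beyond 1,000 queued commands.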
Pimp my server
If you always send the same command chains and have to do some work in between, you can use Lua scripts to add that functionality to your server. If the built-in commands don’t cut it, write ones that suit your needs. They are not as fast as built-in commands, but they can do what you would otherwise do with your command batches. You also get the advantage that such scripts read and write at minimal latency, because they run on the server. Also, if you have replication enabled, only the script gets replicated as a whole, not each command used by the script. Note that newer Redis versions (5.0 and later by default) replicate the write commands a script produced instead.
Redis also supports the publisher/subscriber pattern. This way you can push updates to your clients without them having to poll for updates. If two applications access the same Redis, they should use different key naming schemes, and you should document your key naming convention. Otherwise you run the risk of having a big pile of junk lying around. The same applies if you have multiple channels users can subscribe to.
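Here is a toy model of the publisher/subscriber idea together with a naming convention for channels. It runs in one process and is not a Redis client; `TinyBroker`, `channel_name` and the `<app>:events:<topic>` scheme are hypothetical and only illustrate the pattern.

```python
from collections import defaultdict


class TinyBroker:
    """Toy in-process model of publish/subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Report how many subscribers received the message.
        receivers = self._subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)


def channel_name(app, topic):
    """Hypothetical naming convention: <app>:events:<topic>."""
    return f"{app}:events:{topic}"


broker = TinyBroker()
inbox = []
broker.subscribe(channel_name("forum", "new-post"), inbox.append)

delivered = broker.publish(channel_name("forum", "new-post"), "post 17 created")
ignored = broker.publish(channel_name("shop", "new-post"), "unrelated")
```

The application prefix keeps the two `new-post` channels apart, so the forum subscriber never sees the shop's messages.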
So, are there any bad things about this?
For starters, it is not made to be as durable as your normal database. If you don’t care about data persistence, Redis won’t either. If you add persistence to it, it will get slower while still not reaching a level of fail-safety comparable to Postgres or MongoDB. Since the whole dataset has to be kept in memory, the dataset cannot be larger than the amount of RAM available.
Redis traditionally uses a very basic authentication system: once authenticated, you can read and write the whole dataset, with no way of restricting access to subsets of it. Only Redis 6 and later offer access control lists that can limit a user to certain commands and key patterns. Either way, you should make sure that users cannot send commands like FLUSHDB, which will result in a very empty Redis. You should also keep an eye on your key naming convention.
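One way to keep dangerous commands away from Redis is to check requests in your own application before they are forwarded. The sketch below is an assumption about how such a guard could look, not a Redis feature; the allowlist, the `forum:` key prefix and the function name `is_allowed` are all hypothetical.

```python
ALLOWED_COMMANDS = {"GET", "SET", "DEL", "EXPIRE"}  # hypothetical allowlist
KEY_PREFIX = "forum:"  # hypothetical key naming convention


def is_allowed(command, key):
    """Accept only known commands on keys inside the application's namespace."""
    if command.upper() not in ALLOWED_COMMANDS:
        return False
    return key.startswith(KEY_PREFIX)


ok = is_allowed("get", "forum:post:1")
flush = is_allowed("FLUSHDB", "forum:post:1")   # command not on the allowlist
foreign = is_allowed("SET", "shop:cart:9")      # key outside the namespace
```

An allowlist is used here instead of a blocklist so that a command nobody thought of is rejected by default.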