Published on 08.06.2018
The necessity to create backups needs no justification. Everybody knows that backups are mandatory.
An exception to this rule may be systems that are inherently designed for high availability and keep a large number of data replicas. Especially when dealing with vast volumes of data, this can be a valid strategy.
However, the average data service requires backups. As the fallback strategy “re-building over fixing” shows, having a backup is not enough: the backup must also be consistent and recent to minimize damage during disaster recovery.
So when multiple data services are to be automated at scale, a unified handling of backups contributes to the principle “Solve issues on the framework level, fine-tune data service specifically.”
This unification is achieved by creating a backup framework that covers the recurring tasks of backup and restore procedures. There are many different ways to create backups, depending on input factors such as the data service, the infrastructure and the required backup frequency. This is why a backup framework, rather than a monolithic backup solution, is a data service automation engineer’s best friend.
Obviously, developers want to create backups and restore them on demand. If they schedule backups, the actual backup is triggered by a cron-like daemon rather than a human user. The same happens when a failed data service instance is to be recovered: a workflow is executed that triggers the backup/restore functionality automatically. Therefore, the backup framework needs to provide an API.
The backup API has the purpose of harmonizing the backup/restore functionality behind a common interface such as a REST API.
The implementation of such a backup API has to cope with challenges arising from the varying backup/restore approaches of a growing list of data services to be integrated. Finding proper abstractions and reusable entities is a key challenge.
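To make the idea of a common interface more tangible, here is a minimal sketch in Python. All names (`BackupAPI`, `BackupRecord`, `create_backup`, `restore_backup`, `list_backups`) are assumptions made for illustration, not the framework's actual API, and the in-memory implementation only stands in for a real data service adapter.

```python
from __future__ import annotations

import itertools
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRecord:
    """Metadata describing one backup (hypothetical shape)."""
    backup_id: str
    instance_id: str


class BackupAPI(ABC):
    """Hypothetical common interface that hides data service specifics."""

    @abstractmethod
    def create_backup(self, instance_id: str) -> BackupRecord: ...

    @abstractmethod
    def restore_backup(self, instance_id: str, backup_id: str) -> None: ...

    @abstractmethod
    def list_backups(self, instance_id: str) -> list[BackupRecord]: ...


class InMemoryBackupAPI(BackupAPI):
    """Toy implementation: instances and backups live in dictionaries."""

    def __init__(self) -> None:
        self.instances: dict[str, bytes] = {}
        self._backups: dict[str, tuple[BackupRecord, bytes]] = {}
        self._ids = itertools.count(1)

    def create_backup(self, instance_id: str) -> BackupRecord:
        record = BackupRecord(f"backup-{next(self._ids)}", instance_id)
        # Store a copy of the instance's current data under the new backup id.
        self._backups[record.backup_id] = (record, self.instances[instance_id])
        return record

    def restore_backup(self, instance_id: str, backup_id: str) -> None:
        _, data = self._backups[backup_id]
        self.instances[instance_id] = data

    def list_backups(self, instance_id: str) -> list[BackupRecord]:
        return [rec for rec, _ in self._backups.values()
                if rec.instance_id == instance_id]


api = InMemoryBackupAPI()
api.instances["db-1"] = b"v1"
record = api.create_backup("db-1")          # could be called by a scheduler
api.instances["db-1"] = b"corrupted"
api.restore_backup("db-1", record.backup_id)  # could be called by a recovery workflow
```

The point of the sketch is that a human user, a cron-like daemon and a recovery workflow would all go through the same three calls, regardless of which data service sits behind them.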
As mentioned before, the backup framework must allow the implementation of various backup/restore strategies across infrastructures and data services with varying requirements.
The optimal backup strategy and its implementation depend on the particular context. A platform with a fixed infrastructure, for example, may lead to a very different backup strategy than a generic data service solution designed to run across a variety of infrastructures and platforms.
For a fixed-infrastructure scenario, a data-service-independent approach may be to create copy-on-write snapshots of data service volumes. This covers a whole set of data services with very little effort. However, if another infrastructure needs to be supported at any point and for any reason, a new strategy must be devised and implemented.
Such an infrastructure-independent strategy sacrifices data service independence as a trade-off. By creating data-service-specific backup and restore adapters, no significant assumptions about the infrastructure are necessary.
To enable the backup framework to cope with all these scenarios, the concept of backup workflows has proven its value. A backup workflow consists of a backup plugin, a filter chain and a storage output plugin.
The backup plugin implements the task of extracting the actual data from the data service. An infrastructure-specific plugin may trigger a volume snapshot, whereas a data-service-specific plugin may invoke a dedicated backup tool. Now the data has been extracted from the data service, but what to do with it? Most commonly, at least two subsequent actions are required before the data can be transferred: compression and encryption.
Both actions can be implemented as filter plugins, which can be chained to enforce a certain sequence. Additional filters for other purposes, e.g. data integrity checks, are conceivable. The last step in a backup workflow is transferring the backup to a secure remote location. This is the purpose of a storage output plugin. The storage output plugin is likely to be infrastructure-specific and can integrate anything from Amazon S3 to OpenStack Swift or any other storage backend.
The backup workflow covers the elementary task of creating a backup while being as agnostic as possible.
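The composition of a backup workflow can be sketched in a few lines of Python. The plugin signatures and the names `BackupWorkflow`, `xor_placeholder` and `remote_store` are assumptions made for this example only; the XOR filter merely marks the place where a real encryption filter would go, and a dictionary stands in for a remote storage backend.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical plugin signatures, chosen for this sketch only.
BackupPlugin = Callable[[], bytes]                   # extracts data from the data service
FilterPlugin = Callable[[bytes], bytes]              # transforms data (compress, encrypt, ...)
StorageOutputPlugin = Callable[[str, bytes], None]   # ships the result to remote storage


@dataclass
class BackupWorkflow:
    backup: BackupPlugin
    filters: List[FilterPlugin]
    output: StorageOutputPlugin

    def run(self, name: str) -> None:
        data = self.backup()
        for apply_filter in self.filters:  # filters run in the configured order
            data = apply_filter(data)
        self.output(name, data)


def xor_placeholder(data: bytes) -> bytes:
    """Stand-in for an encryption filter. This is NOT real encryption."""
    return bytes(b ^ 0x5A for b in data)


remote_store: Dict[str, bytes] = {}  # stands in for an object store

workflow = BackupWorkflow(
    backup=lambda: b"row-1\nrow-2\n" * 100,     # stands in for a backup tool or snapshot
    filters=[zlib.compress, xor_placeholder],   # compress first, then "encrypt"
    output=remote_store.__setitem__,
)
workflow.run("db-1/2018-06-08")
```

Swapping the backup plugin or the storage output plugin changes the strategy without touching the workflow itself, which is what keeps the workflow agnostic. Note that this version passes the complete backup from step to step in memory, so it only illustrates the structure, not an efficient execution model.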
Based on such a backup framework, it is an optimization challenge to maximize the backup frequency while minimizing impact on the data service and maintaining data consistency. A major step towards this is to use streaming backups rather than dumps.
With ordinary dumps, a developer invokes a command and the data is written to a dump file. This file is then processed. Looking at the aforementioned backup workflow, the worst case would be to create a backup file, then compress it, then encrypt it and finally upload it, with every step executed in strict sequence. This comes with two major disadvantages:
1. It takes a long time, as the runtimes of the workflow steps add up.
2. It requires a large amount of temporary disk space.
Both disadvantages make the handling of large data sets impractical.
The more efficient approach is to create a streaming backup pipeline: a stream of data is read from the backup plugin and fed concurrently into the subsequent filter plugins, such as compression and encryption, and finally into the storage output plugin. This way, data is read, compressed, encrypted and transferred continuously. The runtime of the backup workflow is close to the runtime of the slowest plugin, and ideally only a buffer-sized chunk of the data needs to be stored temporarily on the data service instance.
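A streaming pipeline of this kind can be approximated with Python generators. This is a sketch under several assumptions: the function names and `CHUNK_SIZE` are made up for illustration, the XOR step is only a placeholder for a real encryption filter, and in-memory `io.BytesIO` objects stand in for the backup tool's output stream and the remote storage. Generators interleave the steps within a single thread; a real framework might instead run the stages in parallel, for example connected by pipes.

```python
import io
import zlib
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 64 * 1024  # buffer size; an arbitrary value for this sketch


def read_chunks(source: BinaryIO, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Backup plugin: yield the backup stream chunk by chunk."""
    while True:
        chunk = source.read(size)
        if not chunk:
            return
        yield chunk


def compress(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compression filter: emit compressed bytes as soon as they are available."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            yield out
    yield compressor.flush()  # emit whatever is still buffered


def xor_placeholder(chunks: Iterable[bytes], key: int = 0x5A) -> Iterator[bytes]:
    """Stand-in for an encryption filter. This is NOT real encryption."""
    for chunk in chunks:
        yield bytes(b ^ key for b in chunk)


def upload(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Storage output plugin: write each chunk to the sink, return bytes written."""
    total = 0
    for chunk in chunks:
        total += sink.write(chunk)
    return total


source = io.BytesIO(b"row\n" * 500_000)  # stands in for the output of a backup tool
sink = io.BytesIO()                      # stands in for a remote storage backend
written = upload(xor_placeholder(compress(read_chunks(source))), sink)
```

Because every stage pulls one chunk at a time from the stage before it, no intermediate dump file is created and the pipeline itself holds only about one buffer-sized chunk per stage at any moment.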
An efficient backup framework is the foundation for many features when striving towards full lifecycle automation of data services. These include disaster recovery of failed service instances as well as cloning and forking of data service instances. The better backup workflows are implemented, the more they will act as a platform’s guardian angel.
Check out the full series here: Principles and Strategies of Data Service Automation