Principles and Strategies of Data Service Automation – Part 3

Full Lifecycle Management

Once automation has been adopted as a strategy, striving for full lifecycle automation is the logical consequence. Providing fast and easy installation and initial configuration of data services is not enough. Depending on the application, the lifecycle of a data service can span years. During this time, agile developers will deploy many application versions and migrate the database schema, the data service will be upgraded several times, and servers may fail.

Full Lifecycle Automation

Full lifecycle management comprises everything a database administrator (DBA) would possibly do. This includes downloading, installing and configuring the data service. Beyond that, it also has to cover day-2 operations, including monitoring, version upgrades, scale-outs, creating backups and recovering from failure scenarios.

When starting with full lifecycle automation, it’s helpful to first look at what the lifecycle contains. Then, following the Pareto principle, a scope is set that covers 80-90% of a data service’s best-practice use cases.
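To make the Pareto step more tangible, the following minimal sketch picks lifecycle tasks in order of descending importance until their combined share of use cases reaches a target coverage. The task names and percentages are purely hypothetical placeholders, not measurements; in practice they would come from your own knowledge of how a given data service is used.

```python
from typing import Dict, List


def select_scope(usage_share: Dict[str, int], target_percent: int = 80) -> List[str]:
    """Greedy Pareto cut: add tasks with the largest share of use cases
    until the cumulative share reaches target_percent of the total."""
    total = sum(usage_share.values())
    scope: List[str] = []
    covered = 0
    for task, share in sorted(usage_share.items(), key=lambda kv: kv[1], reverse=True):
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= target_percent * total:
            break
        scope.append(task)
        covered += share
    return scope


# Hypothetical share (in percent) of use cases per lifecycle task, illustrative only.
usage = {
    "create": 30,
    "backup": 20,
    "delete": 15,
    "restore": 10,
    "minor_update": 8,
    "scale_up": 7,
    "plugins": 4,
    "major_update": 3,
    "config_change": 3,
}

print(select_scope(usage))  # ['create', 'backup', 'delete', 'restore', 'minor_update']
```

Raising the target to 90% would add "scale_up" to the scope; the remaining tasks stay on the roadmap rather than being dropped.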

The Lifecycle of a Data Service

The lifecycles of data services such as a relational database (RDBMS), a document database, a key-value store, a message queue or a search & analytics server differ in detail. Still, from an operational point of view, there is a similar set of operational task categories to take care of:

    • List and present available data services
    • Let a developer select a data service and a service plan
    • Create a service instance for a given data service and service plan
    • Modify a service instance
      • Restoring a backup
      • Scaling
        • Scale-up
        • Scale-down
      • Add a log and/or metric endpoint
      • Perform a version update of the service instance
        • Major
        • Minor
        • Patchlevel
      • Enabling / disabling data service plugins
      • Miscellaneous configuration changes, e.g. changing settings in the data service’s config file
    • Create a backup
    • Recover a failed instance
      • When a service instance has failed
        • An automated resolution should try to self-heal the service instance
          • Recognition of known error states with corresponding resolution handlers
            • Including general self-healing procedures such as process and VM resurrection
            • Data service specific error classification and resolution logic
        • Options supporting a manual intervention should be given
          • e.g. triggering a certain recovery procedure for a known error case that cannot be detected automatically or cannot be executed without input from a human operator
          • A fallback hard reset that re-creates the service instance from a backup, as an on-demand, self-service last resort for the application developer
    • Delete a service instance
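The recovery branch of this lifecycle can be sketched as a small control loop: classify the observed symptoms into a known error state, run a matching resolution handler if one exists, and otherwise escalate to a human operator. The following is a minimal sketch under stated assumptions: the `Instance` fields, the error-state names, the 60-second replication-lag threshold and the handler functions are all hypothetical and only stand in for real health checks and automation calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Instance:
    """Hypothetical, simplified snapshot of a service instance's health."""
    name: str
    vm_reachable: bool = True
    process_running: bool = True
    replication_lag_s: float = 0.0
    events: List[str] = field(default_factory=list)


def classify(inst: Instance) -> Optional[str]:
    """Map observed symptoms to a known error state (None means healthy).
    Generic checks come first, data-service-specific ones afterwards."""
    if not inst.vm_reachable:
        return "vm_down"
    if not inst.process_running:
        return "process_down"
    if inst.replication_lag_s > 60:  # hypothetical data-service-specific threshold
        return "replication_lag"
    return None


def resurrect_vm(inst: Instance) -> bool:
    inst.events.append("recreate VM")  # a fresh VM also starts the process
    inst.vm_reachable = True
    inst.process_running = True
    return True


def restart_process(inst: Instance) -> bool:
    inst.events.append("restart process")
    inst.process_running = True
    return True


# Known error states with automated resolution handlers. States without an
# entry (here "replication_lag") require a decision by a human operator.
HANDLERS: Dict[str, Callable[[Instance], bool]] = {
    "vm_down": resurrect_vm,
    "process_down": restart_process,
}


def self_heal(inst: Instance) -> str:
    state = classify(inst)
    if state is None:
        return "healthy"
    handler = HANDLERS.get(state)
    # Only report success if the handler ran and the instance now looks healthy.
    if handler is not None and handler(inst) and classify(inst) is None:
        return f"self-healed:{state}"
    return f"manual-intervention:{state}"


def hard_reset_from_backup(inst: Instance, backup_id: str) -> Instance:
    """Last-resort, on-demand self-service: re-create the instance from a backup."""
    fresh = Instance(name=inst.name)
    fresh.events.append(f"restored from backup {backup_id}")
    return fresh


print(self_heal(Instance("pg-1", process_running=False)))  # self-healed:process_down
print(self_heal(Instance("pg-2", replication_lag_s=300)))  # manual-intervention:replication_lag
```

Keeping classification separate from resolution makes it easy to add data-service-specific error states over time without touching the generic process and VM resurrection logic.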

Although it may seem to contradict the point made earlier, it is perfectly fine to start by automating the low-hanging fruit first. Starting with the most important and simplest tasks is recommended. The point is not to stop after the installation and configuration of data services: this is where data service automation solutions diverge, and where architectural and technology choices turn out to be either beneficial or a drawback. Therefore, it’s advisable to walk through the lifecycle in its entirety before starting to automate. One of the following chapters, “Operational model first, automation second”, will cover in greater detail how planning ahead can eliminate waste from the implementation without becoming too “waterfall-y”.

Anti-Patterns

While data service automation has to start somewhere, there are data service implementations that do more harm than good. What separates a lean-startup-style minimum viable product from a failing product is the product’s ability to evolve. Even when strictly driven by customer demand, it is wise to think ahead. Looking at the lifecycle described above, a long automation roadmap is easily derived. This roadmap contains many challenges that require some thinking ahead to avoid costly refactoring once the corresponding backlog items are prioritized.

A shortcut to data service automation that has been taken repeatedly is the automation of shared servers or clusters: a MySQL or RabbitMQ server, for example, is used by multiple tenants, each of whom receives a database or virtual host instead of an on-demand provisioned, dedicated server or cluster. These naive implementations provided a quick “solution” to new platform adopters. In some cases, the moment to replace the temporary getting-started solution with a production-grade implementation was missed. As a result, weakly isolated data service clusters degraded in performance or even collapsed entirely. The lesson to be learned was clear: there is no alternative to on-demand provisioning of dedicated data service instances. This design pattern works at scale across many data services, regardless of their multi-tenant capabilities.
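The difference between the shared-server shortcut and on-demand provisioning of dedicated instances can be sketched with a hypothetical in-memory broker: every provisioning request gets its own freshly created VMs, sized by its service plan, and no VM is ever handed to a second tenant. The plan names, VM naming scheme and `DedicatedBroker` class are assumptions for illustration; a real implementation would delegate VM creation to an infrastructure automation tool.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical service plans: number of dedicated VMs per service instance.
PLANS: Dict[str, int] = {"single-small": 1, "cluster-small": 3}


@dataclass
class ServiceInstance:
    instance_id: str
    plan: str
    vms: List[str]


class DedicatedBroker:
    """Sketch of on-demand provisioning: each service instance gets its own
    VMs instead of a database or virtual host on a shared server."""

    def __init__(self) -> None:
        self._vm_counter = itertools.count(1)
        self.instances: Dict[str, ServiceInstance] = {}

    def provision(self, instance_id: str, plan: str) -> ServiceInstance:
        if instance_id in self.instances:
            raise ValueError(f"instance {instance_id} already exists")
        node_count = PLANS[plan]
        # Placeholder for a call into the infrastructure automation layer.
        vms = [f"vm-{next(self._vm_counter)}" for _ in range(node_count)]
        inst = ServiceInstance(instance_id, plan, vms)
        self.instances[instance_id] = inst
        return inst

    def deprovision(self, instance_id: str) -> None:
        # Tears down only this instance's VMs; other tenants are unaffected.
        del self.instances[instance_id]


broker = DedicatedBroker()
orders = broker.provision("orders-db", "cluster-small")
billing = broker.provision("billing-db", "single-small")
print(orders.vms, billing.vms)  # ['vm-1', 'vm-2', 'vm-3'] ['vm-4']
```

Because tenants never share a VM, a noisy or failing instance can only degrade itself, which is exactly the property the shared-cluster shortcut lacks.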

The isolation provided by on-demand provisioning of dedicated data service instances has one restriction: it is only as good as the isolation capabilities of the underlying virtualization layer. The recent hype around containerization, and the resulting attention paid to Kubernetes, make this point especially important. While there are indicators suggesting that Kubernetes will evolve into a full-grown infrastructure replacement, it is not there yet. This has strong implications for data service automation.
To be more precise, at the time of this writing Kubernetes does not provide solid disk IO and network IO isolation between containers. This means that Kubernetes currently cannot prevent two data service instances co-located on the same Kubernetes node from dragging down each other’s performance. The conclusion is that the time is right to look at how data service automation can be done with Kubernetes, but production workloads should be put on solid para- or fully virtualized infrastructure, provisioning well-isolated VMs instead of containers. For now.

Another problematic approach to data service automation in the context of large application platforms such as Cloud Foundry is to use legacy DevOps methodology. As we will discuss in the chapter “Be agnostic”, it is important that the data service automation follows the lead of the application platform, and modern application platforms are agnostic. Take Cloud Foundry, for example: both its Application Runtime and its Container Runtime are automated using BOSH, an automation tool that ensures true infrastructure and operating system agnosticism.
Attempts to integrate existing legacy DevOps data service automation with modern platforms often lead to a compliance gap. Such an integration works to some degree, but it cannot keep up with the platform. Often, full lifecycle automation is never reached, too much engineering capacity is tied up, or shifts such as a change of infrastructure cannot be followed within a reasonable time. The recommendation, therefore, is to keep the degree and quality of the data service automation on par with that of the application platform automation.

Lifecycle Automation Challenges

The full automation of the entire lifecycle of any data service is a challenge. Depending on the data service to be automated, there are more or fewer issues to overcome. Some data services are meant to be run in a cloud environment, while others seem to carry the legacy of manual operations in their DNA. With the development of modern platforms, data services will change over time and become more automation-friendly. Until then, a number of challenges have to be overcome: complex configuration files, many loosely integrated components to be orchestrated, weak robustness and self-protection, missing authentication and encryption support, or the inability to run gracefully in a shared network are just a few of them. It’s safe to conclude that the automation effort depends on the choice of data services to be automated.

In the following chapter “Pick your data services wisely” we will further investigate this matter.
