Published at 18.03.2018
Physical clusters provide options to make applications highly available. With the rise of virtualization technologies such as VMware, Xen and KVM it became easy to slice a physical host into smaller virtual chunks known as virtual machines (VMs).
This leads to multiple advantages.
With virtual machines, multiple components can be safely packed onto a physical machine. Each VM has its own CPU shares, memory and filesystem, and thus may install custom software packages. VMs are highly isolated and therefore do not interfere with one another. This leads to a true multi-tenancy capability which most free server operating systems – especially Linux – lacked in the years around 2005 to 2010.
Packing servers densely with VMs also leads to a much better average utilization of physical servers and thus the entire data center.
As mentioned before, this not only saves money on servers but also power for server and A/C operations. Not to mention the step-fixed costs of building a new data center.
While the improved utilization is neat, the more important benefit of virtualization is the ability to automate the lifecycle of servers.
Over the years, virtualization technologies have evolved way beyond the boundaries of a single physical server. The virtualization of computational resources as VMs, software defined networks and software defined storage solutions provide all the means to programmatically create virtual clusters. This is how virtualization laid the foundation for modern cloud infrastructures.
Neither automated software installation nor software configuration management was entirely new when the term DevOps was coined in 2009. It was a mixture of spirit and automation that drove this movement.
Bringing values from the Agile Manifesto to software operations and consistently applying automation generated a carrier wave for many excellent tools to emerge from this hype. Chef, just to name one.
Before Chef, clusters were often held together by many scripts and, most importantly, people. These people knew their clusters. In smaller organizations this knowledge created bottlenecks and social single points of failure (SPOFs). In larger organizations, runbooks and other policies consumed many hours to write and maintain.
With technologies such as Chef – you may substitute similar technologies – automation became centralized. Cookbooks helped to realize the DRY principle – don’t repeat yourself – by writing automation once and applying it many times to many servers. The existence of a centralized database of knowledge about servers and deployments helped to organize the construction of clusters and the operation of complex distributed systems. Application servers can programmatically find their corresponding databases, so the same automation applies to various application systems.
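The idea of discovering peers through the central server can be sketched as follows – a minimal Python model, not Chef’s actual API, with a hypothetical node registry standing in for the Chef server:

```python
# Conceptual sketch (not Chef's real API): an application server queries
# a central registry of nodes to discover its database, much like a Chef
# recipe would search the Chef server for nodes with a given role.

def find_nodes(registry, role):
    """Return all registered nodes that carry the given role."""
    return [node for node in registry if role in node["roles"]]

# Hypothetical central registry, normally maintained by the config server.
registry = [
    {"name": "app-1", "roles": ["application"], "ip": "10.0.0.11"},
    {"name": "db-1", "roles": ["database"], "ip": "10.0.0.21"},
]

databases = find_nodes(registry, "database")
db_host = databases[0]["ip"]  # the app server wires itself to this address
```

Because every node asks the same central database, the same cookbook works unchanged for each new application system.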
A DevOps community formed, sharing automation as cookbooks across organizational boundaries. The tech community greatly benefited from this development.
However, these approaches had their limits. For Chef to run, a server still needed to be bootstrapped: a virtual machine had to be created, the Chef client installed, and the initial configuration pulled and executed.
The focus of Chef is software installation and configuration. It does so by repeatedly pulling instructions from a central server and applying them to the local machine.
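This pull-and-converge cycle can be illustrated with a minimal Python sketch – a simplified model, not the real Chef client; the package states are hypothetical:

```python
# Simplified model of Chef's pull model: the node periodically fetches
# the desired state from a central server and applies only the changes
# needed to reach it. Repeated runs change nothing: the run is idempotent.

def fetch_desired_state():
    # Hypothetical stand-in for contacting the central Chef server.
    return {"nginx": "installed", "ntp": "installed"}

def converge(actual, desired):
    """Idempotently bring the local state in line with the desired state."""
    for package, state in desired.items():
        if actual.get(package) != state:
            actual[package] = state  # the real client would install here
    return actual

local_state = {"nginx": "installed"}        # ntp is missing on this node
local_state = converge(local_state, fetch_desired_state())
# A second converge run finds nothing left to do.
```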
Many cookbooks referred to OS-specific package managers. The resulting automations can support multiple operating systems with conditionals in cookbooks.
However, they’ll never be truly operating-system independent. This may sound acceptable, but it is not that simple: depending on an operating system means making assumptions about it – assumptions like directory structures that will break the automation if anything in the OS changes.
Consider a large scenario of hundreds of VMs. When the assumed state of the automation diverges from the actual state of the VMs, a disaster is likely to happen. Generally, a major risk is a divergence between the Chef automation’s assumed state and the actual state of a machine.
Chef automation works with both physical and virtual machines. This may be good in some scenarios, but it also restricts the scope of Chef, as shown by subsequent technologies such as BOSH.
Chef is like automating logging into a server and automatically issuing commands. Without plugins, it does not make use of the power of the programmable data center: no starting of VMs, no configuration of software defined networks. It does not know about the separation of VMs and their persistent disks. It does not claim any responsibility to unify software installation or to monitor the resulting processes.
While it is possible to let Chef install applications, building a continuous deployment pipeline is not in Chef’s DNA.
The Ruby on Rails community, for example, used Capistrano to perform continuous deployments.
Capistrano’s approach was to literally log into a list of servers via SSH and remotely perform shell commands. Using the Git command line client to check out and update source code, Capistrano helped to maintain a growing number of application servers.
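The push-style approach can be sketched roughly like this in Python – host names and commands are hypothetical, and the sketch runs in dry-run mode, only collecting the SSH commands it would issue:

```python
# Capistrano-style push deployment, heavily simplified: run the same
# shell commands on every server over SSH. Here in dry-run mode; passing
# a `run` callable (e.g. subprocess.check_call) would actually execute.

servers = ["app-1.example.com", "app-2.example.com"]   # hypothetical hosts
release_cmds = [
    "cd /var/www/app && git pull origin main",
    "systemctl restart app",
]

def deploy(servers, commands, run=None):
    """Issue the given commands on every server, in order."""
    executed = []
    for host in servers:
        for cmd in commands:
            ssh_cmd = ["ssh", host, cmd]
            if run:
                run(ssh_cmd)          # real execution hook
            executed.append(ssh_cmd)  # record what was (or would be) run
    return executed

plan = deploy(servers, release_cmds)  # dry run: no `run` callable given
```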
The Capistrano approach has a similar flaw: whenever the assumed state diverges from the actual state, the automation fails.
With most of these 1st generation DevOps technologies, the major challenge of larger deployments was keeping the server states in sync with the automation. Any external influence may disturb the automation, and a failing automation most likely implies the necessity of a manual intervention.
The emergence of programmable data centers as infrastructure as a service has led to the development of a 2nd generation of DevOps tools.
Dealing with physical hardware requires interaction with the physical world in the data center. Someone needs to go to the server and pull or plug in a cable. This obviously requires people. People get busy, and it takes them minutes or even hours to complete a task.
The creation of individual virtual machines is handy, but it is the creation of clusters of virtual machines that is the real game changer. Getting rid of these physical overheads makes cluster building significantly faster. Assembling smaller VMs into smaller virtual clusters also reduces the minimum cost of a cluster.
In order to build clusters of virtual machines, the creation of VMs alone is not enough. A software defined networking (SDN) layer is required. An SDN helps to overcome the boundaries of a single physical host. Only when virtual machines can be created across a cluster of physical hosts can virtual clusters be built at scale.
With an SDN in place, operating a larger number of physical hosts leads to another challenge. The failure of an individual physical server now has a greater impact, as it potentially affects multiple virtual machines. Individual bare-metal servers may fail for various reasons. For one, they are still hardware, and hardware fails. They also now contain a virtualization and networking software layer, which can fail, too. And if they don’t fail, they need maintenance once in a while.
So a major challenge is: how can individual infrastructure hosts of the virtualization layer be taken down without affecting the software layer too much?
The importance of single infrastructure hosts has been significantly reduced by applying an important architectural pattern that is a true milestone in the development of modern infrastructures:
The separation of ephemeral virtual machines from their persistent data residing on a remotely attached disk.
Nowadays, this may feel trivial but 10 years ago it wasn’t.
Let’s look at a database server, for example. A database process stores data in the local filesystem, and it does so fast. For this reason, the idea of putting a database process inside a VM – losing CPU and disk I/O performance – and, even more radically, outsourcing the data to a remotely attached block device, seemed crazy.
However, network bandwidth rapidly grew from 100 MBit/s over 1 GBit/s to 10 GBit/s and more.
Just to put that into perspective: 10 Gbit/s is about 1.25 GB/s. That’s more than a typical hard drive delivers in real-world scenarios. And even if the database is somewhat slower, all a virtualized database server has to be is fast enough. In that case, you still hugely benefit from all the advantages of virtualization.
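The arithmetic behind that figure is a one-liner – a quick sanity check:

```python
# Back-of-the-envelope check of the bandwidth figure above:
# dividing 10 Gbit/s by 8 bits per byte gives the throughput in GB/s.

link_gbit_per_s = 10
gb_per_s = link_gbit_per_s / 8   # 1.25 GB/s over the network link

# A typical spinning hard drive sustains well under 0.25 GB/s in real
# workloads, so the network link is not the bottleneck here.
```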
So let’s have a look at these benefits of separating the ephemeral VM from its persistent data.
Virtual machines fail – so why bother? Because they maintain state. State causes most of the trouble in software operations.
When a VM with local storage dies due to the total destruction of the underlying physical server, its state is lost. State doesn’t only mean the data of a database but also its configuration. The recovery of an entire server takes time. The recovery of large amounts of data takes even more.
When a VM with a remotely attached persistent disk dies, all that needs to be recovered is the configuration of the VM. The data comes back by re-attaching its persistent disk. Recovering a VM configuration can be done with a corresponding VM disk image and/or a configuration management automation. Recovering such a failed VM can happen within minutes rather than hours.
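A toy model makes the pattern concrete – all class and field names below are hypothetical; the point is only that the VM object is ephemeral while the disk object survives its death:

```python
# Toy model of the ephemeral-VM / persistent-disk separation.

class Disk:
    """Remotely attached block device: persistent state lives here."""
    def __init__(self):
        self.data = {}

class VM:
    """Ephemeral machine: its config is recreated from a stock image."""
    def __init__(self, image, disk):
        self.config = dict(image)
        self.disk = disk

image = {"os": "linux", "role": "database"}
disk = Disk()

vm = VM(image, disk)
vm.disk.data["orders"] = ["#1001", "#1002"]   # the application writes data

vm = None        # the VM dies: its configuration is gone, the disk is not

# Recovery: recreate the VM from the same image and re-attach the disk.
vm = VM(image, disk)
```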
Remotely attached disks appear as block devices to the operating system. In other words, they pretend to be hard disks while they are actually proxies for actual hard disks of a remote storage server. These storage servers are usually high-performance, highly available clusters.
In a proper virtualized infrastructure there are at least three availability zones (AZs). Each AZ has its own storage, networking, power and air conditioning. Typically, they also represent separate fire areas to protect against fires spreading through the entire data center.
With such a virtual infrastructure in place, a virtual data center is at hand and creation of virtual clusters is truly possible.
Interestingly, many products initially provided fancy UIs to click-configure virtual data centers. Surely, this was a great idea, helping many technicians to do their job faster.
However, the true revolution was to expose the flexibility of the virtual data center as a web service with a well documented API. Public Infrastructure as a Service (IaaS) providers appeared and gathered huge chunks of the hosting market, consuming the market shares of classic server and hosting companies.
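What “a data center behind an API” means in practice can be sketched as a single HTTP request – the endpoint and field names below are hypothetical, not any specific provider’s API:

```python
# Sketch of programmatic VM creation via a hypothetical IaaS web API:
# instead of a trip to the server room, provisioning is one HTTP request.

import json

request = {
    "method": "POST",
    "path": "/v1/vms",                  # hypothetical IaaS endpoint
    "body": {
        "name": "db-1",
        "cpus": 4,
        "memory_gb": 16,
        "image": "linux-base",
        "network": "sdn-backend",       # software defined network to join
        "disks": ["persistent-disk-7"], # remotely attached block device
    },
}

payload = json.dumps(request["body"])   # what would go over the wire
```

Because the whole request is data, creating a hundred VMs – or an entire virtual cluster – is just a loop over payloads like this one.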
While the most successful Infrastructure as a Service providers use proprietary software, open source projects such as OpenStack have emerged, trying to commoditize infrastructure automation.
The broad availability of Infrastructure as a Service – data centers at the hands of an API – ignited a new dynamic among DevOps methodologies and technologies.
Check out the full series here: Evolution of Software Development and Operations