Cloud infrastructures – Building clusters

Many of you may wonder why I am putting this blog post online. The answer is quite simple: I want to share our knowledge about the basics of the technology, so you can better understand how most apps work. Software gets more and more complex, and you should try to keep things as simple as possible.

The Classic, 1 Application, 1 Server

Such a setup is probably one of the first ideas a novice has. You pay for a server, your app works, and works, and works, and so on. But wait: what happens when your server crashes? Murphy’s law clearly states that this always happens on a Saturday night, while you are on the sofa with your partner or asleep.

Okay, let’s examine the reason for the crash. Usually, web servers are built as follows:

  • A Linux distribution
  • A language-specific web server
  • MySQL or PostgreSQL
  • Your app deployment
  • (Stored user data like profile pictures)

Icons made by Freepik from www.flaticon.com are licensed under CC BY 3.0

Long story short: if one of those programs fails, nothing else will work either. The server and every process on it are single points of failure. The cause does not even need to be a software fault. Something as simple as too many users can kill your application.
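To put a rough number on this, here is a minimal sketch. It assumes that the five components fail independently and that each one is up 99.9% of the time. Both the figure and the function names are made up for illustration, not measured values.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a stack in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of independent replicas where one working copy suffices."""
    return 1 - (1 - availability) ** copies


# Hypothetical figure: each of the five components is up 99.9% of the time.
stack = [0.999] * 5

single_server = serial_availability(stack)               # about 0.995
two_servers = redundant_availability(single_server, 2)   # higher than one server

print(f"one server:  {single_server:.5f}")
print(f"two servers: {two_servers:.7f}")
```

Under these assumptions the whole stack is less available than any single component, which is exactly what makes the one-server setup fragile.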

The question is, how do you remove the single point of failure and scale your application?

One app, two servers

You might have the idea to move the database to a second server. You double the server cost but still gain very little, except that the database is on a different file system and does not have to share read and write capacity with the application.
It is important to notice that the hardware idles most of the time, so in the end you are paying to bore your servers.

2-4 Servers, app and db separated with load balancers

This setup improves on the previous ones, but it still has some issues. First of all, you want multiple load balancers, so that an unresponsive load balancer does not become a single point of failure. Your NFS server can also still be a single point of failure. But looking on the bright side, a crashed database or application server won’t ruin your day.

On the other hand, you now pay for 6 servers and have more work setting them up. Your servers need to check whether the others are alive. Your load balancer needs to know whether each application server is alive. Your application and database servers need to know when a database server crashes. If you also include an NFS failover, you can reach a fully redundant service.
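The simplest form of such an "are you alive?" check is to try opening a TCP connection. The following is only a sketch: the function names and the timeout are my own choices, and real setups usually check more than an open port, for example an HTTP status page.

```python
import socket


def is_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def healthy_backends(backends, timeout=1.0):
    """Filter a list of (host, port) pairs down to the ones that answer."""
    return [(host, port) for host, port in backends if is_alive(host, port, timeout)]
```

A load balancer or monitoring job could call `healthy_backends([("10.0.0.11", 8080), ("10.0.0.12", 8080)])` every few seconds. The addresses here are placeholders.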

You now have to administer a lot of servers, and most of them probably idle around all day while you pay for them. On the plus side, adding more application servers is now trivial.

But the problems remain:

  • RDBMS and NFS usually only scale vertically
  • Robustness and scalability depend on the (persistence) services used
  • NFS servers are neither easy to scale nor inherently highly available (HA)

Scaling

Let me say a few words about scaling, to clear up any confusion that could come up.

Vertical Scaling:

  • Better hardware
  • Better CPU
  • More RAM

Horizontal Scaling:

  • More hardware
  • More Servers

Both kinds of scaling have their own limits. For example, there is no CPU better than the most powerful CPU currently available, and hardware cost usually does not scale linearly with performance. Horizontal scaling, on the other hand, runs into issues like the fallacies of distributed computing. Furthermore, not every application can easily be scaled horizontally.

If you expect your app to work under high load, you should make it easy to scale horizontally.

Deployment

At some point you usually want to deploy your application. The more complex your system is, the more complex your deployment will be. Because of this, you should automate your deployment. Some tools that help with this are described next.

Automation of Installations

When you want a self-healing and scalable application, you need automated installations. Popular technologies for that are Puppet, Chef and BOSH. All three of these tools have such a large feature set that we will not cover them here, but you should take a look at them.

Pacemaker

Pacemaker is a cluster resource manager that manages services in a cluster. You can specify which services may or may not run on the same hardware node, and when one of your services or machines crashes, Pacemaker can restart the affected services. It supports many cluster types and offers configurable quorum strategies.
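To give an idea of what quorum means, here is a small sketch of the plain majority rule: a group of nodes may only keep running services if it holds more than half of all votes. This is not Pacemaker's code or configuration, and its real quorum handling is configurable and more involved. The function name is invented for this example.

```python
def has_quorum(votes_present, total_votes):
    """Strict majority: more than half of all votes must be present."""
    return 2 * votes_present > total_votes


# A 5-node cluster splits into groups of 3 and 2 nodes after a network failure.
# Only the larger group is allowed to keep running the services.
print(has_quorum(3, 5))  # True
print(has_quorum(2, 5))  # False

# With an even number of nodes, an even split leaves neither side with quorum.
print(has_quorum(2, 4))  # False
```

The even split is one reason why clusters are often built with an odd number of voting nodes.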

Storage

Storage is also something you have to keep in mind in a cluster where hardware and software failures are anticipated. NFS has an inherent problem with horizontal scaling: there is no built-in way of scaling it.

Storage Types

There are three ways to store your data in a cluster:

  • Block storage
  • Object Store
  • Remote Filesystem

Popular frameworks for storage are:

  • GlusterFS connects storage in a cluster and presents it to the outside as a single NFS export. You can access it as a block device or a remote file system.
  • Ceph can work as an object store, a block device or a remote file system. It also offers an HTTP API.
  • OpenStack Swift is an object store with an HTTP API. It can create temporary URLs for assets.
  • Riak is technically a key-value store, but its HTTP API makes it usable as an object store.
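As an illustration of the temporary URLs mentioned for OpenStack Swift, here is a sketch of how such a URL can be signed on the client side. It follows the scheme of Swift's TempURL middleware as I understand it: an HMAC over the method, the expiry timestamp and the object path. The host, account, key and the choice of SHA-256 are assumptions. Your cluster has to have a matching key configured and has to allow that digest.

```python
import hmac
import time
from hashlib import sha256


def make_temp_url(base_url, path, key, ttl_seconds, method="GET", now=None):
    """Build a time-limited URL for one object (sketch, not an official client)."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    # The signed message is method, expiry and path, separated by newlines.
    message = f"{method}\n{expires}\n{path}"
    signature = hmac.new(key.encode(), message.encode(), sha256).hexdigest()
    return f"{base_url}{path}?temp_url_sig={signature}&temp_url_expires={expires}"


# All values below are placeholders.
url = make_temp_url(
    base_url="https://swift.example.com",
    path="/v1/AUTH_demo/avatars/user42.png",
    key="not-a-real-key",
    ttl_seconds=300,
)
print(url)
```

The point of this design is that the application server never has to stream the file itself. It hands out a link that stops working after five minutes.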

Application Server

The application server is the web server that builds the bridge between your programming language or web framework and the user. This can be Apache httpd, Tomcat or your language-specific web server.

Load Balancer

Load balancers distribute incoming requests to 1 to n app instances. But before you say that this is a call from Captain Obvious, let me share a few details.
Load balancers do more than just round-robin all day. They need to know which application instances are active and which are not. It would not be good if every nth request failed to reach an active instance because the load balancer did not know that the instance was dead.
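Here is a minimal sketch of round-robin that skips instances known to be dead. The class and the instance names are invented for this post. Real load balancers combine this with periodic health checks that call something like `mark_down` and `mark_up` for you.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in turn, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call, so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["app1", "app2", "app3"])
balancer.mark_down("app2")
print([balancer.next_backend() for _ in range(4)])  # ['app1', 'app3', 'app1', 'app3']
```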

Your load balancer should also be made redundant with tools like Pacemaker, to prevent situations where your application servers are all healthy and only the load balancer has a broken network interface or a kernel panic. Popular load balancers are Apache httpd with mod_proxy_balancer, Nginx and keepalived.

Message Queues

Applications usually have work that they should not do within the request itself: background tasks. Would you like to wait for a page to refresh, for example on YouTube after uploading your video, because all the scaling and thumbnail generation is done within your request?

This is where message queues come in. The same applies to sending mail: you just trigger the process of sending a mail, which can then be done asynchronously.
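The idea can be sketched with Python's standard library alone. Note that this uses an in-process queue and a thread as a stand-in. A real message queue runs as a separate service, so the jobs survive when your application restarts. All names here are made up for the example.

```python
import queue
import threading

mail_queue = queue.Queue()
sent = []


def send_mail(recipient):
    """Placeholder for slow work, such as talking to a mail server."""
    sent.append(recipient)


def worker():
    """Background consumer: takes jobs off the queue one by one."""
    while True:
        recipient = mail_queue.get()
        try:
            if recipient is None:  # sentinel value: stop the worker
                break
            send_mail(recipient)
        finally:
            mail_queue.task_done()


def handle_request(recipient):
    """The request handler only enqueues the job and returns immediately."""
    mail_queue.put(recipient)
    return "accepted"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

print(handle_request("user@example.com"))  # the user gets an answer right away
mail_queue.join()                          # only for the demo: wait for the worker
print(sent)
```

The request returns as soon as the job is queued. How long the actual sending takes no longer matters to the user.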

A dying mail server should not interfere with your application’s responsiveness. A popular example of a message queue is RabbitMQ. It offers persistent queues with send and receive notifications, can be clustered and has flexible routing. You can find some examples for queues here.
