Published on 23.09.2016
In this blog post you can read about the journey of finding the right infrastructure and also what is planned for the next months. See how anynines will become a multi-region platform, giving you the freedom to run applications in different regions, with or without Amazon Web Services (AWS). Just as you need it.
As a platform and Cloud Foundry enthusiast, you will learn why infrastructure is challenging when building a Cloud Foundry based platform.
anynines has been a wonderful journey since it started in 2013. As one of the first production Cloud Foundry installations, we at anynines had the chance to gather valuable production experience with Cloud Foundry.
Just like any entrepreneurial journey, there have been many ups and downs along the way. Besides some hiccups while learning about Cloud Foundry itself, the biggest challenge by far has been infrastructure.
Although one could think that infrastructure would be a solved problem by now, it clearly isn't.
One of the core ambitions of anynines has always been to offer at least one region operated under European/German jurisdiction that is fully compliant with European/German privacy laws.
With this goal in mind, the Amazon Web Services infrastructure is not the first choice, as all Amazon companies are held by a US parent company. This sadly implies that any Amazon-hosted customer is potentially exposed to inquiries from US intelligence services enforcing the US Patriot Act.
That’s why the first anynines region was operated on a non-AWS infrastructure.
Since Cloud Foundry was originally developed under the umbrella of VMware before being spun off into Pivotal, the first Cloud Foundry versions were developed against VMware vSphere. For this reason, the first anynines platform was deployed on a VMware infrastructure, as we assumed the VMware CPI would be the most reliable one.
Note to remember: the BOSH CPI (Cloud Provider Interface) is the abstraction layer in BOSH that allows Cloud Foundry to be deployed to any infrastructure implementing a BOSH CPI.
As anynines is backed by Avarteq we’ve also had access to Avarteq’s OpenStack team.
The thought came up to migrate to OpenStack.
This seemed to be a natural fit, as OpenStack was designed as an open source AWS clone, including the bullet-proof object store OpenStack Swift and solid multi-tenancy capabilities. Also, the VMware infrastructure was a hosted service, and we assumed we'd have a faster feedback loop with a self-hosted rather than a rented infrastructure.
So, as described in this blog post, anynines migrated to a self-hosted OpenStack in early 2014. Probably the worst decision in the lifetime of anynines, as it later turned out.
Since early 2014, around 70% of all incidents (see https://status.anynines.com) can be traced back to infrastructure or, more specifically, OpenStack issues.
To be clear: nobody at anynines blames OpenStack, but the combination of people, budget, mission, and technology focus retrospectively identifies the OpenStack layer as the weak spot of anynines.
The problems with OpenStack peaked in Q4 2015 and Q2 2016 with two major OpenStack failures, whose outages were way out of scope for the service level anynines expects to provide to its customers. We had to learn that anynines is not an OpenStack company.
Therefore, we've decided to fully focus on our platform and followed up with this action plan:
Option 1 was easy to decide.
Option 2 required some discussion, but after looking at various European infrastructure providers, none could demonstrate beyond doubt that it provides a stable infrastructure with full support by a BOSH CPI.
Knowing that Pivotal switched their public demo platform (PWS) to Amazon a while ago, it was clear that AWS' stable infrastructure as well as its reliable CPI would be the most solid choice for an anynines instance.
It is worth mentioning that anynines has always intended to operate several Cloud Foundry instances in different regions to enable multi-region deployments.
Although the abovementioned issues with data privacy (Safe Harbor, Patriot Act, etc.) exist, AWS is currently by far the biggest and most mature infrastructure on the market. So besides its stability, the Amazon ecosystem also causes customers to ask for Amazon.
The goal of anynines therefore was to establish an anynines instance on Amazon for customers with demand for various Amazon Web Services such as Elastic MapReduce (EMR), Redshift, etc. The creation of the anynines AWS Ireland region was successfully completed in April 2016. Until the completion of further regions, anynines AWS Ireland will be the default anynines region.
With this migration we've been able to test the assumption that the anynines stack, consisting of Cloud Foundry and the new anynines data services such as a9s PostgreSQL, a9s MongoDB, and a9s Redis, is stable.
Five months of operations experience have proven this stability. None of the symptoms of the last 12 months have reoccurred.
With this experience the anynines team heads towards experiments to build another non-AWS anynines region.
The second anynines region will then be based on a non-Amazon infrastructure operated under European and preferably German jurisdiction.
This way, customers have the choice whether to make use of the vast AWS ecosystem or to operate critical components beyond the official legal reach of the US Patriot Act.
So what is the anynines path beyond Amazon Web Services then, you might ask.
The German region is currently still under planning and development.
There are several decisions to be made and open questions around the German anynines region. The most critical decisions are the following:
The first decision is easy. anynines is a platform company. We won’t operate infrastructure layers ourselves, anymore.
With a significant amount of hardware in a German data center, the first thought would be to install VMware vSphere on the former OpenStack hardware, set up the anynines Cloud Foundry stack, and off we go. However, this plan needs to be tested technically and comes with increased financial risks.
Operating our own hardware requires an excellent data center and an established trust relationship with its team. With Skyway DC in the south-west of Germany there is such a data center. The challenge here is that while Skyway DC offers the flexibility and cost structure anynines needs, the hardware needs to be leased on a 24 or 36 month basis.
As a platform provider anynines wants to serve customers with varying workloads and offer the possibility to scale on-demand at any time.
With our own hardware, anynines has to pay for the spare capacity of unused infrastructure, which is particularly hard to bootstrap financially. Losing one big customer can easily disturb the financial stability of such a composite business strategy. Since anynines already has hardware, the bootstrapping problem is already solved. Still, the challenge remains for future infrastructure scale-outs.
A rented infrastructure, on the other hand, avoids this financial risk, with the disadvantage of higher purchase costs.
So the challenging part here is to find a trustworthy infrastructure provider with a matching cost structure. anynines has learned the challenges of running OpenStack the hard way. These challenges need to be solved by any infrastructure provider, and we've seen many infrastructure companies struggle financially and technologically.
Another factor in the choice of the infrastructure provider is the underlying infrastructure technology. While OpenStack is a great match from an architectural point of view, there is still a surprisingly low number of German OpenStack providers that have proven their success beyond doubt.
The infrastructure technology decision is also influenced by Cloud Foundry, a core technology of the anynines stack. The anynines application platform as well as the brand-new anynines data services are deployed with and based on the BOSH automation technology.
BOSH enables the anynines stack to be entirely infrastructure-agnostic. All that is required to run anynines is a BOSH CPI, an adapter that teaches BOSH how to create virtual machines and persistent disks on a new infrastructure.
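To give a feel for how small this adapter contract is: a CPI is essentially an executable that receives one JSON-encoded method call from BOSH and returns a JSON result. The method names below (`create_vm`, `delete_vm`, `create_disk`, `attach_disk`) come from the public BOSH CPI contract; the in-memory "cloud" and the exact payload fields are simplified illustrations, not a real implementation.

```python
import json
import uuid

# Hypothetical in-memory "cloud" standing in for a real infrastructure API
# (vSphere, OpenStack, AWS EC2, ...). A real CPI would call that API instead.
VMS = {}
DISKS = {}

def handle(request: dict) -> dict:
    """Dispatch one CPI call. BOSH sends {"method": ..., "arguments": [...]}
    and expects {"result": ..., "error": None} (or an error object) back."""
    method, args = request["method"], request.get("arguments", [])
    try:
        if method == "create_vm":
            vm_id = str(uuid.uuid4())
            VMS[vm_id] = {"stemcell": args[0]}  # args[0]: stemcell id
            return {"result": vm_id, "error": None}
        if method == "delete_vm":
            VMS.pop(args[0], None)
            return {"result": None, "error": None}
        if method == "create_disk":
            disk_id = str(uuid.uuid4())
            DISKS[disk_id] = {"size_mb": args[0]}  # args[0]: disk size in MB
            return {"result": disk_id, "error": None}
        if method == "attach_disk":
            VMS[args[0]]["disk"] = args[1]  # vm id, disk id
            return {"result": None, "error": None}
        raise NotImplementedError(method)
    except Exception as exc:
        return {"result": None,
                "error": {"type": type(exc).__name__, "message": str(exc)}}

# BOSH talks to the CPI binary as JSON over stdin/stdout:
response = handle(json.loads('{"method": "create_vm", "arguments": ["stemcell-123"]}'))
print(response["result"])  # the id of the freshly "created" VM
```

Because BOSH only ever speaks this narrow interface, swapping the infrastructure underneath means swapping the CPI, which is exactly what makes the anynines stack portable between AWS, vSphere, and OpenStack.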
The Amazon, OpenStack, and vSphere CPIs are proven implementations and more than trustworthy.
The Azure CPI is also interesting, as there is a cooperation between T-Systems and Microsoft to address the German-jurisdiction challenge.
When facing a decision with large impact under remaining uncertainty, a culture of experimentation helps. Therefore, we started to look into the scenarios "our hardware with a managed vSphere" and "rented infrastructure". In follow-up posts we will present these experiments and insights in more detail.