AWS Official Blog

AWS Summer Startups: Peritor/Scalarium

by Jeff Barr | in Amazon EC2, Architecture, AWS Startup Challenge, Case Studies, Europe, Germany, Success Stories, Summer Startups

Over the summer months, we’d like to share a few stories from startups around the world: what they are working on and how they are using the cloud to get things done. Today I’m speaking to Jonathan and Thomas, two of the creators of Scalarium, from Berlin, Germany!

Peritor team

R: Hi guys, could you briefly describe Scalarium and the background of your team?

With Scalarium, we’ve created an easy management service for EC2 clusters. Scalarium helps our customers deploy Rails, Node.js, PHP, Java, Python or any other stack. It automates the initial setup and continuous configuration of servers. Scalarium also takes care of scaling, security, monitoring, and a lot more.
We started as an IT consultancy in 2005 and used EC2 from the early days on to help our clients scale out. Along the way, we realized that we kept repeating ourselves in these kinds of projects, so we created Scalarium as a framework that helps customers automate EC2 deployments.


R: How have you incorporated Amazon Web Services as part of your own architecture? What services are you using and how?
We heavily use EC2, EBS and S3. And in our stack you will find Ruby, CouchDB, Redis, RabbitMQ, Chef and other nice and shiny stuff. We brought you a little illustration that shows you how we run Scalarium on EC2. But before that, you will need to understand a little more about what we do.
As mentioned, Scalarium helps customers run apps on EC2. But instead of offering you some restrictive and expensive PaaS solution, we offer you an elegant way to automate everything on your servers. You still keep root access to all servers and are able to configure each and every setting.


R: How does Scalarium help customers run apps on AWS?
In the cloud, each server goes through something that we would describe as a server life cycle. Each and every server in your cluster comes into existence at some point, experiences some changes, and goes away at some later point. Some have a rather short lifespan, like application servers that are used to handle bursts; others have long lifespans, like database servers. But all of them go through this cycle.
We defined events in this life cycle that both we and you can hook into to execute scripts on the servers. Scalarium uses the following life cycle events:
  • Setup is used to update a base image and install everything you need on the fly as soon as the server comes into existence.
  • Configure is triggered by any change in the cluster – new servers coming or old ones going.
  • Deploy executes scripts that should run during the deployment of an application on the servers. You can hook into the deployment with before_migrate or any other hook you know from Capistrano.
  • Undeploy – this is triggered if you want to remove an application.
  • Shutdown is triggered if you gracefully stop a server. You can copy stuff around or inform other servers about the absence of the server in advance.
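To make the idea of hooking into life cycle events more concrete, here is a minimal sketch in Python. It is not Scalarium's implementation: only the five event names come from the list above, while the `LifecycleHooks` class, its methods, and the example hooks are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Event names taken from the list above; everything else is hypothetical.
LIFECYCLE_EVENTS = ("setup", "configure", "deploy", "undeploy", "shutdown")

Hook = Callable[[str], str]


class LifecycleHooks:
    """Tiny registry mapping life cycle events to user-supplied hooks."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = defaultdict(list)

    def on(self, event: str, hook: Hook) -> None:
        """Register a hook for one of the known life cycle events."""
        if event not in LIFECYCLE_EVENTS:
            raise ValueError(f"unknown life cycle event: {event}")
        self._hooks[event].append(hook)

    def trigger(self, event: str, server: str) -> List[str]:
        """Run every hook registered for `event`, in registration order."""
        if event not in LIFECYCLE_EVENTS:
            raise ValueError(f"unknown life cycle event: {event}")
        return [hook(server) for hook in self._hooks[event]]


hooks = LifecycleHooks()
hooks.on("setup", lambda server: f"{server}: install packages")
hooks.on("shutdown", lambda server: f"{server}: notify other servers")
print(hooks.trigger("setup", "app-1"))
```

In this sketch a hook is just a function that receives the server name; in Scalarium the hooks are Chef recipes, as described further down.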
Now imagine a very basic setup with one load balancer, a couple of app servers and a single database. What would you need to do if you wanted to add another app server to your stack?
You would need to boot an AMI (Amazon Machine Image), log in to the machine, install updates and dependencies, configure all services, cron jobs and so on, and, last but not least, deploy your application. But you are not done yet. You also need to log in to the database server and grant access to the new app server by adding its IP to your ACL. After that you have to log in to your load balancer and add the app server to the load cycle.
This procedure is rather tedious even for easy and basic setups like this, but as you can imagine, the number of dependencies and tasks grows very fast as soon as you have more tiers and servers in your cluster.
What would you do if one of your servers dies or isn’t reachable due to some temporary network issues? If you think your servers will always be on and flawless forever, have a look at the Netflix Tech Blog and learn about the Chaos Monkey and its friends.
We created Scalarium to take care of these kinds of concerns automatically. You can extend the abilities of Scalarium as you like because you can react to all life cycle events and hook into them. This enables you to do just about everything. You always start with a vanilla OS and in the end you have a totally customized setup on your server, and all other servers in the cluster know how to react and reconfigure themselves. We offer a broad selection of predefined stacks and examples. You can change them easily or add your own.


R: How does the bootstrapping of an instance work?
In this picture you see roughly what happens behind the scenes when a new server is added to a cluster:
As soon as a new server is requested, we ask Amazon for it. Once the server has finished booting, it downloads the Scalarium agent and a custom certificate, installs the agent and connects back to Scalarium over an encrypted and signed connection. We check what kind of server you instructed it to be and execute the appropriate Chef recipes. Chef is an open-source systems integration framework, similar to Puppet or CFEngine. Check out our example cookbooks on GitHub to get a feeling for how easy it is to use Chef. You will find the main Scalarium cookbooks there too.
The server bootstraps and will be your new app server, database or whatever you wanted it to be. This process usually takes just one or two minutes depending on the stack you want to install and the size of the server.
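The step "check what kind of server it is and execute the appropriate recipes" can be pictured as a simple lookup. The sketch below is only an illustration under assumptions: the role names and recipe names are invented, and the real mapping lives in the Scalarium cookbooks, not in a Python dictionary. The only borrowed convention is Chef's `cookbook::recipe` naming.

```python
from typing import Dict, List

# Hypothetical role -> event -> recipe names. Chef addresses recipes as
# "cookbook::recipe"; the specific names here are invented for illustration.
RUN_LISTS: Dict[str, Dict[str, List[str]]] = {
    "lb": {"setup": ["haproxy::default"], "configure": ["haproxy::configure"]},
    "app": {"setup": ["rails::setup"], "deploy": ["rails::deploy"]},
    "db": {"setup": ["mysql::server"], "configure": ["mysql::grants"]},
}


def recipes_for(role: str, event: str) -> List[str]:
    """Return the recipes to run for a server role on a life cycle event."""
    return list(RUN_LISTS.get(role, {}).get(event, []))


for role in ("lb", "app", "db"):
    print(role, recipes_for(role, "setup"))
```

An unknown role or an event without recipes simply yields an empty list, so a server with nothing to do for an event does nothing.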
After a new server has bootstrapped successfully, all existing servers in the cluster are informed. This step is very important, because recipes bound to the configure event are now executed on each server in the cluster. That way, load balancers can execute recipes that ensure that they are aware of all running app servers and that they can safely remove stopped app servers from their load cycle. A database server can check whether it has granted access to the available app servers. But of course you could also do more advanced things, like adding new database servers and rebalancing your data, or updating your Nagios alerting or your Graylog2 server to catch all the logs you want.
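As a rough sketch of what a configure hook computes, the Python below derives a load balancer's backend list and a database's allowed clients from the current cluster state. The `Server` type, the role names `lb`, `app` and `db`, and the function names are assumptions made for this example; Scalarium does this with Chef recipes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Server:
    name: str
    role: str      # hypothetical role names: "lb", "app", "db"
    ip: str
    running: bool


def running_ips(cluster: List[Server], role: str) -> List[str]:
    """IPs of all running servers with the given role, in a stable order."""
    return sorted(s.ip for s in cluster if s.role == role and s.running)


def on_configure(server: Server, cluster: List[Server]) -> Dict[str, List[str]]:
    """What a server would recompute when the configure event fires."""
    if server.role == "lb":
        # Route traffic only to app servers that are currently running.
        return {"backends": running_ips(cluster, "app")}
    if server.role == "db":
        # Grant access only to app servers that are currently running.
        return {"allowed_clients": running_ips(cluster, "app")}
    return {}


cluster = [
    Server("lb-1", "lb", "10.0.0.1", True),
    Server("app-1", "app", "10.0.0.11", True),
    Server("app-2", "app", "10.0.0.12", False),  # stopped, must be dropped
    Server("db-1", "db", "10.0.0.21", True),
]
print(on_configure(cluster[0], cluster))
```

Because every server recomputes its view from the whole cluster, adding and removing servers are handled by the same code path.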
Once you are done with your basic setup, you can easily add time- or load-based servers, add and deploy applications to your cluster, or clone the complete environment to create a staging system. All of that can be done via the UI or the Scalarium API.
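The difference between time-based and load-based servers can be sketched as two small decision rules. This is an illustration only: the class, the function, and the CPU thresholds of 0.8 and 0.3 are assumptions, not Scalarium's actual settings or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeBasedServer:
    name: str
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 0-23

    def should_run(self, hour: int) -> bool:
        """True if the server should be up during the given hour of day."""
        if self.start_hour <= self.end_hour:
            return self.start_hour <= hour < self.end_hour
        # Window wraps past midnight, e.g. 22 -> 6.
        return hour >= self.start_hour or hour < self.end_hour


def load_based_delta(avg_cpu: float, up_at: float = 0.8, down_at: float = 0.3) -> int:
    """+1 to start one more load-based server, -1 to stop one, 0 to hold.

    The thresholds are hypothetical defaults chosen for this example.
    """
    if avg_cpu >= up_at:
        return 1
    if avg_cpu <= down_at:
        return -1
    return 0


office_hours = TimeBasedServer("app-daytime", start_hour=9, end_hour=17)
print(office_hours.should_run(10), load_based_delta(0.85))
```

Keeping a gap between the scale-up and scale-down thresholds avoids starting and stopping servers on every small fluctuation in load.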

R: How do you run on AWS yourself?

Below is a simplified visualization of our own architecture. We use two main databases for Scalarium. One is CouchDB, used to store information like the cluster configurations, server descriptions and current state, applications, and deployment definitions. The other one is Redis, used for accounting, events, monitoring and metering data.

We chose CouchDB for high availability, easy replication, clustering, robustness, and a short recovery time. Redis is awesome for the very dynamic, fast-growing, and non-critical data we have.

Scalarium itself is a Rails app; the Scalarium API is a Sinatra app. Workers are based on RabbitMQ/Nanite.

Our setup spans multiple regions and availability zones to guarantee high uptime. CouchDB’s awesome replication features are used to provide master/master replication across regions. Redis uses a master/slave setup for data replication.
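A CouchDB replication job copies changes in one direction, so a master/master setup between two regions is commonly expressed as two continuous jobs, one per direction. The sketch below only builds the JSON documents that would be POSTed to CouchDB's `_replicate` endpoint (`source`, `target` and `continuous` are fields that endpoint accepts). The host names and the database name are made up, and the interview does not say how Scalarium actually configures its replication.

```python
import json
from typing import Dict, List


def master_master_jobs(db: str, host_a: str, host_b: str) -> List[Dict[str, object]]:
    """Build one continuous replication job per direction between two hosts."""

    def url(host: str) -> str:
        return f"https://{host}/{db}"

    return [
        {"source": url(host_a), "target": url(host_b), "continuous": True},
        {"source": url(host_b), "target": url(host_a), "continuous": True},
    ]


# Hypothetical hosts and database name, for illustration only.
for job in master_master_jobs("clusters", "couch-eu.example.com", "couch-us.example.com"):
    print(json.dumps(job))
```

Each printed document would be sent to one of the two CouchDB nodes; no network call is made here.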

R: Why did you decide to use AWS?

That’s simple. We use AWS because it’s the only big, globally distributed and reliable source for IaaS out there. Amazon kicks some serious ass and develops tons of new features and services. Last but not least, we eat our own dog food – Scalarium runs on Amazon and is managed with Scalarium.

By using AWS and Scalarium we can grow in no time to handle as many customers as we like, spin up staging environments, deploy fast and often, and do all of that completely automatically. Failover, scaling, backups, monitoring and so on are all automated. You will love doing that. You can concentrate on developing your app without the hassle of data centers and servers.

Amazon enables us to have clients ranging from startups with one server, through SaaS offerings and agencies with a couple of servers, to the world’s biggest social game providers like wooga or Plinga, with an incredible number of servers running their games all over the globe.

If you like you can see a rather old video in which Jonathan explains the complete process to create a Rails cluster, add a Rails app and deploy it. Or even better, sign up and try Scalarium for yourself.

R: Any last words?

Yes. Take part in the Global AWS Start-Up Challenge! It is a short application form. You can win cash and AWS credits, and get a lot of visibility. And if that’s not enough, we give every semifinalist half a year of Scalarium for free on top.

So apply for the Start-Up Challenge now!