AWS Startups Blog

How DataRobot Leverages AWS for Predictive Modeling

Guest post by Huy Le, Operations Director, DataRobot


DataRobot is a Boston-based technology startup, founded with a mission to help data scientists of all experience levels build and deploy better predictive models, faster. Our application is compute-intensive because building predictive models involves a lot of number crunching. At any given moment, we may be launching hundreds of servers to sift through users’ data, perform massively parallel computations, and build tens of thousands of predictive models. This process results in models that are highly accurate, delivered in a fraction of the time it would otherwise take our users.

Selecting a Cloud Service Vendor at DataRobot

Choosing the right servers for our modeling tasks wasn’t a trivial consideration. The type and size of servers that we launch vary depending on the size of the datasets and the complexity of models. Some of the modeling tasks require a lot of CPU, while others require a lot of memory. Our server selection matrix will become even more complex once we complete our current beta program and release our product publicly. We’ll soon be launching thousands, not hundreds, of servers. This means being ever more mindful of the infrastructure costs. On top of that, we deal with all kinds of data from customers in industries as diverse as insurance, healthcare, pharmaceutical, banking and retail. We need a reliable cloud service provider to run our application so that our customers can be confident that the data they upload to our platform will be safe and secure.

As we searched for a cloud service provider that could meet our requirements, we realized that Amazon Web Services (AWS) was a natural choice for us. AWS offers a comprehensive list of products and feature options, security and regulatory compliance, and highly competitive pricing models. One of the AWS offerings that caught our interest was Spot Instances. Using Spot Instances is like bidding for a product on eBay. You can bid on unused Amazon EC2 instances, which can lower your Amazon EC2 costs significantly. You choose the server type you want to use and set the highest price you’re willing to pay. If servers are available at or below your maximum price, you get a server and pay the current market price. Most of the time, the market price is just a fraction of the regular on-demand price. With Spot Instances, our application can do massive number crunching at a fraction of the usual cost.
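The bidding rule described above can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated in this post, not AWS's actual allocation logic; the function name `spot_outcome` and the prices are hypothetical.

```python
from typing import Optional


def spot_outcome(max_price: float, market_price: float) -> Optional[float]:
    """Return the hourly price paid, or None if the request is not fulfilled.

    Models the rule described in the post: the request is fulfilled when the
    market price is at or below the maximum price, and the charge is the
    market price rather than the maximum price.
    """
    if market_price <= max_price:
        return market_price
    return None


# Hypothetical prices in USD per hour, for illustration only.
print(spot_outcome(max_price=0.50, market_price=0.12))  # fulfilled, pays the market price
print(spot_outcome(max_price=0.50, market_price=0.75))  # not fulfilled
```

The point of the sketch is that the maximum price acts as a ceiling, not as the amount charged, which is why the savings can be large when the market price is low.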

The wide range of AWS server types and sizes was attractive, too. We use servers as small as t1.micro, which cost about $15 per month to run, and servers as large as r3.8xlarge, which cost over $2,000 per month to run. AWS has a server size for every kind of need.
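As a rough check on monthly figures like these, the arithmetic is simply the hourly rate multiplied by the hours in a month. The hourly rates below are assumptions chosen for illustration (roughly what an on-demand rate would need to be to produce the monthly figures above), not quoted AWS prices, and `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # average month: 365 days * 24 hours / 12 months


def monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost in USD of running one server continuously for the given hours."""
    return hourly_rate_usd * hours


# Assumed hourly rates in USD, for illustration only (not quoted AWS prices).
# "r3.8xlarge" is the full name of the largest R3 instance size.
assumed_rates = {"t1.micro": 0.02, "r3.8xlarge": 2.80}

for name, rate in assumed_rates.items():
    print(f"{name}: ${monthly_cost(rate):,.2f} per month")
```

Under these assumed rates, the two ends of the range differ by a factor of more than 100, which is why matching the server size to the task matters for cost.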

Our Experience Working with AWS

The AWS Spot team and our account manager have been instrumental in helping us make the best use of AWS Spot Instances. The Spot team visited our office to learn about our application, and advised us on how to get the most out of Spot Instances and keep costs low. Our account manager also analyzed our overall usage and suggested cost-saving strategies. Overall, we have really benefited from the advice and support that AWS has provided.

Another aspect that we appreciate about AWS is its ability to provide additional capacity as soon as we request it. At times, we’ve needed our capacity limit increased by thousands of servers. AWS has been able to do that on the same or next business day every time. I’ve made similar requests with other service providers. Those providers typically took more than a week to respond, and even then they fulfilled the request only partially.

Advice for Selecting a Cloud Service Provider

When you start building your new company, perhaps you will be working on it while living off your savings or on your spouse’s income, so you may not have a lot of cash to spend. You want a service provider that offers a wide range of server selections, so you can start with a small server that you can pay for with the savings gleaned from the few cups of coffee you give up each month. When you are ready to invite some users to try out your product, you can upgrade your server to a larger size to support a few concurrent users.

As you acquire more users and get ready for a product launch, you will want to test your product on other, larger servers and assess the capacity each server can handle. From there, you need to develop a plan for how to scale your platform. If you are building a consumer or social application, you can expect to have bursts of traffic. Be sure you can rapidly scale out your platform based on load demand and load volatility. Ideally, your platform should be able to automatically scale as load demand fluctuates. Auto scaling support should be part of your cloud service provider selection strategy.
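The capacity planning described above can be sketched as a simple sizing rule: measure how much load one server can handle, then compute how many servers the current load requires. This is a minimal sketch under stated assumptions, not how any particular provider's auto scaling service works; `desired_servers`, its parameters, and the 20% headroom default are all hypothetical.

```python
import math


def desired_servers(current_load: float, capacity_per_server: float,
                    min_servers: int = 1, max_servers: int = 100,
                    headroom: float = 0.2) -> int:
    """Servers needed to carry current_load with spare headroom.

    current_load and capacity_per_server share a unit (for example,
    requests per second). The result is clamped to [min_servers, max_servers].
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(current_load * (1 + headroom) / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


# Hypothetical numbers: 1,000 requests/s of load, 250 requests/s per server.
print(desired_servers(current_load=1000, capacity_per_server=250))
```

Re-evaluating a rule like this as load changes is the basic idea behind automatic scaling; the headroom term is there to absorb bursts while new servers start up.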

If you are building a product that needs to handle payments, be sure to choose a service provider that is PCI-compliant and has hardware encryption support. And make sure that you meet the additional compliance requirements for your specific offerings. For example, if you are building a product that will handle healthcare data, you will need a service provider that is HIPAA-compliant.

You should select the right service provider from the beginning. Don’t put this off until you are ready to have customers. Every service platform is different, so it’s important to work with the same platform from the beginning so you don’t waste your time learning new platforms as your needs change and your business grows.

When you select a cloud service provider, security should be the top criterion. You need to choose a service provider that has granular access control to your cloud account. In addition, you need to be able to grant different levels of access to employees on an as-needed basis.

What’s Next in DataRobot’s World?

We’ve had great success with our beta customers. We’re working relentlessly to polish the product, and we’re excited about our upcoming release this year. If you are working on predictive analytics and you want to “build better models faster,” you can sign up for beta access at http://datarobot.com. (We’ll provide VIP access to early sign-ups.)