Stream Powers Feeds and Chat for over 500 Million End Users with AWS
Guest post by Thierry Schellenbach, the CEO and Co-founder of Stream
Stream has been a loyal and enthusiastic AWS customer since its founding in 2014, when Thierry Schellenbach and Co-Founder and CTO Tommaso Barbugli announced the new venture at the AWS Summit in Amsterdam.
Stream has come a long way since they first started working with AWS, and now powers feeds and chat for more than 500 million end-users. In this blog post, we’ll cover some of the best practices and AWS services that allowed them to sustain this rapid growth.
Stream is a firm believer in building only the infrastructure you need to run your core product and outsourcing the rest to third-party providers who specialize in those areas.
An excellent example of this is the Stream API request log. For high availability and resiliency, the company stores all API calls and can replay them if something goes wrong. The API request log is built entirely on AWS, using Amazon Kinesis Data Firehose and AWS Lambda, which provides high availability and uptime as well as flexibility and cost-efficiency.
The same goes for infrastructure such as load balancers, the PostgreSQL database for configurations, and asset storage. All of the basics are handled by AWS so that Stream can focus on building fully managed chat and feed infrastructure for its customer base.
By outsourcing infrastructure that is not core to your product, you can focus on what makes your app or API unique. When the founders started Stream, they leveraged Cassandra heavily for their workload. Cassandra was a good fit at first; over time, however, its performance and associated cost proved less than ideal for the demand the Stream platform sees on a day-to-day basis.
For the past two years, Stream has been running a custom-built database dubbed "Keevo," which runs on top of EC2. The technology is built primarily on Go, RocksDB, and Raft, and relies on blazing-fast NVMe storage.
In addition to the core feeds and chat products, Stream offers AI-based moderation within its chat offering. To accomplish moderation at scale, the moderation stack is powered by a set of workers running a Python (Flask) API, hosted directly on EC2 G4 instances equipped with NVIDIA T4 GPUs. Application-level (HTTP) load balancing distributes traffic among the individual workers, and Amazon RDS for PostgreSQL stores both the original text and the results returned by the AI-based moderation.
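A minimal sketch of what one such Flask moderation worker might look like (the route, keyword-based classifier, and port are illustrative stand-ins for Stream's actual BERT-based service):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    """Stand-in for the BERT-based moderation model (hypothetical)."""
    flagged = any(word in text.lower() for word in ("spam", "scam"))
    return {"flagged": flagged, "score": 0.99 if flagged else 0.01}


@app.route("/moderate", methods=["POST"])
def moderate():
    payload = request.get_json(force=True)
    result = classify(payload["text"])
    # In production, the original text and the model's verdict would be
    # written to PostgreSQL (Amazon RDS) at this point.
    return jsonify(result)


if __name__ == "__main__":
    # Each worker binds to a port behind the HTTP load balancer.
    app.run(host="0.0.0.0", port=8080)
```

Because each worker is stateless, the load balancer can spread requests across however many instances the moderation volume requires.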
In the future, Stream would like to migrate much of its environment to Amazon MSK (Managed Streaming for Apache Kafka) to set up, scale, and streamline the management of moderation clusters in production.
NVIDIA T4 GPUs
At its core, Stream uses a BERT-style neural network for text classification. This particular model has more than 110 million parameters, which can make it tricky to run inference within a strict response-time budget.
For training, Stream uses custom-built desktops equipped with GTX 1080s, taking advantage of PyTorch's new JIT tracing capabilities. Inference took around 7 ms for a single sentence and 760 ms for a batch of up to 256 sentences.
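A minimal sketch of this JIT-tracing approach in PyTorch (the tiny stand-in model, layer sizes, and benchmark harness are illustrative assumptions, not Stream's actual network):

```python
import time

import torch

# A tiny stand-in classifier; Stream's real model is a ~110M-parameter
# BERT-style network, so any timings here are illustrative only.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 2),
).eval()

# JIT-trace the model with an example input so inference runs through
# TorchScript rather than the Python interpreter.
example = torch.randn(1, 768)
traced = torch.jit.trace(model, example)


def bench(batch_size):
    """Time one forward pass for the given batch size, in milliseconds."""
    x = torch.randn(batch_size, 768)
    with torch.no_grad():
        start = time.perf_counter()
        traced(x)
        return (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    print(f"batch of 1:   {bench(1):.2f} ms")
    print(f"batch of 256: {bench(256):.2f} ms")
```

The traced module can also be serialized with `traced.save(...)` and loaded in a serving process without the original model code.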
Stream was reasonably happy with this speed, so they next tried deploying their models on AWS P2 instances, which turned out to be a disappointing experience. With everything configured identically, Stream saw inference speeds of 60 ms for a batch of 1 and over 7 seconds for a batch of 256. This was not suitable for Stream's production environment, so they moved to a P3 instance type. However, at ~$3.06/hr on-demand, the P3 made it difficult to offer a cost-effective solution to their customer base.
Needing something faster than the K80 GPUs on the P2 instances and more cost-effective than the P3 instance type, Stream found AWS' new G4 EC2 instances with NVIDIA T4 GPUs to be the perfect solution. While not as fast as the custom-built machines' GPUs, the inference speeds were impressive: 8 ms for a batch of 1 and just over 1 second for a batch of 256, fast enough for both their real-time and batch moderation needs.
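A back-of-the-envelope calculation shows why these numbers matter. The P3 price and the batch latency come from the figures above; the g4dn.xlarge on-demand price and the exact 1.1 s batch time are assumptions for illustration:

```python
# Back-of-the-envelope throughput and cost estimate (illustrative).
P3_PRICE = 3.06      # $/hr, on-demand (from the figures above)
G4_PRICE = 0.526     # $/hr, assumed g4dn.xlarge on-demand price

BATCH = 256
T4_BATCH_SECONDS = 1.1   # "just over 1 second" for a batch of 256

# Sentences classified per hour if batches run back to back.
throughput = BATCH / T4_BATCH_SECONDS * 3600

# GPU cost to classify one million texts at that throughput.
cost_per_million = G4_PRICE / throughput * 1_000_000

print(f"T4 throughput:     {throughput:,.0f} sentences/hr")
print(f"Cost per 1M texts: ${cost_per_million:.3f}")
print(f"Hourly price gap:  P3 is {P3_PRICE / G4_PRICE:.1f}x the G4 price")
```

Under these assumptions, a single T4-backed instance classifies well over 800,000 sentences per hour for well under a dollar per million texts.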
Upon discovering the NVIDIA T4 GPUs, Stream quickly reached out to the team at AWS. The NVIDIA T4 GPUs offered by AWS open up new possibilities for companies like Stream that need high-performance, highly available GPU processing for workloads such as real-time AI moderation. Stream is more than happy to have gotten their hands on the new NVIDIA T4 GPUs, as they are both cost-effective and efficient.
Stream believes that a smooth onboarding process is key to winning the market. For Stream, that means investing in developer experience: they go above and beyond to ensure that developers understand and feel comfortable with their product offerings. Stream has created beautiful tutorials for developers, such as their interactive React Chat Tutorial, React Native Chat Tutorial, iOS/Swift Chat Tutorial, and Java/Kotlin Chat Tutorial for Android.
All of this is, of course, a decent amount of work; however, the hard work has paid off: Stream's enterprise customers are starting to migrate from in-house and legacy third-party platforms to Stream's feeds and chat products.
One of the biggest debates for startups, and even for more mature companies, is how much to build in-house versus how much to accomplish by leveraging third-party services. Stream has always believed that, for functions that are not a core piece of your infrastructure, it is best to leverage third-party services so you can focus your development efforts on what matters most to your organization.
Stream looks forward to continuing their partnership with AWS, leveraging more of the current functionality – as well as new offerings as they become available. Doing so will enable the engineering team to focus on building out the most valuable features that Stream provides.