Apify Powers Web Insights for Businesses, Cuts Cloud Costs by 25% Using AWS

2022

Apify, a Prague-based startup founded in 2015, has developed a web-scraping and automation platform and set of open source tools that help businesses improve their operations by collecting and analyzing large volumes of web data, and automating web processes. Apify has used AWS from day one, taking advantage of credits and training through the AWS Activate program. The company has grown rapidly and currently provides services to 1,000 organizations across 179 countries. It has also scaled to process 1,000 TB of data a month, reduced compute costs by 25 percent, and boosted the efficiency of its development team.


“The support we received through AWS Activate for credits, training, and cost control was a key reason we chose AWS. It undoubtedly contributed to our early success.”

Marek Trunkat
Chief Technology Officer, Apify

Apify is a fast-growing business founded in Prague in 2015. The company has developed a web-scraping and automation platform and a set of open source tools that collect data from the web. Its customers use insights gained from this data to improve their strategies around anything from product pricing to customer sentiment.

As a startup, Apify needed a reliable, cost-effective infrastructure that could easily scale as customer demand grew.  

By building its offering on Amazon Web Services (AWS), the company has grown rapidly and now provides services to 1,000 organizations in 179 countries. It has also scaled to process 1,000 TB of data a month, reduced compute costs by 25 percent, and boosted the efficiency of its development team. 

Scaling to Process 20 Million Jobs a Month Using Amazon EKS

Customers use Apify services to address a wide range of business issues, such as gathering product price data to inform sales strategies, tracking consumers’ conversations about product features or bugs to enhance services, or automating the process of cancelling customer subscriptions.

The web-scraping tools integrate easily into business workflows, and the structured data output is then exported in any format, making it instantly readable by customers. “Our services help companies change strategies quickly, or even strategize on-the-fly in fast-moving markets,” says Marek Trunkat, chief technology officer (CTO) at Apify.

Apify’s approach means that it needs to process vast amounts of information fast, so customers can reliably access and analyze web data. It processes 20 million web automation jobs monthly using Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Kubernetes Service (Amazon EKS), a managed container service to run and scale Kubernetes applications in the cloud or on premises. It also stores about 1.5 billion analytical results monthly using Amazon Simple Storage Service (Amazon S3), an object storage service, together with Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database, and Amazon ElastiCache, an in-memory caching service.

Launching with AWS Activate and Cutting Cloud Costs by 25%

Apify began using AWS through AWS Activate, a program that offers startups free tools and resources to get started on AWS. “The support we received through AWS Activate for credits, training, and cost control was a key reason we chose AWS,” says Trunkat. “It undoubtedly contributed to our early success.”

The company also managed its budget and reduced its total cloud costs by 25 percent using Amazon EC2 Spot Instances, which run fault-tolerant workloads at a discount of up to 90 percent. “In our highly competitive market, successful businesses are those that can take a distinctive idea and scale it at speed,” says Trunkat. “We were able to do this using Spot Instances and the AWS Activate program.”

Apify has grown to serve more than 1,000 active customers while operating with a small engineering and DevOps team. Using AWS, it can scale API throughput quickly from 100,000 to 500,000 requests per minute to meet dynamically changing customer demand.
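To put these figures in per-second terms, the following is a rough back-of-the-envelope sketch rather than anything taken from Apify's systems. It converts the per-minute API throughput and the 20 million monthly jobs quoted in this story into average per-second rates. The 30-day month and all function and variable names are assumptions made for illustration.

```python
SECONDS_PER_MINUTE = 60
# Assumption: a 30-day month, used only to estimate an average rate.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def per_second_from_minute(requests_per_minute: float) -> float:
    """Convert a per-minute rate into an average per-second rate."""
    return requests_per_minute / SECONDS_PER_MINUTE


def per_second_from_month(items_per_month: float) -> float:
    """Convert a monthly total into an average per-second rate."""
    return items_per_month / SECONDS_PER_MONTH


# Figures quoted in the case study.
baseline_rps = per_second_from_minute(100_000)
peak_rps = per_second_from_minute(500_000)
scale_factor = peak_rps / baseline_rps
jobs_per_second = per_second_from_month(20_000_000)

print(f"Baseline API throughput: {baseline_rps:,.0f} requests/s")
print(f"Peak API throughput:     {peak_rps:,.0f} requests/s")
print(f"Scale factor:            {scale_factor:.1f}x")
print(f"Average job rate:        {jobs_per_second:.1f} jobs/s")
```

Under these assumptions, the quoted range corresponds to roughly 1,667 to 8,333 requests per second, a fivefold increase, while 20 million jobs a month averages out to about 7.7 jobs started per second. These are averages only; actual load is likely to be burstier.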

Supporting a Worldwide Community of Developers

Apify boosted the cloud knowledge and efficiency of its IT team with AWS learning materials and immersion training days. Using AWS, the team spends minimal time on infrastructure maintenance and monitoring tasks. This frees engineers to use their cloud skills to develop innovative solutions and support other developers to create their own web-scraping tools. Apify and its community of developers have created 1,000 ready-to-use web-scraping tools that are available to customers through an online store.

The startup has even launched an initiative that allows its community to earn income from these tools. The goal is for developers to build their own automation tools, host them on Apify’s infrastructure, and then rent those tools to third parties. “Apify aims to become the leading platform and marketplace for web-scraping and automation tools,” says Trunkat. “Knowing how easy it is to scale on AWS and build reliable services for customers, we feel confident we can meet our growth ambitions.”


About Apify

Apify is a startup based in the Czech Republic that specializes in web scraping and automation tools. Apify products are used by 1,000 companies in 179 countries to automate and develop new services. It has 90 employees and offices around Europe.

Benefits of AWS

  • Scales API throughput from 100,000 to 500,000 requests per minute
  • Processes 20 million web automation jobs monthly
  • Reduces cloud costs by 25% using Amazon EC2 Spot Instances 
  • Cuts staff time spent on IT maintenance 

AWS Services Used

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.


Amazon DynamoDB

Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data export tools.


AWS Lambda

AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can trigger Lambda from over 200 AWS services and software as a service (SaaS) applications, and only pay for what you use.


Amazon ElastiCache

Amazon ElastiCache is a fully managed, in-memory caching service supporting flexible, real-time use cases. You can use ElastiCache for caching, which accelerates application and database performance, or as a primary data store for use cases that don't require durability like session stores, gaming leaderboards, streaming, and analytics.


