Autodesk Reduces Big Data Processing Cost by 90% using AWS

2020

Autodesk is a leading provider of 3D design software for the architecture, engineering, manufacturing, media, and entertainment industries. Over 100 million people worldwide use Autodesk products, which include Computer-Aided Design and Building Information Modelling software.

To keep pace with an expanding user base, Autodesk embarked on a journey with Amazon Web Services (AWS) to revamp the Autodesk Data Platform (ADP), a data warehouse that generates detailed insights and analytics on product usage. Autodesk relies on the metrics derived by the ADP—such as active users, user adoption, product versions, and more—to improve product performance and identify new opportunities to better serve its customers.


Thanks to AWS, we have exceeded our targets. We reduced costs by up to 90 percent and enhanced analytics for business users with near real-time data processing.

Deanne Marie Lim
Senior Data Engineering Manager

Defining a Data Platform for the Future, Today

In August 2019, the ADP was receiving around 150 GB of data an hour, 50 GB more than it could handle. Autodesk expected data volume to grow quickly for two reasons: it was adding more software subscribers, and each new release of its desktop products generated more product usage data.

“We forecasted that a huge volume of data would be coming onto the ADP and needed improvements. We needed to enhance our existing system to process an incoming data surge of up to 10 times more than 150 GB, the highest data volume the platform has received. Our target was also to achieve an improvement in speed of data processing and availability to one hour, and reduction of processing costs by 70 percent,” says Deanne Marie Lim, senior data engineering manager at Autodesk, Asia Pacific.

Prior to the transformation project, Autodesk was running its data platform on a Spark-based system on AWS. The Autodesk team observed product usage on an hourly basis, using Amazon Elastic Compute Cloud (Amazon EC2) to schedule hourly cleansing and processing of the raw data. However, the platform frequently failed when data volumes spiked above 100 GB in a given hour, because its capacity for horizontal scaling was limited. Each time, the team had to rerun the job manually, which incurred additional costs and reduced the team’s efficiency.

Furthermore, after cleaning the raw data, which took up to four hours, Autodesk needed to perform extract, transform, and load (ETL) operations to present this data in dashboards. Because this cleansing and enrichment process produces the most upstream dataset, which powers all other ETL jobs and dashboards, any failure or delay impaired the company’s ability to identify opportunities for product improvements and other business development in a timely manner.

The Transformation

“As part of the AWS Partner Network (APN), we had first-hand insight on how to leverage serverless solutions to improve our data platform. AWS provided valuable tools and technologies needed to address our current challenges. In particular, AWS Lambda and Amazon DynamoDB helped us to address the issues we had with our existing data cleansing and enrichment process. We developed a sustainable, scalable, and cost-effective solution within a year,” adds Lim.

AWS helped shift the ADP from a batch processing model to an event-based model running on AWS Lambda. Autodesk can now process data as soon as it arrives rather than at a scheduled time, which had delayed analytics. AWS Lambda also automates ETL aggregations, so data is processed in near real time instead of in hourly cycles.
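To make the event-based model concrete, the following is a minimal Python sketch of a Lambda-style handler that runs once per batch of arrival notifications instead of on an hourly schedule. It is an illustration under stated assumptions, not Autodesk's implementation: the case study does not say which event source triggers the function, so the sketch assumes Amazon S3 object-created notifications, and `clean_record` is a hypothetical placeholder for the cleansing and enrichment step.

```python
from urllib.parse import unquote_plus


def clean_record(bucket: str, key: str) -> dict:
    """Hypothetical stand-in for the cleansing and enrichment step."""
    return {"bucket": bucket, "key": key, "status": "processed"}


def handler(event: dict, context: object) -> dict:
    """Process every object referenced in one notification batch.

    Assumes the S3 notification event shape: a "Records" list whose
    items carry the bucket name and the URL-encoded object key.
    """
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(clean_record(bucket, key))
    return {"processed": len(results), "results": results}
```

Because the function is invoked whenever data arrives, there is no hourly job to fail and rerun; each invocation handles only the objects named in its own event.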

To enable high throughput, low latency, and near real-time processing, Autodesk incorporated Amazon DynamoDB. With Amazon DynamoDB, the ADP handles thousands of concurrent requests within milliseconds. Autodesk also used Amazon Simple Storage Service (Amazon S3) to build a secure data lake with high data availability.

Autodesk architecture diagram

The ADP now automatically processes more than six billion data events a day, equivalent to 12 TB of data, and data is available within two minutes of receipt, down from up to four hours previously.
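The daily figures can be turned into hourly and per-event averages with some simple arithmetic. The Python sketch below is a back-of-the-envelope calculation only: it assumes decimal units (1 TB = 10^12 bytes), treats "more than six billion" as exactly six billion, and takes the hourly event counts from the benefits summary of this case study, so the results are rough lower bounds rather than measured values.

```python
# Figures quoted in the case study (treated as exact for illustration).
EVENTS_PER_DAY = 6_000_000_000
BYTES_PER_DAY = 12 * 10**12          # 12 TB, assuming decimal units
OLD_LATENCY_MIN = 4 * 60             # up to four hours
NEW_LATENCY_MIN = 2                  # within two minutes
EVENTS_PER_HOUR_NEW = 350_000_000    # from the benefits summary
EVENTS_PER_HOUR_OLD = 72_000_000     # from the benefits summary

avg_events_per_hour = EVENTS_PER_DAY // 24            # 250,000,000
avg_gb_per_hour = BYTES_PER_DAY // 24 // 10**9        # 500
avg_bytes_per_event = BYTES_PER_DAY // EVENTS_PER_DAY # 2,000
latency_improvement = OLD_LATENCY_MIN // NEW_LATENCY_MIN  # 120
hourly_throughput_gain = EVENTS_PER_HOUR_NEW / EVENTS_PER_HOUR_OLD

print(f"Average events per hour: {avg_events_per_hour:,}")
print(f"Average GB per hour:     {avg_gb_per_hour}")
print(f"Average bytes per event: {avg_bytes_per_event:,}")
print(f"Latency improvement:     {latency_improvement}x")
print(f"Hourly throughput gain:  {hourly_throughput_gain:.1f}x")
```

Under these assumptions, the average load of roughly 500 GB an hour is about five times the 100 GB an hour at which the earlier platform began to fail.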

Lim concludes, “Thanks to AWS, we have exceeded our targets. We reduced costs by up to 90 percent and enhanced analytics for business users with near real-time data processing. We can now make more meaningful improvements to the user experience. For instance, by tracking real-time data collected on past user activities, we can provide end users with recommendations on how to enhance the way that they are using our software. With AWS, we can now provide valuable insights back to our customers.”

Blueprint for the Future

The success of this implementation has spurred Autodesk on to continue working with AWS to apply event-based modelling for other product usage datasets. Looking ahead, Autodesk plans to evolve its data platform to support predictive analytics. This will allow the company to deliver an enhanced user experience by analyzing common usage patterns and providing real-time recommendations to improve the user journey.



Benefits of AWS

  • Reduced cost of big data processing by up to 90% a year
  • Enabled near real-time data processing, insights, and analytics
  • Increased processing capacity to handle up to 10 times more data
  • Processes over 350 million data events per hour, up from 72 million

AWS Services Used

AWS Lambda

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume.


Amazon DynamoDB

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It's a fully managed, multiregion, multimaster, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second.


Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.


Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.


