Autodesk Builds Unified Log Analytics Solution on AWS to Gain New Insights
Autodesk, a leading provider of 3D design and engineering software, wants to do more than create and deliver software. It also wants to ensure its millions of global users have the best experience running that software. To make that happen, Autodesk needs to monitor and fix software problems as quickly as possible. Doing this was challenging, however, because the company’s previous application-data log solution struggled to keep up with the growing volume of data needing to be analyzed and stored.
Autodesk ingests 2 TB of log data every day, a volume expected to grow to 10 TB within the next few years. “We had some performance issues with the solution, which made it difficult for us to detect problems quickly,” says Tommy Li, senior software architect at Autodesk. “We needed the ability to monitor logging-incident data in real time, so we could answer customer questions faster.”
Autodesk was also encouraged by its finance department to find a more cost-effective logging solution. “We have a small team, and we wanted to find a solution that would ease log data management while lowering costs,” says Li.
About Autodesk
Based in San Rafael, California, Autodesk is a software company that creates products for the architecture, engineering, construction, manufacturing, media, and entertainment industries. The company’s software includes AutoCAD and 3D design solutions.
Benefits
- Finds and fixes application problems faster through real-time data analysis
- Improves mean time to detect and mean time to recover
- Enables a small IT team to build an enterprise log analytics solution
Creating a Fully Managed Unified Log Data Solution on AWS
To strengthen its root-cause analysis capabilities, Autodesk explored building a unified, cloud-based log data solution on Amazon Web Services (AWS). “We had already been using AWS services for various internal functions at Autodesk, and we wanted to expand on that usage by developing a unified logging system,” says Li.
Amazon Kinesis Data Firehose acts as the transport layer for logging data, and Amazon Managed Service for Apache Flink computes real-time monitoring metrics from that stream, such as response time and error-rate spikes. After passing through this pipeline, the data is sent to Amazon CloudWatch, which produces additional metrics that are displayed in standardized dashboards across the business. These metrics include:
- Overall traffic summary: response time, errors, and total requests
- API metrics: response time percentiles, number of successful requests, and number of error requests
- CPU, network, and disk metrics for each host
- Amazon CloudWatch metrics for AWS services
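To make the metric pipeline more concrete, here is a minimal Python sketch of the kind of windowed aggregation a stream processor such as Amazon Managed Service for Apache Flink might perform before metrics reach CloudWatch. It is not Autodesk’s implementation: the `LogRecord` schema, the rule that any 4xx/5xx status counts as an error, and the `Api` dimension name are all hypothetical. The output follows the shape of the `MetricData` entries accepted by the CloudWatch `PutMetricData` API.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LogRecord:
    # Hypothetical schema: one record per API request.
    timestamp: float   # seconds since the Unix epoch
    api: str           # API route, e.g. "/v1/designs"
    status: int        # HTTP status code
    latency_ms: float  # response time in milliseconds


def percentile(sorted_values, p):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarize(records):
    """Aggregate one window of request logs into summary metrics."""
    latencies = sorted(r.latency_ms for r in records)
    # Assumption: any 4xx/5xx status counts as an error request.
    errors = sum(1 for r in records if r.status >= 400)
    return {
        "TotalRequests": len(records),
        "SuccessRequests": len(records) - errors,
        "ErrorRequests": errors,
        "LatencyP50": percentile(latencies, 50),
        "LatencyP99": percentile(latencies, 99),
    }


# CloudWatch standard unit names for each metric.
UNITS = {
    "TotalRequests": "Count",
    "SuccessRequests": "Count",
    "ErrorRequests": "Count",
    "LatencyP50": "Milliseconds",
    "LatencyP99": "Milliseconds",
}


def to_metric_data(records, window_start):
    """Build CloudWatch-style MetricData entries, one set per API route."""
    by_api = defaultdict(list)
    for record in records:
        by_api[record.api].append(record)
    ts = datetime.fromtimestamp(window_start, tz=timezone.utc)
    metric_data = []
    for api, api_records in sorted(by_api.items()):
        for name, value in summarize(api_records).items():
            metric_data.append({
                "MetricName": name,
                "Dimensions": [{"Name": "Api", "Value": api}],
                "Timestamp": ts,
                "Value": float(value),
                "Unit": UNITS[name],
            })
    return metric_data
```

With boto3, such a list could be passed to `boto3.client("cloudwatch").put_metric_data(Namespace="Example/ApiLogs", MetricData=...)`, where the namespace is a placeholder. Nearest-rank percentiles keep the sketch dependency-free; a production job would more likely rely on Flink’s own windowing and aggregation operators.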
At the same time, Amazon Kinesis Data Firehose delivers the log data to Amazon OpenSearch Service, a managed service for interactive log analytics, real-time application monitoring, website search, and more. “[Amazon OpenSearch Service] enables data forensic activities to take place and help find and fix application problems faster,” Li says. Amazon Athena provides more in-depth interactive querying, and AWS X-Ray provides tools for analyzing trace data. Additionally, Kibana, the open-source data visualization tool integrated with Amazon OpenSearch Service, drives dashboards that monitor data in real time.
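Firehose performs the OpenSearch delivery itself, so no custom code is required for this step. For readers unfamiliar with how log documents land in an index, the sketch below builds the newline-delimited JSON (NDJSON) body that the OpenSearch `_bulk` API expects: for every record, an action line naming the target index, followed by the document itself. The `app-logs` prefix, the daily index naming pattern, and the document fields are hypothetical.

```python
import json
from datetime import datetime, timezone


def daily_index(prefix, epoch_seconds):
    """Hypothetical daily index name, e.g. app-logs-2024-01-31 (UTC)."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{prefix}-{day:%Y-%m-%d}"


def bulk_body(prefix, docs):
    """Build an NDJSON body for the OpenSearch _bulk API.

    Each document becomes two lines: an action line naming the target
    index, then the document source. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": daily_index(prefix, doc["timestamp"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""  # nothing to send
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as `POST /_bulk` with the `Content-Type: application/x-ndjson` header. Writing to one index per day, similar in spirit to Firehose’s index rotation option, makes it cheap to expire old log data by deleting whole indexes.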
Finding and Fixing Problems Faster Than Before
The unified logging solution built on Amazon OpenSearch Service gives teams faster, clearer visibility into log data. “[Amazon OpenSearch Service] enables a more consistent way to collect and measure logging data in real time,” says Li. “This service gives in-depth data analysis that enables better correlations between logging events, providing answers to application problems faster.” For example, Autodesk teams created dashboards that surface trends and anomalous patterns, which can then be quickly correlated with the underlying log records for detailed forensics.
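Correlating a dashboard anomaly with the detailed log records behind it usually comes down to a filtered search over the anomalous time window. The sketch below builds an OpenSearch query DSL body for that purpose; the field names `@timestamp`, `status`, and `api` are hypothetical and would have to match the real index mapping (the `term` filter assumes `api` is mapped as a `keyword` field).

```python
def forensic_query(start_iso, end_iso, api=None, min_status=500, size=100):
    """Build an OpenSearch query DSL body that retrieves the raw log
    records behind a dashboard anomaly: requests in [start, end) with a
    status of at least min_status, newest first.

    Field names ("@timestamp", "status", "api") are hypothetical.
    """
    filters = [
        {"range": {"@timestamp": {"gte": start_iso, "lt": end_iso}}},
        {"range": {"status": {"gte": min_status}}},
    ]
    if api is not None:
        # Exact match; assumes "api" is a keyword field.
        filters.append({"term": {"api": api}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{"@timestamp": {"order": "desc"}}],
    }
```

The returned dictionary would be sent as the JSON body of a `GET <index>/_search` request. Using `filter` rather than `must` clauses skips relevance scoring, which adds nothing when the goal is simply to retrieve every matching record.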
Specifically, Autodesk is improving forensic analysis by using instrumentation data to detect and resolve errors, which reduces overall mean time to recover. The company can detect API usage anomalies such as error-rate and response-time spikes, and alarms raised by Amazon CloudWatch shorten both the mean time to detect and the time it takes to engage incident response teams.
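Spike detection of this kind can be expressed as a threshold rule evaluated once per period. The following Python sketch mimics the “M out of N datapoints” evaluation used by CloudWatch alarms (the `DatapointsToAlarm` and `EvaluationPeriods` settings), applied to a per-period error rate. It is an illustration only, not CloudWatch’s code: the `SpikeAlarm` class and `error_rate` helper are hypothetical, and CloudWatch’s handling of missing data and the `INSUFFICIENT_DATA` state is omitted.

```python
from collections import deque


class SpikeAlarm:
    """M-out-of-N threshold alarm over a stream of per-period datapoints.

    Loosely modeled on a CloudWatch alarm with EvaluationPeriods=N and
    DatapointsToAlarm=M; missing-data handling is intentionally omitted.
    """

    def __init__(self, threshold, datapoints_to_alarm, evaluation_periods):
        if not 1 <= datapoints_to_alarm <= evaluation_periods:
            raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
        self.threshold = threshold
        self.m = datapoints_to_alarm
        # Keeps only the most recent N breach flags.
        self.window = deque(maxlen=evaluation_periods)
        self.state = "OK"

    def observe(self, value):
        """Record one period's datapoint and return the new alarm state."""
        self.window.append(value > self.threshold)
        breaching = sum(self.window)
        self.state = "ALARM" if breaching >= self.m else "OK"
        return self.state


def error_rate(errors, total):
    """Error rate for one period as a fraction; 0.0 when there was no traffic."""
    return errors / total if total else 0.0
```

For example, `SpikeAlarm(threshold=0.05, datapoints_to_alarm=2, evaluation_periods=3)` enters `ALARM` once two of the last three periods exceed a 5% error rate (an arbitrary example threshold). In practice the same rule would be configured natively through `put_metric_alarm` with `Threshold`, `ComparisonOperator="GreaterThanThreshold"`, `EvaluationPeriods`, and `DatapointsToAlarm`, so that CloudWatch can notify the incident response team directly. Requiring M breaching periods out of N, rather than a single one, filters out one-off blips.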
The company is also looking to derive deeper insights from its analytical data to improve its software and customer service. “Ultimately, we are improving our software products and offering better service to our customers because of the real-time visibility we’re getting with log data,” says Li.
Working with AWS, Autodesk is building highly scalable log analytics capabilities that reduce the overall cost of the solution.
Breaking Down Data Silos
By offloading the management of its architecture and Elasticsearch clusters to AWS, Autodesk was able to easily build its unified logging solution. “Even though our IT team has only a few people, we can develop and maintain a powerful logging solution by letting AWS take care of the technology,” Li says. “As a result, we no longer need to put our resources into managing the underlying infrastructure, and we can scale the solution on demand to support the growing volume of logging data.”
Autodesk now has a solution that provides a single-pane-of-glass view of logging data such as application performance and downtime.
“We no longer have data silos because of different teams using logging solutions,” Li says. “Everyone can access the same view with the AWS solution, which means everyone receives updated insights into the overall status of the platform. And using Kibana dashboards, we can create a common vocabulary we can all use to diagnose problems. Overall, as a company we can take a more unified approach to finding and fixing problems.”