
Nasdaq Uses AWS to Pioneer Stock Exchange Data Storage in the Cloud

2020

Nasdaq is a multinational financial services and technology corporation that owns and operates the Nasdaq Stock Exchange. Nasdaq operates a total of 27 markets, a central securities depository, and a clearinghouse across a variety of asset classes in North America and Europe. It is home to nearly 4,000 listed companies globally across its markets and also provides its mission-critical technology to other market infrastructure operators located in 50 countries.

The Nasdaq Stock Exchange is the largest equities franchise globally by volume, and it manages the matching of buyers and sellers at high volume and velocity, while providing data feeding the price quote for stocks in electronically entered trades. Nasdaq relies on an internal application to capture and store all protected exchange data. “This data includes orders, quotes, trades, and cancellations,” says Robert Hunt, vice president of software engineering for Nasdaq. Every night, Nasdaq receives billions of records that need to be loaded for billing and reporting processes before the markets open the following morning.

As automated trading platforms have entered the market, the pace and volume of transactions have grown. In 2014, to increase scale and performance and lower operational costs, Nasdaq moved from a legacy on-premises data warehouse to an Amazon Web Services (AWS) data warehouse powered by an Amazon Redshift cluster. Between 2014 and 2018, this Amazon Redshift cluster grew to 70 nodes as the company expanded the solution to support all its North American markets. By 2018, the solution ingested financial market data from thousands of sources nightly, ranging from 30 billion to 55 billion records and surpassing 4 terabytes.

Over time, growth in data led to a change in approach for managing that data for analytics. The overnight batch processing that runs against the warehouse caused challenges in processing enormous volumes to meet stringent deadlines. Users rely on the data to complete billing, reporting, and surveillance. “When market volatility increased in early 2018, data volumes for the warehouse grew substantially, peaking at about 55 billion records per day in 2018,” says Hunt.

More sophisticated trading practices led to massive growth in data, and it became critical for Nasdaq to plan a new architecture that could continue to achieve the performance standards and operational excellence that the ecosystem expects. “We have to both load and consume the 30 billion records in a time period between market close and the following morning. Data loading delayed the delivery of our reports,” says Hunt. “We needed to be able to write or load data into our data storage solution very quickly without interfering with the reading and querying of the data at the same time.”

Video: Nasdaq Migrates Its Growing Data Warehouse to a More Modern Data Lake Architecture (2:25)

“We were able to easily support the jump from 30 billion records to 70 billion records a day because of the flexibility and scalability of Amazon S3 and Amazon Redshift.”

Robert Hunt
Vice President of Software Engineering, Nasdaq

Using AWS Services for Flexibility, Scalability, and Performance

In 2018, Nasdaq chose to build the foundation of a new data lake on Amazon Simple Storage Service (Amazon S3), which enables the company to separate compute and storage and to scale each function independently. In traditional data warehouse deployments, scaling storage capacity often requires companies to scale compute capacity at the same time because the application and storage are tightly linked, with onsite hardware modifications needed for any change to the ratio of the two. “In addition to the flexibility that comes with separation of compute and storage, Amazon S3 has better scaling properties in terms of writing and reading large datasets at the same time,” Hunt says. “Amazon S3 gave us a solution that enables zero contention between data loading and querying processes.”

Integrated with AWS Identity and Access Management (AWS IAM) policies, Amazon S3 also provides comprehensive access control across multiple AWS accounts. Additionally, Nasdaq uses Amazon S3 to store critical financial data and moves it to Amazon S3 Glacier, where it can be archived at a lower cost. The company relies on the Amazon S3 Object Lock feature to further enable compliance.
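The three mechanisms named above (cross-account access, archiving to Amazon S3 Glacier, and Object Lock) can each be expressed as a small configuration document. The following is a minimal sketch, not Nasdaq's actual configuration: the bucket name, prefix, account ID, and retention periods are hypothetical placeholders. The functions only build the documents; the boto3 calls that would apply them are shown as comments.

```python
import json


def glacier_lifecycle(prefix: str, days_until_archive: int) -> dict:
    """Lifecycle rule that transitions objects under `prefix` to S3 Glacier."""
    rule_id = "archive-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def object_lock_default(retention_days: int) -> dict:
    """Default Object Lock retention; COMPLIANCE mode is an assumption here."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {"Mode": "COMPLIANCE", "Days": retention_days}
        },
    }


def cross_account_read_policy(bucket: str, reader_account_id: str) -> str:
    """Bucket policy (JSON text) granting read access to another AWS account."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{reader_account_id}:root"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical names and values, for illustration only.
    bucket = "example-market-data-lake"
    print(cross_account_read_policy(bucket, "111122223333"))
    # Applying the documents with boto3 would look like this:
    #   s3 = boto3.client("s3")
    #   s3.put_bucket_lifecycle_configuration(
    #       Bucket=bucket,
    #       LifecycleConfiguration=glacier_lifecycle("exchange-data/", 90))
    #   s3.put_object_lock_configuration(
    #       Bucket=bucket,
    #       ObjectLockConfiguration=object_lock_default(365))
    #   s3.put_bucket_policy(
    #       Bucket=bucket,
    #       Policy=cross_account_read_policy(bucket, "111122223333"))
```

Keeping the documents as plain data makes them easy to review and test before anything is applied to a real bucket.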
 
In January 2019, Nasdaq attended an AWS Data Lab, where it worked with AWS Solutions Architects and analytics service experts who provided prescriptive architectural guidance to rethink how Nasdaq implemented data warehousing. In the four-day lab, Nasdaq reinvented how it delivers analytics by using Amazon Redshift as a compute layer. As a result, Nasdaq began using Amazon Redshift Spectrum, a feature that powers a lake house architecture to query data both in the data warehouse and in the Amazon S3 data lake. “We’re putting all the data that comes from our internally operated exchanges into Amazon S3 and Amazon Redshift Spectrum,” says Hunt. “That includes orders, cancellations, quotes, and trades. Those are turned into messages and archived in Amazon S3, and those messages drive our downstream billing and reporting surveillance processes.”
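To make the lake house idea concrete, the sketch below generates the kind of SQL that lets Amazon Redshift Spectrum read files in Amazon S3 and join them with a table stored in the warehouse. Everything specific in it is an assumption for illustration: the schema, table, and column names, the Parquet file format, the IAM role ARN, and the bucket path are hypothetical and do not come from Nasdaq.

```python
def external_schema_ddl(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """DDL that maps a data catalog database into Redshift as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema: str, table: str, s3_location: str) -> str:
    """DDL for an external table over files in S3 (hypothetical columns)."""
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        "    order_id BIGINT,\n"
        "    symbol VARCHAR(16),\n"
        "    message_type VARCHAR(16),\n"
        "    price DECIMAL(18, 4),\n"
        "    quantity BIGINT\n"
        ")\n"
        "PARTITIONED BY (trade_date DATE)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


def lake_house_query(schema: str, table: str, local_table: str, trade_date: str) -> str:
    """Query joining S3-resident messages with a table stored in the warehouse."""
    return (
        "SELECT m.symbol, COUNT(*) AS message_count\n"
        f"FROM {schema}.{table} AS m\n"
        f"JOIN {local_table} AS r ON r.symbol = m.symbol\n"
        f"WHERE m.trade_date = '{trade_date}'\n"
        "GROUP BY m.symbol;"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(external_schema_ddl(
        "lake", "market_data",
        "arn:aws:iam::111122223333:role/example-spectrum-role"))
    print(external_table_ddl(
        "lake", "messages", "s3://example-market-data-lake/messages/"))
    print(lake_house_query("lake", "messages", "public.symbols", "2020-02-28"))
```

Filtering on the partition column (`trade_date` here) lets Redshift Spectrum skip S3 prefixes that cannot match, which matters when each day adds billions of records.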
 
The new data lake contains 15 terabytes of data on Amazon S3, which Nasdaq can query in place, without a separate loading step, immediately after the data is written to Amazon S3. This provides minimal time-to-insight and enables the Nasdaq economic research team to conduct data analysis and run complex queries against the data. In addition, the company’s surveillance business team queries the data lake after receiving inquiries from the U.S. Securities and Exchange Commission (SEC).
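One common way to make newly written files queryable without loading them is to register each day's S3 prefix as a partition of an external table. The source does not describe how Nasdaq does this, so the following is only a sketch under assumptions: the table name, the bucket path, and the `trade_date=YYYY-MM-DD` prefix layout are hypothetical.

```python
from datetime import date


def add_partition_ddl(table: str, s3_prefix: str, trade_date: date) -> str:
    """DDL that registers one day's S3 prefix as a partition of an external table."""
    day = trade_date.isoformat()
    base = s3_prefix.rstrip("/")
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS\n"
        f"PARTITION (trade_date='{day}')\n"
        f"LOCATION '{base}/trade_date={day}/';"
    )


if __name__ == "__main__":
    # Hypothetical table and bucket, for illustration only.
    print(add_partition_ddl(
        "lake.messages",
        "s3://example-market-data-lake/messages/",
        date(2020, 2, 28),
    ))
```

Registering a partition is a metadata operation, so the data stays where it was written and no copy into the warehouse is needed before the first query.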

What began as a performance-focused solution has become a multi-use data lake shared between teams, creating additional benefit for the business.

Scaling to Support 70 Billion Records a Day

With compute and storage scaling independently, Nasdaq can now flex its compute layer to support the volume of transactions, with the data lake built on Amazon S3 storage easily supporting data that continues to grow in volume and complexity. For example, market volatility spiked in late February 2020, at the beginning of the COVID-19 pandemic, and the solution scaled to support an ingest of 70 billion records daily—with a peak volume of 113 billion.
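The record counts above translate into sustained ingest rates with simple arithmetic. The sketch below uses the 30 billion, 70 billion, and 113 billion figures from this case study; the 8-hour load window is a hypothetical value chosen for illustration, since the source only says that loading happens between market close and the following morning.

```python
def growth_factor(baseline_records: int, peak_records: int) -> float:
    """How many times larger the peak day is than the baseline day."""
    return peak_records / baseline_records


def required_rate(records: int, window_hours: float) -> float:
    """Average records per second needed to ingest `records` within the window."""
    return records / (window_hours * 3600)


BASELINE = 30_000_000_000   # typical daily volume cited in the case study
SUSTAINED = 70_000_000_000  # daily ingest during the early 2020 volatility
PEAK = 113_000_000_000      # peak daily volume cited in the case study
HYPOTHETICAL_WINDOW_HOURS = 8  # assumption, not stated in the source

if __name__ == "__main__":
    for label, records in [("baseline", BASELINE),
                           ("sustained", SUSTAINED),
                           ("peak", PEAK)]:
        factor = growth_factor(BASELINE, records)
        rate = required_rate(records, HYPOTHETICAL_WINDOW_HOURS)
        print(f"{label}: {factor:.2f}x baseline, {rate:,.0f} records/s")
```

Because the window is fixed by market hours, any growth in daily volume has to be absorbed by a higher ingest rate, which is why being able to scale compute independently of storage matters here.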
 
“We were able to easily support the jump from 30 billion records to 70 billion records a day because of the flexibility and scalability of Amazon S3 and Amazon Redshift,” says Hunt. “We kept up with the spike in data volumes and provided the necessary billing, reporting, and surveillance processes to support our obligations to the market.” Nasdaq can also easily and quickly scale its environment down to ensure there is no idle capacity when the market adjusts again.

Loading Market Data for Reporting 5 Hours Faster

Using its new lake house architecture based on Amazon S3 and Amazon Redshift, Nasdaq is reaching its 90 percent mark for data load completion 5 hours sooner than before. In addition, by optimizing its data warehouse, the company was able to run Amazon Redshift queries 32 percent faster. “These improvements helped us accelerate our billing and reporting processes,” says Hunt. “For example, we’re done ingesting data within an hour or two of the market closing, which gives us a head start on billing and reporting. This helps immensely when we’re dealing with the volume spikes we’ve seen recently, and it also helps us meet or exceed our deadlines for our internal customers.”
 
Over time, the Amazon S3 and Amazon Redshift data lake has enabled transformation at Nasdaq. “We're free to focus on our expertise in our industry to innovate for Nasdaq while relying on AWS to provide cloud expertise,” says Hunt. “Going forward, we will continue to take advantage of new AWS services and technologies as the market requires.”

About Nasdaq

Nasdaq, founded in 1971 and headquartered in New York City, is a multinational financial services corporation that owns and operates the Nasdaq stock market and eight European stock exchanges. The company is home to nearly 4,000 listed companies located in 50 countries.

Benefits of AWS

  • Ingests 70 billion records per day
  • Loads financial market data 5 hours faster
  • Runs Amazon Redshift queries 32 percent faster
  • Enables business transformation with shared data
  • Spurs innovation with additional use cases


AWS Services Used

Amazon Simple Storage Service

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.


Amazon Redshift

Amazon Redshift gives you the best of high performance data warehouses with the unlimited flexibility and scalability of data lake storage.


AWS Identity and Access Management

AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely.


Amazon S3 Glacier

Amazon S3 Glacier and S3 Glacier Deep Archive are secure, durable, and extremely low-cost Amazon S3 cloud storage classes for data archiving and long-term backup.



Get Started

Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.