AWS Partner Network (APN) Blog
How Validus Built a Bloomberg Real-Time Market Data Integration on AWS in a Week
By David Steiner, Principal Engineer – Validus Risk Management Ltd.
By Fergus Strangways-Dixon, Lead Platform Engineer – Validus Risk Management Ltd.
By Nihilson Gnanadason, Sr. Solutions Architect – AWS
Increased market uncertainty over the last couple of years has demonstrated the need for market participants to have a dynamic view of how market moves affect the risk in their portfolios. For that view to be accurate, frequent and timely market data updates are required.
Validus Risk Management is a leading independent technology-enabled financial services firm that advises and provides solutions to institutional investors and fund managers, with a focus on alternative, private, and illiquid assets. Validus delivers effective and efficient risk management solutions to clients globally, mitigating market risks such as foreign currency, inflation, and interest rate exposures. It offers a comprehensive set of risk management tools through its award-winning technology platform, RiskView.
Historically, risk analytics at Validus have been powered by data from Bloomberg, an AWS Partner and global leader in business and financial information, news, and insights. This data was delivered through Bloomberg’s REST API for cloud-native market data on a request-response basis.
While Bloomberg’s REST APIs and cloud-native access to licensed data meet many use cases, Validus’ risk management requirements have evolved to the point where streaming data is now required.
To meet this streaming need while retaining the benefits of the cloud, Amazon Web Services (AWS) offers delivery of Bloomberg’s high-performance real-time market data feed, B-PIPE. Additionally, the AWS connectivity model uses Bloomberg’s Open API (BLPAPI), which provides consistency and resiliency.
B-PIPE supplies real-time access to comprehensive, consolidated market data with normalization and intelligence, including access to 35 million instruments, over 330 exchanges, and 80 billion ticks a day in an event-driven architecture through AWS PrivateLink.
Traditionally, integrating a market data feed can take months and requires a large corporate infrastructure with operationally complex components, such as Apache Kafka. With managed services on AWS combined with the easy setup and integration of B-PIPE through AWS PrivateLink, this is no longer the case.
In this post, we explore how Validus, leveraging B-PIPE and AWS PrivateLink, was able to implement a proof-of-concept (PoC) feed in a week, empowering their team to scale to production loads. The solution propagates real-time updates from Bloomberg all the way to the React frontend of the RiskView platform.
Architecture Overview
The design in Figure 1 demonstrates how AWS serverless solutions combined with B-PIPE delivered through AWS PrivateLink allowed Validus to rapidly implement real-time market data to meet both business and operational requirements.
Choice of Technologies
The architecture and technologies were selected to meet the following requirements:
- Ingested data needs to be streamed to other services to drive event-driven use cases. For current use cases, throttling to one update per second for each ticker is acceptable.
- Market data needs to be readily available and quickly queryable for five days, as calculations and frontend graphs depend on this.
- Data needs to be persisted long-term for ad hoc historical queries.
Other deciding factors include infrastructure and operational costs, speed of development, and scalability for future use cases. These factors naturally led to a choice of managed, serverless AWS services.
The core of Validus’ solution is Amazon DynamoDB, which doubles as a streaming service using DynamoDB Streams. Real-time communication with the frontend—another feature that is traditionally hard to implement and scale—is achieved using a WebSocket API in API Gateway.
For compute, a combination of AWS Lambda and AWS Fargate is used.
Figure 1 – Top-level architecture.
Market Data Ingest Service
Bloomberg provides client libraries for B-PIPE in several common programming languages, including C++, C#, Java, and Python. Using the client library, a persistent connection can be established through AWS PrivateLink.
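For illustration, a minimal subscription using the Python client library might look like the following sketch. The host and port are placeholders for the address resolved through the PrivateLink endpoint, and the authorization setup that B-PIPE requires is omitted for brevity:

```python
import time
import blpapi  # Bloomberg's Open API (BLPAPI) client library

# Placeholder endpoint; in practice this resolves through AWS PrivateLink.
# B-PIPE also requires authorization options, omitted here for brevity.
options = blpapi.SessionOptions()
options.setServerHost("10.0.0.10")
options.setServerPort(8194)

def on_event(event, session):
    # Each tick arrives as a SUBSCRIPTION_DATA event on the handler thread.
    if event.eventType() == blpapi.Event.SUBSCRIPTION_DATA:
        for msg in event:
            ticker = msg.correlationIds()[0].value()
            if msg.hasElement("LAST_PRICE"):
                print(ticker, msg.getElementAsFloat("LAST_PRICE"))

session = blpapi.Session(options, on_event)
if session.start() and session.openService("//blp/mktdata"):
    subscriptions = blpapi.SubscriptionList()
    subscriptions.add("EURUSD Curncy", "BID,ASK,LAST_PRICE",
                      "", blpapi.CorrelationId("EURUSD Curncy"))
    session.subscribe(subscriptions)
    while True:  # keep the long-running connection alive
        time.sleep(1)
```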
Highly ephemeral compute, such as AWS Lambda, is not suitable for this service due to the long-running connection to Bloomberg. AWS Fargate offers a good middle ground.
The ingest service is a configurable component that throttles writes to DynamoDB according to the requirements at hand. With the elastic compute provided by AWS Fargate and B-PIPE’s high-frequency capabilities, it can be scaled to meet future requirements. Currently, it emits one aggregated update per ticker per second, containing metrics computed across all ticks in that window, such as the minimum, maximum, and average values.
The aggregated ticks are stored in DynamoDB. There are around 3,000 tickers relevant to the trade population at Validus, resulting in a maximum of 3,000 writes per second.
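A minimal sketch of this per-second aggregation is shown below, assuming the attribute names from Figure 2; the table name and the min/max/avg attribute names are illustrative:

```python
import time
from collections import defaultdict
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("market-data-ticks")  # illustrative name
buffer = defaultdict(list)  # ticker -> prices observed in the current second

def on_tick(ticker: str, price: float) -> None:
    buffer[ticker].append(price)

def flush() -> None:
    # Called once per second by the agent loop: one aggregated item per
    # ticker, written to DynamoDB in BatchWriteItem round trips.
    now = int(time.time())
    with table.batch_writer() as batch:
        for ticker, prices in buffer.items():
            batch.put_item(Item={
                "n": ticker,
                "t": now,
                "lp": Decimal(str(prices[-1])),
                "min": Decimal(str(min(prices))),
                "max": Decimal(str(max(prices))),
                "avg": Decimal(str(sum(prices) / len(prices))),
                "ttl": now + 5 * 86400,  # expiry for DynamoDB TTL, discussed below
            })
    buffer.clear()
```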
Amazon DynamoDB as a Streaming Service
DynamoDB with DynamoDB Streams provides a solution for short-term storage and streaming that is cost-effective and low maintenance. The access patterns for the short-term storage are well-known and the data is simple, making the DynamoDB data model straightforward to design.
The ticker (for example, the EUR/USD spot rate) is a natural partition key, and the timestamp is a suitable sort key. Since the data only needs to be kept for five days, DynamoDB’s time-to-live (TTL) functionality is a convenient way to clean up expired items automatically.
Figure 2 – Ticker items in DynamoDB with the ticker name (n) and timestamp (t) as primary key. Other fields represent ask (a), bid (b), and last price (lp).
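With this key design, fetching a window of ticks for one instrument is a single Query: the partition key pins the ticker, and the sort key gives an efficient timestamp range. A sketch, assuming t is stored as epoch seconds as in the earlier snippet:

```python
import time

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("market-data-ticks")  # illustrative name

# Last hour of aggregated ticks for one ticker.
now = int(time.time())
response = table.query(
    KeyConditionExpression=Key("n").eq("EURUSD Curncy")
    & Key("t").between(now - 3600, now)
)
for item in response["Items"]:
    print(item["t"], item["lp"])
```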
DynamoDB provides two options for streaming change data capture (CDC): Kinesis Data Streams for DynamoDB and DynamoDB Streams. The number of shards and consumers per shard is an important aspect of streaming performance. DynamoDB Streams supports two consumers per shard, which was sufficient for Validus’ throughput requirements.
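For reference, consuming DynamoDB Streams with AWS Lambda amounts to a handler over batches of change records. A sketch, assuming a NEW_IMAGE stream view and the attribute names from Figure 2:

```python
import json

def handler(event, context):
    # Lambda is invoked with a batch of change records from the stream.
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        # Attributes arrive in DynamoDB's typed JSON format, e.g. {"N": "1.07"}.
        ticker = image["n"]["S"]
        last_price = float(image["lp"]["N"])
        # ...forward to downstream consumers (broadcasting, analytics, etc.)
        print(json.dumps({"ticker": ticker, "last_price": last_price}))
```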
For higher throughput requirements, consider Kinesis Data Streams, which supports up to five consumers per shard with shared throughput, or up to 20 simultaneous consumers with enhanced fan-out.
If querying is not required or a bespoke time-series database is already in place, a dedicated streaming solution can be used instead. The industry-standard streaming platform is Apache Kafka, which would also be appropriate here; however, it comes with higher operational overhead.
Amazon Kinesis is a high-performance managed streaming alternative, with useful integrations through Amazon Kinesis Data Firehose to Amazon Simple Storage Service (Amazon S3) and other services.
Long-Term Storage
To meet the third requirement, data is also persisted to Amazon S3 and queried through Amazon Athena. Amazon S3 and Amazon Athena support efficient data formats such as Parquet and Avro, allowing flexibility while keeping costs low.
Most streaming services have excellent integration with Amazon S3, with minimal work required to begin long-term storage of data.
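Once the data lands in S3, ad hoc historical questions become SQL. A sketch of running such a query through Athena; the database, table, partition column, and output bucket names are illustrative:

```python
import boto3

athena = boto3.client("athena")

# Monthly bid/ask extremes per ticker over the Parquet archive in S3.
athena.start_query_execution(
    QueryString="""
        SELECT n AS ticker, min(b) AS min_bid, max(a) AS max_ask
        FROM ticks
        WHERE dt BETWEEN date '2023-01-01' AND date '2023-01-31'
        GROUP BY n
    """,
    QueryExecutionContext={"Database": "market_data"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```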
Broadcaster Service
The broadcaster is the core service responsible for supplying frontend consumers with updates. It combines the following three sources of input:
- Semi-static internal data, such as trades, coming from the existing services.
- Market data, coming from the market data feed.
- Subscribers, which are stored in DynamoDB by AWS Lambda processing WebSocket API subscription messages.
Whenever a new market data update arrives, the appropriate values are recalculated. The service then consults the list of subscribers to determine which of them are interested in the update. The list of subscribers is stored in DynamoDB, keyed by connection ID, with a set of ticker names per connection.
Figure 3 – A DynamoDB item representing a subscriber.
Validus aims to support a few hundred active subscriptions, for which a periodic full scan of this data is sufficient. If a higher number of connections needs to be supported, maintaining a cache of this data in the broadcaster service is necessary. This cache can be kept up-to-date with new subscription events from DynamoDB.
Updates can be pushed to the frontend by posting to the connections through the WebSocket API in API Gateway.
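Pushing to a connection is a single API call against the WebSocket API’s management endpoint. A sketch, with an illustrative endpoint URL; a GoneException signals a client that has disconnected and can be pruned from the subscriber table:

```python
import json

import boto3

# Endpoint URL comes from the WebSocket API stage (illustrative values).
apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://abc123.execute-api.eu-west-2.amazonaws.com/prod",
)

def push(connection_id: str, update: dict) -> None:
    try:
        apigw.post_to_connection(
            ConnectionId=connection_id,
            Data=json.dumps(update).encode("utf-8"),
        )
    except apigw.exceptions.GoneException:
        # Client disconnected; remove it from the subscriber table.
        pass
```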
If a GraphQL-based solution is preferred, AWS AppSync may be considered instead of API Gateway. AppSync supports real-time GraphQL subscriptions and removes the need to manually manage the list of subscribers.
Performance Metrics
The volume and velocity of data required at each stage of processing can have a significant impact on design decisions. Ingesting 3,000 tickers aggregates each ticker’s real-time flow into one event of roughly 200 bytes per second, or about 600 KB per second in total. That’s 36 MB per minute, or 51.84 GB per day.
Python data-ingest agents hosted on Amazon Elastic Container Service (Amazon ECS) on AWS Fargate are able to handle upwards of 300 tickers per vCPU, depending on the activity in the feed.
These agents insert an aggregated batch into DynamoDB at the end of each second, with latencies between 50 and 100 ms. Benchmarking peak data flow was aided by Datadog, which supplied real-time container metrics as the number of tickers assigned to an agent was increased. Validus was able to derive the average vCPU required per 100 tickers using a custom Datadog query.
A multi-vCPU central broadcast server, written in Kotlin and hosted on Amazon ECS on Fargate, consolidates the shards of the DynamoDB stream and broadcasts updates to a peak of 500 consumers within one second of the write to DynamoDB. End-to-end P99 latency is less than three seconds.
Development Environment
Bloomberg sets up one development and two production AWS PrivateLink endpoints. To connect to the development endpoint, either set up a remote development environment within the same Amazon Virtual Private Cloud (Amazon VPC), or use an AWS Client VPN endpoint associated with the VPC and connect from your local device.
To ensure reproducible, reviewable infrastructure, it is strongly recommended to use an infrastructure-as-code tool. AWS Cloud Development Kit (AWS CDK) works well for this. Other popular tools include Terraform or AWS CloudFormation.
AWS CDK greatly increases deployment speed: when the service is ready to deploy to production infrastructure, only slight tweaks to the development CDK configuration are required to cater to the higher load and production settings.
It also allows for rapid and traceable experimentation and benchmarking of the infrastructure when changes are as easy as merging a pull request and allowing the CI/CD pipeline to deploy them.
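As an illustration, the short-term tick store described earlier could be declared in a few lines of CDK. The Python flavor is shown below with illustrative names; settings such as billing mode and removal policy are exactly the knobs tweaked between development and production:

```python
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_dynamodb as dynamodb
from constructs import Construct

class MarketDataStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Short-term tick store: ticker/timestamp keys, a change stream for
        # the broadcaster, and five-day expiry via the "ttl" attribute.
        dynamodb.Table(
            self, "TickerTable",
            partition_key=dynamodb.Attribute(name="n", type=dynamodb.AttributeType.STRING),
            sort_key=dynamodb.Attribute(name="t", type=dynamodb.AttributeType.NUMBER),
            billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
            stream=dynamodb.StreamViewType.NEW_IMAGE,
            time_to_live_attribute="ttl",
            removal_policy=RemovalPolicy.DESTROY,  # acceptable in development only
        )

app = App()
MarketDataStack(app, "MarketDataDev")
app.synth()
```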
Summary
Access to real-time market data is no longer a luxury reserved for large corporations. Using Bloomberg’s cloud-native real-time market data solution and managed AWS services, a single developer can get a PoC feed up and running end-to-end in days, without the support of an infrastructure team.
With the entire infrastructure described in AWS CDK and with the scalability of the services used, it is very easy to promote the development feed to a production feed—making use of the two production endpoints provided by Bloomberg.
The result is a cost-effective and performant feed with the ability to power frontends with thousands of updates a second. Once the performance characteristics are better understood, further cost reductions are possible by committing to reserved compute and DynamoDB capacity.
In this post, we explored how Validus built a Bloomberg real-time market data integration using serverless managed services on AWS. With this implementation, Validus is accelerating market data integrations for its customers.
To learn more about Validus, visit the website.
To learn more about Bloomberg B-PIPE on AWS, read here.
Bloomberg – AWS Partner Spotlight
Bloomberg is an AWS Partner and global leader in business and financial information, news, and insights.