AWS for Industries
Introducing Amazon FinSpace with Managed kdb Insights, a fully managed analytics engine, commonly used by capital markets customers for analysis of real-time and historical time series data
Today, AWS is announcing Amazon FinSpace with Managed kdb Insights, a new capability that makes it simple to configure, run, and manage kdb Insights on AWS. KX Systems’ kdb Insights, a high-performance analytics engine, is optimized for analyzing real-time and multi-petabyte historical time series data. Kdb Insights is widely used in capital markets to power business-critical workloads such as option pricing, transaction cost analysis, and back-testing.
Capital markets customers rely on running high-performance queries on petabytes of real-time and historical market data, from hundreds of sources, to gain a competitive advantage. These financial services firms have been seeking a more efficient way to deploy and run kdb Insights to enable them to scale with market volatility and business needs, maintain availability during critical market trading hours, and reduce the work required to operate these critical workloads.
Using Amazon FinSpace with Managed kdb Insights, financial institutions can now democratize access to their most demanding analytics workloads and extend their reach across front, middle, and back offices by readily incorporating and accelerating Python and SQL codebases. These changes provide an improved user experience and seamless integration with other AWS services, such as Amazon CloudWatch and AWS Identity and Access Management, while significantly reducing the operational costs of running kdb Insights by eliminating manual configuration, operations, and maintenance. Firms can now spend less time worrying about managing the underlying infrastructure that powers their analytics workloads. With kdb Insights running on the AWS cloud, capital markets firms can avoid large upfront infrastructure purchases with pay-as-you-go kdb compute and storage. They can configure automatic scaling and built-in high availability for kdb Insights applications to keep up with volatile market conditions, and to meet requests for new analytics capabilities from business teams in hours instead of months.
While developing this new capability, AWS worked with a cross-section of KX customers to verify that Amazon FinSpace Managed kdb Insights met a broad set of their use cases and to ensure it would be suitable for the migration of existing kdb applications to AWS. These customers plan to modernize their existing kdb infrastructure and use Amazon FinSpace Managed kdb Insights as the analytical foundation for their trading and quantitative research applications. By migrating to Amazon FinSpace with Managed kdb Insights, they can reduce their operational overhead and have greater scalability and agility when delivering kdb applications for their business.
How it works
Now, let’s take a look at how it works. With Amazon FinSpace with Managed kdb Insights, customers can use the AWS Management Console, or configuration as code via APIs, AWS CloudFormation, or Terraform, to deploy their existing real-time stream processing and historical analytics kdb Insights code on AWS. They can set up Real-time Databases (RDB), Historical Databases (HDB), and Gateways (GW), along with the data used by these processes. To migrate existing kdb Insights workloads, including scripts, applications, and users, they can use the migration assistant tools, which support deploying, testing, and benchmarking the migrated application.
Figure 1: common on-premises kdb Insights environment
The diagram above shows a common on-premises kdb Insights environment that consists of multiple q processes, each responsible for one component of the architecture. Data flows into the environment from a feed handler (FH), to which a ticker plant (TP) listens. A real-time database (RDB) subscribes to the TP and captures in its database everything that the TP has published. At the end of the day, the data is saved to an on-disk historical database (HDB). A gateway (GW) process handles client queries, which can be for historical or real-time data. The GW distributes each query to the appropriate database process (real-time or historical) and can aggregate and combine the results.
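To make the gateway's role concrete, here is a minimal Python sketch of the routing described above. It is not q and not KX code. The function `route_query`, the dictionaries `rdb` and `hdb`, the sample rows, and the rule that the RDB holds only the current day's data are all illustrative assumptions.

```python
from datetime import date


def route_query(start, end, today, rdb, hdb):
    """Return rows for the inclusive range [start, end].

    Past dates are read from the HDB stand-in, today's date from the RDB
    stand-in, and the two partial results are combined in date order.
    """
    rows = []
    # Historical portion: every requested date strictly before today.
    for day in sorted(hdb):
        if start <= day <= end and day < today:
            rows.extend(hdb[day])
    # Real-time portion: only needed when the range includes today.
    if start <= today <= end:
        rows.extend(rdb.get(today, []))
    return rows


# Made-up sample rows, keyed by trade date (hypothetical data).
today = date(2023, 6, 5)
hdb = {
    date(2023, 6, 1): [("AAPL", 180.1)],
    date(2023, 6, 2): [("AAPL", 181.0)],
}
rdb = {today: [("AAPL", 179.6)]}

combined = route_query(date(2023, 6, 2), today, today, rdb, hdb)
print(combined)
```

A query that spans yesterday and today touches both stand-ins, while a purely historical query never reaches the RDB.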
Figure 2: using Managed kdb Insights to set up the kdb environment
The diagram above shows how this common environment can be set up using Managed kdb Insights. Customers no longer face the complexity of manually deploying RDBs, HDBs, or Gateways, because deployment is now done through simple API calls. HDB clusters can span multiple Availability Zones (AZs) for resiliency and can auto-scale to better handle high query loads. Data can optionally be cached with the HDB clusters as well, providing high-speed access to historical data. With managed storage, customers don’t have to manage changes to their databases themselves: changes are applied atomically through the changeset feature in Managed kdb Insights, and a database can be created from the state that results from any change applied to databases in managed storage.
Use Case: Migrate an Existing Historical Database to Managed kdb Insights
To show how simple it is to use Managed kdb Insights, we will walk through the migration of a historical database from an on-premises kdb+ deployment to Amazon FinSpace with Managed kdb Insights. This is a simple three-step process: (1) create and populate a Managed kdb Insights database; (2) create a Managed kdb Insights HDB cluster for the database; and (3) connect to the HDB cluster to query its data.
Figure 3: (1) create and populate a Managed kdb Insights database; (2) create a Managed kdb Insights HDB cluster for the database; and (3) connect to the HDB cluster to query its data.
Step 1: Create and Populate a Managed kdb Insights Database
Using the AWS Console, you can quickly create a database and populate it from data on S3.
In the AWS Console, go to the FinSpace service page, select Managed kdb Insights, and then select your kdb Environment. The kdb Environments page shows the details of an environment.
To create a new database, select the Databases table and select “Create database”. Only a name is required to create a database, but you can also give a description and tags for it as well. For this example, we will name the database “my_db” and give it a short description.
After entering the name and description, select “Create database”. Once created, the database appears in the list of kdb Databases for the environment.
The kdb database now exists but contains no data. Data is added to a kdb database through a mechanism called changesets. A changeset is a bundle of files that comprise changes to the kdb database’s data files. Changesets are applied atomically to the kdb database. A kdb database can have multiple changesets, which together also provide a historical record of changes to the database. With Managed kdb Insights you have full backup and restore capabilities for your kdb databases.
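As a rough mental model only, the changeset behavior described above can be sketched in Python. This is not how the service is implemented. The function `database_view`, the change types `PUT` and `DELETE`, the changeset IDs, and the file paths are all assumptions made for illustration.

```python
def database_view(changesets, up_to):
    """Replay changesets in order and return the files as of changeset `up_to`."""
    files = {}
    for changeset in changesets:
        # Stage the changes on a copy so a changeset is applied all-or-nothing:
        # an error part-way through leaves `files` untouched.
        staged = dict(files)
        for change_type, path, content in changeset["changes"]:
            if change_type == "PUT":
                staged[path] = content
            elif change_type == "DELETE":
                staged.pop(path, None)
            else:
                raise ValueError(f"unknown change type: {change_type}")
        files = staged
        if changeset["id"] == up_to:
            return files
    raise KeyError(up_to)


# Hypothetical history: the second changeset adds a date and removes another.
history = [
    {"id": "cs-1", "changes": [("PUT", "/2023.06.01/trade", "v1")]},
    {"id": "cs-2", "changes": [("PUT", "/2023.06.02/trade", "v1"),
                               ("DELETE", "/2023.06.01/trade", None)]},
]

print(database_view(history, "cs-1"))
print(database_view(history, "cs-2"))
```

Because every changeset is kept, asking for the view at an earlier changeset shows the database as it appeared at that point in its history.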
To add data to the kdb database, select the database name from the list of databases. This will take you to the database details page, where you will see other details about the database such as its description, when it was created, and what changesets exist for the database.
Now select “Create changeset” to add data.
For this example, the historical data has been staged to an S3 bucket and we will create a changeset from that data. Locate the data on S3 by selecting “Browse S3.”
Browse to the location on S3 where the data will be ingested from. As you can see, the path ending in “hdb” has the expected file structure of a database partitioned by date.
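Before importing, it can be useful to confirm that the staged data really has the date-partitioned layout. The sketch below is one way to check a list of object keys in Python. It relies on the kdb+ convention that date partition directories are named `YYYY.MM.DD`. The function name `date_partitions`, the `hdb/` prefix, and the sample keys (table `trade`, columns `price` and `size`) are hypothetical.

```python
import re

# kdb+ date partitions are directories named like 2023.06.01.
DATE_PARTITION = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")


def date_partitions(keys, root="hdb/"):
    """Return the sorted date partition names found directly under `root`."""
    found = set()
    for key in keys:
        if not key.startswith(root):
            continue
        # First path component below the root, e.g. "2023.06.01" or "sym".
        first = key[len(root):].split("/", 1)[0]
        if DATE_PARTITION.match(first):
            found.add(first)
    return sorted(found)


# Hypothetical object keys, as an S3 listing of the staged data might return.
keys = [
    "hdb/sym",
    "hdb/2023.06.01/trade/price",
    "hdb/2023.06.01/trade/size",
    "hdb/2023.06.02/trade/price",
    "other/2023.06.03/trade/price",
]

partitions = date_partitions(keys)
print(partitions)
```

An empty result would suggest that the selected path is not the parent directory of a date-partitioned database.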
Import the whole database by selecting the parent path (hdb) and selecting “Choose”.
With the S3 path chosen, select “Create changeset” to create the changeset and add this data to the database.
You will now see this changeset in the list of changesets for the database. The changeset status will start with “Creating” and show as “Success” once the data has been imported.
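The console steps above can also be scripted. The Python sketch below only builds the request parameters and does not call AWS. It assumes the boto3 `finspace` client operations `create_kx_database` and `create_kx_changeset` and the parameter names shown, which should be checked against the current SDK reference. The environment ID and bucket name are placeholders.

```python
def database_request(environment_id, name, description=None):
    """Build keyword arguments for creating a kdb database (assumed shape)."""
    request = {"environmentId": environment_id, "databaseName": name}
    if description:
        request["description"] = description
    return request


def changeset_request(environment_id, name, s3_path, db_path="/"):
    """Build keyword arguments for a changeset that adds files from S3.

    `s3_path` is where the data was staged; `db_path` is where it should
    land inside the database ("/" imports the whole tree at the root).
    """
    return {
        "environmentId": environment_id,
        "databaseName": name,
        "changeRequests": [
            {"changeType": "PUT", "s3Path": s3_path, "dbPath": db_path},
        ],
    }


ENV_ID = "<your-environment-id>"          # placeholder
STAGED = "s3://<your-bucket>/hdb/"        # placeholder

db_req = database_request(ENV_ID, "my_db", "Example historical database")
cs_req = changeset_request(ENV_ID, "my_db", STAGED)
print(db_req)
print(cs_req)

# With credentials configured, the requests could be sent like this
# (assumed call shape, not executed here):
#   import boto3
#   client = boto3.client("finspace")
#   client.create_kx_database(**db_req)
#   client.create_kx_changeset(**cs_req)
```

Keeping the request construction separate from the API call makes the parameters easy to review or unit test before anything is created.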
This completes Step 1.
Step 2: Create a Managed kdb Insights HDB cluster
First, select the “Clusters” tab from the Managed kdb Insights console on Amazon FinSpace.
To create a new cluster, select “Create Cluster.” This starts a wizard for creating a cluster in Managed kdb Insights. Customers can create Real-time Database (RDB), Gateway (GW), and Historical Database (HDB) clusters. For the database “my_db” we will create an HDB cluster of three nodes, in a single AZ, with a cache for the whole database, and deploy code with an initialization script that loads the database.
Let’s walk through the wizard to accomplish this.
Create Cluster Wizard
Step 1: Add cluster details
First, we will provide core cluster details.
Name the cluster, select cluster type “Historical Database” and have it deploy to a single AZ, and select the AZ.
Further down the page, size the cluster as three nodes of type kx.s.4xlarge, set the port to 5000 (where kdb clients will connect to the kdb cluster) and select “Next”.
Step 2: Add code
Now we will identify what code (if any) to deploy; for this example, we will deploy a library of files that includes an init script that loads the database into the HDB cluster’s memory.
Specify the init script (code/init.q), the arguments for the init script (dbdir), and the kdb+ command line option (‘s’) that sets the number of secondary threads.
When all are entered, select “Next”.
Step 3: Configure VPC settings
Next, we will provide the VPC settings for the cluster. These are used to create a VPC endpoint so that you can connect to the cluster without leaving your private AWS network. We will need to specify where to place that endpoint in the VPC, namely the subnet and security group.
Select the VPC, Subnet(s), and Security groups from your account. When all are entered, select “Next”.
Step 4: Configure data and storage
Next we will specify which kdb Database will be loaded into this new cluster. Kdb Clusters are built to service queries for a specific database and version of that database.
Select the database “my_db” and the changeset that represents the most recent data you want available for querying (you could select an older changeset to see how the database appeared in the past). In our example, the database my_db only has one changeset, but the database could contain multiple changesets in history.
To improve query performance, we will also select “Enable Caching” and choose to cache the entire database (everything under path ‘/’). Alternatively, we could limit caching to a subset of the database to reduce cost.
When all are entered, select “Next”.
Step 5: Review and create
The final step is to review and confirm the properties of the cluster and commence cluster creation.
If all looks correct, select “Create cluster” at the bottom of the page.
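For configuration as code, the wizard choices above can be collected into a single request. The Python sketch below builds such a request without calling AWS. It assumes the boto3 `finspace` client operation `create_kx_cluster` and the field names shown. It is deliberately partial: settings the walkthrough does not specify (for example the release label, the code location, cache storage sizing, and how the port is set) are left out, and every value in angle brackets is a placeholder.

```python
def hdb_cluster_request(environment_id, cluster_name, changeset_id,
                        az_id, vpc_id, subnet_ids, security_group_ids):
    """Build partial keyword arguments for the HDB cluster from the wizard."""
    return {
        "environmentId": environment_id,
        "clusterName": cluster_name,
        # Step 1: cluster details.
        "clusterType": "HDB",
        "azMode": "SINGLE",
        "availabilityZoneId": az_id,
        "capacityConfiguration": {"nodeType": "kx.s.4xlarge", "nodeCount": 3},
        # Step 2: code. Passing both entries as command line arguments
        # is an assumption about how the API models them.
        "initializationScript": "code/init.q",
        "commandLineArguments": [
            {"key": "dbdir", "value": "<path-to-database>"},
            {"key": "s", "value": "<secondary-threads>"},
        ],
        # Step 3: VPC settings.
        "vpcConfiguration": {
            "vpcId": vpc_id,
            "subnetIds": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
        # Step 4: data and storage, caching everything under "/".
        "databases": [{
            "databaseName": "my_db",
            "changesetId": changeset_id,
            "cacheConfigurations": [
                {"cacheType": "<cache-type>", "dbPaths": ["/"]},
            ],
        }],
    }


request = hdb_cluster_request(
    "<your-environment-id>", "<your-cluster-name>", "<changeset-id>",
    "<az-id>", "<vpc-id>", ["<subnet-id>"], ["<security-group-id>"],
)
print(request["capacityConfiguration"])
```

Pinning `changesetId` in the request mirrors the wizard: the cluster serves one database as of one specific changeset.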
You can monitor the cluster creation process from the kdb Environments page, where the cluster appears in the list of clusters along with its status. Cluster creation takes approximately 5-10 minutes to complete.
Once the cluster’s status is “Running” you can connect to it. This completes Step 2.
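When the cluster is created from a script rather than the console, the wait for the “Running” status can be automated with a simple polling loop. The Python sketch below is generic: `get_status` is a hypothetical callable that returns the current status string (in practice it might wrap an SDK call that describes the cluster). The failure check and the default timings are assumptions.

```python
import time


def wait_until_running(get_status, timeout_s=900, poll_s=30, sleep=time.sleep):
    """Poll `get_status` until it reports RUNNING, a failure, or a timeout."""
    waited = 0
    while True:
        status = get_status().upper()
        if status == "RUNNING":
            return status
        # Assumption: failed creations report a status containing "FAILED".
        if "FAILED" in status:
            raise RuntimeError(f"cluster creation failed with status {status}")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_s)
        waited += poll_s


# Demonstration with a fake status source and no real sleeping.
fake_statuses = iter(["CREATING", "CREATING", "Running"])
final = wait_until_running(lambda: next(fake_statuses), sleep=lambda s: None)
print(final)
```

Injecting `get_status` and `sleep` keeps the loop testable without an AWS account. The 15-minute default timeout leaves some margin over the 5-10 minutes mentioned above.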
Step 3: Connect to the HDB Cluster
Now that the cluster is running with the database we created and populated, connecting to it from ‘q’ is no different from connecting to any existing HDB. You only need the cluster’s connection information, which is available through the AWS CLI command for FinSpace, get-kx-connection-string.
Using the AWS CLI, retrieve the connection string from the returned JSON.
The string contained in “signedConnectionString” is a signed URL that can be directly used in ‘q’ with hopen, for example.
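If the CLI output is captured by a script, the signed connection string can be pulled out of the JSON in a few lines. The Python sketch below assumes only the “signedConnectionString” field named above. The helper name `signed_connection_string` is hypothetical, and the sample output uses a placeholder rather than a real signed URL.

```python
import json


def signed_connection_string(cli_output):
    """Extract the signed connection string from the CLI's JSON output."""
    payload = json.loads(cli_output)
    try:
        return payload["signedConnectionString"]
    except KeyError:
        raise ValueError("response has no signedConnectionString field") from None


# Placeholder standing in for the real CLI response.
sample_output = '{"signedConnectionString": "<signed-connection-string>"}'

conn = signed_connection_string(sample_output)
print(conn)
```

The extracted value is what would then be passed to hopen in a ‘q’ session. Because it is a signed URL, it is best treated like a credential and kept out of logs.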
Summary
In this post, we introduced you to Amazon FinSpace with Managed kdb Insights, a new capability of Amazon FinSpace that provides customers with a fully managed service on AWS for the most recent version of KX’s high-performance time-series analytics engine. Kdb Insights is used by more than 180 capital markets firms to power business-critical analytics workloads, such as liquidity analysis, transaction cost analysis, risk management, and pricing.
Amazon FinSpace with Managed kdb Insights makes it quick and simple for customers to set up a capital markets data processing and analytics hub on AWS, while enjoying the benefits of a fully managed cloud service: simple migration of on-premises workloads, auto-scaling up and down, high availability (across multiple AZs and Regions), and run-time performance, even during the most critical business hours. Managed kdb Insights also supports running the same custom KX scripts that run on premises and provides the same familiar interfaces.
Availability and pricing
Amazon FinSpace with Managed kdb Insights will be generally available starting the week of June 5 in the following AWS Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Canada (Central), and Europe (Ireland).
To get started, visit https://aws.amazon.com/FinSpace.
Disclaimer: Any discussion of reference architectures in this post is illustrative and for informational purposes only. It is based on the information available at the time of publication. Any steps/recommendations are meant for educational purposes and initial proof of concepts, and not a full-enterprise solution. Contact us to design an architecture that works for your organization.