Amazon Timestream: 2021 in review
Amazon Timestream is a purpose-built time series database service you can use for IoT data collection, application health and usage monitoring, real-time analytics, and network performance monitoring. Timestream is fast, scalable, and serverless, making it easy and cost-effective to store and analyze trillions of events per day. Since its general availability in 2020, Timestream has been adopted by thousands of customers across various industry verticals. Customers have requested more capabilities in Timestream to make it easier, faster, cheaper, and more secure to store, process, and analyze all of their time series data.
In 2021, we improved Timestream with updates in three broad areas: faster and more cost-effective time series data processing, improved ease of use, and strengthened security and compliance. Additionally, Timestream became available in a new Region: Europe (Frankfurt). This post covers some of those new features and improvements, and describes their use cases and benefits.
Faster and cost-effective time series data processing in Timestream
Timestream added three new capabilities: multi-measure records, scheduled queries, and magnetic store writes, which make time series data processing faster, more cost-effective, and accessible to many additional customers. These features enable you to write, store, and access your time series data more efficiently, so you can continue to derive insights from your data and make better data-driven business decisions.
Multi-measure records
The multi-measure record is a new data modeling feature that improves data write throughput, optimizes data storage, and simplifies querying. Previously, data points were limited to a single measure name and value per table record. For example, application monitoring data containing the measures CPU, memory, and disk IOPS was previously stored in three separate records, one for each measure.
Now, you can store multiple measures from the same source at a given timestamp within a single record. As a result, query performance is improved for applications where multiple measurements are emitted.
With multi-measure records, you can batch more data in a single write request than with single-measure records. As a result, data write throughput increases, while ingestion and storage costs are reduced by eliminating duplicated dimensions. For example, the host name host-24Gju, which repeats across three single-measure records, is charged only once in a multi-measure record.
The smaller memory and magnetic storage footprints provide a storage cost reduction of up to ten times. Query latency also improves with up to four times faster processing when using multi-measure records. Additionally, migration from existing relational data stores to Timestream can now occur with minimal to no changes to the schema.
You can use multi-measure records for any time series application that generates more than one measurement from the same device at any given time. Some example applications include a video streaming platform that generates hundreds of user metrics; IoT devices that generate measurements such as temperature, blood oxygen levels, or heart rate; and applications that generate metrics for monitoring performance and availability of web or mobile infrastructure.
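The difference between the two record shapes can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the field names follow the record structure of the Timestream WriteRecords request, while the measure values, the timestamp, the `metrics` measure name, and the `to_multi_measure` helper are all hypothetical. Nothing here calls AWS; the resulting list is only the kind of payload you might pass as `Records` to a write client.

```python
from collections import defaultdict

# Hypothetical sample data: three single-measure records emitted by one host
# at one timestamp. The measure values are invented for illustration.
single_measure_records = [
    {
        "Dimensions": [{"Name": "hostname", "Value": "host-24Gju"}],
        "MeasureName": name,
        "MeasureValue": value,
        "MeasureValueType": "DOUBLE",
        "Time": "1640995200000",
        "TimeUnit": "MILLISECONDS",
    }
    for name, value in [("cpu", "35.0"), ("memory", "54.9"), ("disk_iops", "1200.0")]
]


def to_multi_measure(records, measure_name="metrics"):
    """Fold single-measure records sharing dimensions and time into MULTI records."""
    grouped = defaultdict(list)
    for record in records:
        dims = tuple((d["Name"], d["Value"]) for d in record["Dimensions"])
        key = (dims, record["Time"], record["TimeUnit"])
        grouped[key].append(
            {
                "Name": record["MeasureName"],
                "Value": record["MeasureValue"],
                "Type": record["MeasureValueType"],
            }
        )
    return [
        {
            # The dimensions are stored once per record instead of once per measure.
            "Dimensions": [{"Name": n, "Value": v} for n, v in dims],
            "MeasureName": measure_name,
            "MeasureValueType": "MULTI",
            "MeasureValues": values,
            "Time": time,
            "TimeUnit": unit,
        }
        for (dims, time, unit), values in grouped.items()
    ]


multi_measure_records = to_multi_measure(single_measure_records)
print(len(single_measure_records), "records ->", len(multi_measure_records), "record")
```

Grouping on the dimensions and timestamp is what removes the repeated host name: the three measures end up as attributes of one record that carries the dimension a single time.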
Scheduled queries
We also introduced scheduled queries in Timestream, a fully managed, serverless, and scalable solution for calculating and storing aggregates, rollups, and other SQL pre-computations on a recurring basis. Scheduled queries make real-time analytics faster and more cost-effective, so you can derive additional insights from your data and make better business decisions. They are used to power frequently accessed real-time operational dashboards, business reports, applications, and device-monitoring systems.
With scheduled queries, you define the queries, target the incoming data, and set a schedule for performing the computation. Timestream then runs these queries automatically on that schedule and reliably writes the results into a separate derived table in Timestream. You can then point your dashboards, reports, applications, and monitoring systems at the derived table populated by scheduled queries, instead of querying the considerably larger source tables containing the incoming time series data. Because the data retention of the derived tables is fully decoupled from that of the source tables, you can also choose to reduce the data retention of the source tables and keep the derived data for a much longer duration, at a fraction of the data storage cost.
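The three steps above (define the query, target the data, set a schedule) can be sketched as a request builder. This is an illustration under stated assumptions: the key names mirror the CreateScheduledQuery request shape as we understand it and should be checked against the API reference before use, and every table name, column name, ARN, bucket, and the hourly cron expression is hypothetical. The function only builds a dictionary and does not contact AWS.

```python
def build_scheduled_query_request(source_db, source_table, target_db, target_table,
                                  role_arn, topic_arn, error_bucket):
    """Build keyword arguments for an hourly CPU rollup (all names hypothetical)."""
    # Illustrative rollup: hourly average CPU per host over the last two hours.
    # @scheduled_runtime is the parameter bound to each scheduled run time.
    query = (
        "SELECT hostname, bin(time, 1h) AS hour, avg(cpu) AS avg_cpu "
        f'FROM "{source_db}"."{source_table}" '
        "WHERE time BETWEEN @scheduled_runtime - 2h AND @scheduled_runtime "
        "GROUP BY hostname, bin(time, 1h)"
    )
    return {
        "Name": f"{source_table}-hourly-rollup",
        "QueryString": query,
        # Assumed schedule: run at the top of every hour.
        "ScheduleConfiguration": {"ScheduleExpression": "cron(0 * * * ? *)"},
        "NotificationConfiguration": {"SnsConfiguration": {"TopicArn": topic_arn}},
        "ScheduledQueryExecutionRoleArn": role_arn,
        "ErrorReportConfiguration": {"S3Configuration": {"BucketName": error_bucket}},
        # The derived table that dashboards and reports would query.
        "TargetConfiguration": {
            "TimestreamConfiguration": {
                "DatabaseName": target_db,
                "TableName": target_table,
                "TimeColumn": "hour",
                "DimensionMappings": [
                    {"Name": "hostname", "DimensionValueType": "VARCHAR"}
                ],
                "MultiMeasureMappings": {
                    "TargetMultiMeasureName": "cpu_rollup",
                    "MultiMeasureAttributeMappings": [
                        {"SourceColumn": "avg_cpu", "MeasureValueType": "DOUBLE"}
                    ],
                },
            }
        },
    }


request = build_scheduled_query_request(
    "monitoring", "host_metrics", "monitoring", "host_metrics_hourly",
    role_arn="arn:aws:iam::123456789012:role/example-sq-role",
    topic_arn="arn:aws:sns:us-east-1:123456789012:example-sq-topic",
    error_bucket="example-sq-error-reports",
)
print(request["QueryString"])
```

With an AWS SDK, a dictionary like this would typically be unpacked into the scheduled query creation call of the Timestream query client. Keeping the builder separate from the call makes the configuration easy to review and test.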
You can use scheduled queries to power business reports, create systems for detecting anomalies, or govern data access by providing teams access to only the tables populated by scheduled queries. For more information about usage patterns as well as end-to-end examples, see Scheduled Query Patterns and Examples.
Magnetic store writes
Timestream offers data storage tiering and supports two storage tiers: a memory store and a magnetic store. With magnetic store writes enabled, you can write data asynchronously and durably to Timestream. Based on the data’s timestamp and preconfigured data retention window, the service automatically determines whether the data gets written to the memory store or to the magnetic store. This improvement provides you with a cost-effective solution for ingesting any data that has a timestamp outside of the table’s memory store retention. For example, if you retain data in the memory store for one day, you can use magnetic store writes for ingesting late-arriving data from IoT devices that experience connectivity interruptions longer than one day.
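The routing decision described above can be modeled with a small function. This is a simplified sketch of the idea, not the service's actual implementation: it compares a record's age with the two retention windows and ignores details such as future timestamps. The `target_store` function is hypothetical, and the one-day and 365-day retention values are example settings.

```python
from datetime import datetime, timedelta, timezone


def target_store(record_time, now, memory_retention, magnetic_retention,
                 magnetic_writes_enabled=True):
    """Return which tier a record would land in under this simplified model."""
    age = now - record_time
    if age <= memory_retention:
        return "memory"
    # Late-arriving data is only accepted when magnetic store writes are enabled
    # and the timestamp still falls inside the magnetic store retention window.
    if magnetic_writes_enabled and age <= magnetic_retention:
        return "magnetic"
    return "rejected"


now = datetime(2022, 1, 10, tzinfo=timezone.utc)
memory_retention = timedelta(days=1)      # example: keep one day in memory
magnetic_retention = timedelta(days=365)  # example: keep one year on magnetic

for label, age in [("fresh", timedelta(hours=2)),
                   ("late", timedelta(days=3)),
                   ("expired", timedelta(days=400))]:
    store = target_store(now - age, now, memory_retention, magnetic_retention)
    print(label, "->", store)
```

In this model, a reading from a device that was offline for three days lands in the magnetic store, so the memory store retention can stay at one day.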
Magnetic store writes eliminate the need to maintain a memory store with a large data retention period just to process late-arriving data, which lowers storage costs. To keep data up to date for analytics, you can upsert records in the magnetic store as well as records in the memory store. The upsert operation updates an existing record or inserts the data as a new record. You can use the magnetic store for asynchronous processing of late-arriving data, long-term data storage, and fast analytical queries.
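The upsert behavior can be illustrated with an in-memory model. This sketch assumes a version-based rule, in which a write replaces an existing record only when it carries a higher version number. The dictionary store, the record layout, and the `upsert` function are hypothetical and only mimic the update-or-insert semantics described above.

```python
def upsert(store, record):
    """Insert a new record or update an existing one (simplified, version-based)."""
    # A record is identified by its dimensions, measure name, and timestamp.
    key = (record["dimensions"], record["measure_name"], record["time"])
    existing = store.get(key)
    if existing is None:
        store[key] = record
        return "inserted"
    if record["version"] > existing["version"]:
        store[key] = record
        return "updated"
    # Same or older version: keep the stored record unchanged.
    return "ignored"


store = {}
reading = {
    "dimensions": (("hostname", "host-24Gju"),),
    "measure_name": "cpu",
    "time": 1640995200000,
    "value": 35.0,
    "version": 1,
}
print(upsert(store, reading))                                 # first write
print(upsert(store, {**reading, "value": 36.5, "version": 2}))  # correction
print(upsert(store, {**reading, "value": 10.0, "version": 1}))  # stale write
```

The same logic applies whether the existing record sits in the memory store or the magnetic store, which is why corrections to late-arriving data do not require a longer memory store retention.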
Query performance improvements
We made great strides in Timestream’s query performance through several optimizations, helping customers reach insights and act on them faster. Some of our customers monitor and track metrics of streaming video sessions across hundreds of millions of subscribers, resulting in billions of time series metrics. Other workloads benefit from serverless scaling and low costs when tracking metrics for devices with sporadic activity. Operating a service with this diversity gives us the opportunity to analyze the ingestion, data distributions, and query patterns of thousands of workloads over time, which in turn gives us the insights to optimize the system for better performance.
By analyzing these diverse workloads, in conjunction with the query access patterns and query concurrency requirements, we improved our adaptive auto scaling, partitioning, indexing, and resource allocation algorithms. These optimizations resulted in up to three times faster query response times across a variety of workloads. The best part is that because Timestream is a serverless offering, all these improvements were automatically and seamlessly rolled out to our customers without any additional effort!
Additional Timestream features that improve ease of use
Throughout 2021, it became easier to get started with Timestream and use it to generate business insights with minimal effort. We simplified the service through increased quotas, cross-table queries, and expanded query capabilities for accurate performance and cost monitoring.
Customers benefited from increased quotas on database, table, and measure names, which can now be up to 256 bytes each. Additionally, tables can support up to 8,192 unique measure names (increased from 1,024). With increased limits, you have greater flexibility to address workload needs, and migration to Timestream becomes easier. The naming guidelines for dimensions are also relaxed with the addition of UTF-8 support.
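Because the name quota is expressed in bytes rather than characters, a client-side check needs to measure the encoded length. The following sketch assumes UTF-8 encoding, and the `fits_name_quota` helper is hypothetical. It checks length only and does not validate which characters a given kind of name may contain.

```python
MAX_NAME_BYTES = 256  # quota stated above for database, table, and measure names


def fits_name_quota(name, limit=MAX_NAME_BYTES):
    """Return True if the UTF-8 encoding of the name fits within the byte limit."""
    return len(name.encode("utf-8")) <= limit


ascii_name = "a" * 256      # 256 characters, 256 bytes
accented_name = "é" * 200   # 200 characters, 400 bytes in UTF-8
print(fits_name_quota(ascii_name), fits_name_quota(accented_name))
```

A name with multi-byte characters can exceed the quota well before it reaches 256 characters, so counting characters alone would underestimate its size.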
Additionally, the querying capabilities of Timestream improved by allowing cross-table queries where you can run JOIN or UNION statements on data from multiple tables or databases in Timestream. These queries enable analytics and computation across tables and allow you to combine data in various ways. For instance, you can easily combine data from a scheduled query with the latest streaming data for powerful, real-time analytics.
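As a sketch of that last pattern, the following helper builds a query that unions hourly aggregates from a derived table with an aggregate computed on the fly from the current hour of raw data. The database, table, and column names are hypothetical, and the SQL is illustrative rather than taken from the Timestream documentation.

```python
def combined_view_query(database, derived_table, raw_table):
    """Build a UNION ALL over precomputed rollups and the newest raw data."""
    # Completed hours come from the derived table populated by a scheduled query.
    historical = (
        "SELECT hostname, time AS hour, avg_cpu "
        f'FROM "{database}"."{derived_table}" '
        "WHERE time < bin(now(), 1h)"
    )
    # The current, still incomplete hour is aggregated from the raw table.
    latest = (
        "SELECT hostname, bin(time, 1h) AS hour, avg(cpu) AS avg_cpu "
        f'FROM "{database}"."{raw_table}" '
        "WHERE time >= bin(now(), 1h) "
        "GROUP BY hostname, bin(time, 1h)"
    )
    return f"{historical} UNION ALL {latest}"


query = combined_view_query("monitoring", "host_metrics_hourly", "host_metrics")
print(query)
```

Both branches return the same three columns in the same order, which is what allows the results to be combined with UNION ALL. The raw table is scanned only for the most recent hour.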
Querying also improved with the introduction of advanced time series functions. To better understand trends and patterns across time series data, identify the rate at which the data changes over time, and recognize the degree of similarity between related data, you can run SQL queries with advanced time series functions such as derivatives, integrals, and correlations, and queries with joins and unions across multiple tables.
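To make these three ideas concrete, the following sketch computes them by hand in plain Python on a tiny invented series. It illustrates the underlying math only (a finite-difference derivative, a trapezoidal integral, and a Pearson correlation). It is not Timestream's implementation, and the function names are hypothetical.

```python
from math import sqrt


def rate_of_change(points):
    """Finite-difference derivative between consecutive (seconds, value) points."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]


def trapezoid_integral(points):
    """Area under the series using the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented sample: (seconds since start, bytes transferred so far)
series = [(0, 0.0), (10, 20.0), (20, 30.0)]
print(rate_of_change(series))
print(trapezoid_integral(series))
print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The derivative answers "how fast is this changing", the integral answers "how much accumulated over the window", and the correlation answers "do these two series move together", which are the three questions named above.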
Also, we enhanced the Query API, which now reports the amount of data scanned by a query so you can estimate query costs. You can also see how far along a query is, so you can identify and cancel long-running queries. These query statistics are available as part of the Query API and can also be accessed through the Timestream console.
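A minimal sketch of how these statistics might be used follows. It assumes the query response carries a `QueryStatus` object with `ProgressPercentage`, `CumulativeBytesScanned`, and `CumulativeBytesMetered` fields. The sample response, the price per GB, the choice of 10^9 bytes per GB, and the cancellation thresholds are all hypothetical, so check current pricing and the API reference before relying on the numbers.

```python
BYTES_PER_GB = 10 ** 9  # assumption: decimal gigabytes


def summarize_query_status(response, price_per_gb):
    """Extract progress and a rough cost estimate from a query response dict."""
    status = response.get("QueryStatus", {})
    metered = status.get("CumulativeBytesMetered", 0)
    return {
        "progress_pct": status.get("ProgressPercentage", 0.0),
        "bytes_scanned": status.get("CumulativeBytesScanned", 0),
        "bytes_metered": metered,
        "estimated_cost": metered / BYTES_PER_GB * price_per_gb,
    }


def should_cancel(summary, elapsed_seconds, max_seconds=60, min_progress_pct=50.0):
    """Flag a query that has run too long while making too little progress."""
    return elapsed_seconds > max_seconds and summary["progress_pct"] < min_progress_pct


# Hypothetical response fragment; a real response also includes rows and columns.
sample_response = {
    "QueryId": "EXAMPLE-QUERY-ID",
    "QueryStatus": {
        "ProgressPercentage": 20.0,
        "CumulativeBytesScanned": 1_500_000_000,
        "CumulativeBytesMetered": 2_000_000_000,
    },
}

summary = summarize_query_status(sample_response, price_per_gb=0.01)
print(summary)
print("cancel:", should_cancel(summary, elapsed_seconds=90))
```

When `should_cancel` returns true, the query ID from the response is what you would pass to the cancel operation of the Query API.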
Strengthened security and compliance in Timestream
Since our release of Timestream, security and compliance have been foundational tenets for us to earn trust and deliver value to all of our customers who rely on us every day to gain business insights from their data. Our customers span all industries and sizes, from startups to Fortune 500 companies, and we work tirelessly to ensure each of your security and compliance needs is met. In 2021, we made it possible for you to use Timestream from your VPCs and within applications that are subject to System and Organization Controls (SOC) compliance.
You can now access Timestream APIs from your VPC using Amazon Virtual Private Cloud (Amazon VPC) endpoints. Amazon VPC endpoints are easy to configure and provide reliable connectivity to Timestream APIs without requiring an internet gateway or a Network Address Translation (NAT) instance.
Timestream is in scope for AWS’s SOC 1, SOC 2, and SOC 3 reports, allowing anyone to get deep insight into the security processes and controls that protect customer data. AWS SOC reports are independent third-party examination reports that demonstrate how AWS achieves key compliance controls and objectives. The purpose of these reports is to help teams understand the AWS controls established to support operations and compliance. In addition to meeting standards for SOC, Timestream is compliant with and can be used for workloads that are subject to the Health Insurance Portability and Accountability Act (HIPAA), International Organization for Standardization standards (ISO 9001, 27001, 27017, and 27018), and the Payment Card Industry – Data Security Standard (PCI DSS). For a full list, refer to AWS Services in Scope by Compliance Program.
AWS has comprehensive security capabilities to satisfy the most demanding requirements, and Timestream provides data security out of the box at no extra cost. Our team will continue our commitment to keeping Timestream secure for all customers.
When it comes to making it easier, simpler, and faster for you to analyze all of your time series data, velocity matters, and we are innovating at a rapid pace to bring new capabilities to Timestream. We look forward to continuing the momentum in 2022 and seeing how you will use these capabilities to innovate with time series data! Visit the Developer Guide, watch the video tutorial, learn our best practices, and read about supported integrations with Timestream to get started.
About the Authors
Praneeth Kavuri is a Senior Product Manager in AWS working on Amazon Timestream. He enjoys building scalable solutions and working with customers to help deploy and optimize database workloads on AWS.
Igor Shvartser is a Senior Product Manager for Amazon Timestream. His fascination with data, working alongside customers, and building exceptional products led him to AWS, where he’s empowering teams with purpose-built databases. He currently resides in Los Angeles.