AWS Database Blog

Designing high-performance time series data tables on Amazon RDS for PostgreSQL

Many organizations need to store time series data. Some organizations have applications designed to store and query large amounts of time series data such as collecting metrics from a fleet of internet of things (IoT) devices. Others may have a single table of time series data such as a transaction history table in an OLTP database. Storing time series data in Amazon RDS for PostgreSQL is the correct choice if it needs to be joined with transactional data already in PostgreSQL. It may also be the correct choice, if the data to be ingested can be batched so it can be loaded in larger groups. If the time series data is to be used in a standalone application or if the data will be loaded directly from many locations or sensors, a purpose-built database like Amazon Timestream would be a more suitable choice. In this post, we will focus on a workload well-suited for PostgreSQL.

By their nature, time series tables grow continuously. When these tables get large, splitting these tables into many smaller tables will improve database performance and make maintenance much easier. Splitting a table into multiple tables is called partitioning. When done correctly, the application accessing the partitioned table doesn’t notice any difference other than faster to access the data it requested.

Starting with version 10, PostgreSQL supports table partitioning. Recent versions of PostgreSQL added enhancements to indexing and partitioning that improve the performance of time series data. PostgreSQL 12 introduced the ability to add partitions without the need to take an exclusive lock on the parent table and changed the planning algorithm, so PostgreSQL can support thousands of partitions. PostgreSQL 13 builds on that capability. One of the key features in PostgreSQL 13 enabling this scale is an improvement in partition pruning. Partition pruning is the ability for a query to ignore partitions that would not contain data relevant to the query. Earlier versions of PostgreSQL were not as effective in pruning partitions requiring many queries to scan all partitions. Amazon RDS for PostgreSQL provides the tools to make it even easier to maintain your time series data using many partitions. In part of this blog, we will discuss the extensions pg_partman and pg_cron for maintaining highly partitioned time series tables. In most cases, PostgreSQL can handle an organization’s time series data ranging from one million metrics per year to more than one million metrics per second. However, not all tables will benefit from partitioning, and when not needed, it can degrade database performance. Improving performance through partitioning requires a few up-front considerations to design tables that handle time series data well.

From a database perspective, time series data exhibits a few key characteristics. First, time series data is treated as an immutable append-only log; data is inserted, not updated. Unlike data for a business application such as an inventory management system where rows are regularly updated, as time series data changes, new entries are added to the table. The original rows are treated as immutable, and new information is appended to the table. Old rows may be deleted if purging is needed, but they are not changed. Also, data is inserted in time order. There are exceptions, such as a sensor may be offline for a while which results in ingestion of older data, but even when that happens, the rows arrive in order for the missing period. Access is not as random as is typically seen in an OLTP database. This means there is some natural ordering to the data. Older data tends to be already in the table with most current data being inserted.

This blog is the first in a series where we will cover in detail handling time series data in PostgreSQL. In this post, we explore these considerations when designing the data model to hold time series data. We will start with discussing the correct data type choices for time series tables. Then we will explore some indexing considerations for time series tables. Finally, we will examine partitioning time series tables. Following the principles described in this post will produce a large performance gain over treating a time series table as on ordinary table. We use a simple benchmark tool to simulate the ingestion of metrics so we describe how PostgreSQL can natively handle time series data. The benchmark simulates a fictional company that tracks metrics on its fleet of trucks.

Tables overview

The main table storing our truck metrics is the readings table. In addition to the time the readings are collected, the table stores data points such as the latitude, longitude, elevation, and velocity of the truck. See the following table definition:

CREATE TABLE public.readings (
    time             timestamptz,
    tags_id          integer,
    name             text,
    latitude         double precision,
    longitude        double precision,
    elevation        double precision,
    velocity         double precision,
    heading          double precision,
    grade            double precision,
    fuel_consumption double precision,
    additional_tags  jsonb
);

CREATE INDEX readings_latitude_time_idx ON public.readings 
 USING btree (latitude, time DESC);

CREATE INDEX readings_tags_id_time_idx ON public.readings 
 USING btree (tags_id, time DESC);

CREATE INDEX readings_time_idx ON public.readings 
 USING btree (time DESC);

There are similarly structured tables to hold diagnostics of truck conditions, such as the fuel level and the status of the truck, which indicates if it’s moving or stationary and tags to hold metadata about the truck such as its model number and load capacity.

To simulate the ingestion of the metrics, the benchmark loads pre-generated data using the PostgreSQL COPY command in parallel threads. For this post, we use a little more than a year of data for 100 trucks. The data spans from January 1, 2020 to January 12, 2021. This accounts for 586 million rows of data with a data size of 133 GB.

The following statement shows the breakdown of database size across the tables and indexes:

=> SELECT relname, pg_relation_size(oid) as bytes, 
          pg_size_pretty(pg_relation_size(oid)) 
     FROM pg_class 
    WHERE relnamespace = 'public'::regnamespace 
    ORDER BY relname;
             relname             |    bytes    | pg_size_pretty 
---------------------------------+-------------+----------------
 diagnostics                     | 22226821120 | 21 GB
 diagnostics_fuel_state_time_idx | 15525781504 | 14 GB
 diagnostics_tags_id_time_idx    | 16172163072 | 15 GB
 diagnostics_time_idx            | 12988997632 | 12 GB
 readings                        | 34228625408 | 32 GB
 readings_latitude_time_idx      | 12058607616 | 11 GB
 readings_tags_id_time_idx       | 16003366912 | 15 GB
 readings_time_idx               | 12963602432 | 12 GB
 tags                            |    60825600 | 58 MB
 tags_id_seq                     |        8192 | 8192 bytes
 tags_name_idx                   |    22192128 | 21 MB
 tags_pkey                       |    14811136 | 14 MB
 uniq1                           |    48742400 | 46 MB
(13 rows)

When ingesting the metrics, we use an Amazon RDS for PostgreSQL db.r6g.2xlarge DB instance with io1 storage provisioned with 20,000 IOPS and only four workers loading data. The intent on the small number of workers is to keep the database server from being saturated. In a production deployment, a database needs to be available to serve queries in addition to ingesting the data. With this configuration, the data was loaded in 5,609 seconds (93.5 minutes) at a rate of 522,832 metrics per second.

Data type considerations

When designing the table structure, it’s important to understand the range of possible values and choose the data types that allow the largest (or smallest) possible value. It’s equally important to not over-provision and choose a data type that is much larger than is needed. By over-provisioning the data type, the overall row size is larger, which makes the whole table significantly larger and affects database performance.

Let’s first look at the id column of the tags table. The table tracks the metadata of the truck, and each row is the intersection of the physical truck and the driver. The current fleet has 100 trucks, so it may be possible that the data type of the id column can be a smallint using only 2 bytes of space, but when we consider the number of potential drivers over the lifetime of the data model the number of potential drivers that can drive the current and future trucks, the number of potential combinations can easily exceed the maximum value of 32,767 of a smallint. Looking at the maximum value of the other integer types, an int has a maximum value of 2,147,483,647 and a bigint can be up to 9.22E+18. Considering the business context of what the id column represents, which is limited in its potential growth of the fleet, 2 billion potential values of a 4-byte integer is the right data type. If the column represented consumer-based values such as mobile phones instead of large trucks, then a bigint is the more optimal data type.

Now, let’s look at the data types for the metrics. They are all originally created as double precision, which allows up to 15 decimal digits precision in 8 bytes of physical space. Considering that the metrics being collected such as velocity and elevation are whole numbers, 15 decimal digits is more than is necessary. As whole numbers, the maximum number of digits required for the velocity of a truck is three, and the maximum number of digits for elevation is five. Even when considering future sensors that may be more accurate, going from smaller integers to requiring 15 decimal digits is very unlikely. A more optimal data type given the metrics being collected is a real, which allows for six decimal digits precision in only 4 bytes.

We can change the data types of the columns on the readings table holding metrics that don’t need 15 decimal digits with the following statement:

ALTER TABLE readings 
      ALTER COLUMN elevation TYPE real,
      ALTER COLUMN velocity TYPE real,
      ALTER COLUMN heading TYPE real,
      ALTER COLUMN grade TYPE real,
      ALTER COLUMN fuel_consumption TYPE real;

The effect of the data type change across all of the tables is reducing the size of the database from 133 GB to 126 GB (-5.2%). This smaller size reduces that amount of data persisted to storage, but it also allows more rows to be held in cache, increasing performance. Running the ingestion process with the new data types reduces the time of the load by more than 2 minutes, from 5,609 seconds to 5,487 seconds, and increases the ingestion rate from 522,832 metrics per second to 534,517 metrics per second.

Index considerations

One of the most resource-intensive aspects of ingesting data is maintaining indexes. Indexes aren’t necessary for ingestion, but without them, queries on the data are unacceptably slow. Limiting the number of indexes to just those necessary can provide a good balance between ingestion rate and query performance, but choosing those indexes depends on the business requirements for the queries. Regardless of the necessary additional indexes on a time series table, by the nature of the time series data, there is an index on the time column. Many queries analyze records over a segment of time, so quick access to those rows is required.

The default index used by PostgreSQL is a BTree index. A BTree index is a fine choice as a general-purpose index type, but it’s fairly resource intensive to maintain and can grow substantially as the number of rows in the table increases. BTree indexes have some performance optimizations in PostgreSQL for right-leaning trees, which is indicative of time series data. The time column is always increasing so the newest rows are always placed on the right side of the tree.

One of the strengths of a BTree index is the ability to quickly return a single value, like an integer used as a primary key or an email address. To accomplish this, the index must maintain an entry for every row in the table. This is ideal for an OLTP type workload, but queries on time series data typically span a range of time, such as returning all the rows from the last 5 minutes or aggregating values from yesterday. A typical query doesn’t look up rows on an exact time. For time series data, it’s not necessary to track the value of every row’s time value, but rather it needs to track if the row falls within a range of times.

With the flexibility of PostgreSQL, there is an index type that fits the time series use case extremely well. The Block Range Index (BRIN) was designed for an access pattern like a time series workload. Instead of tracking each individual time value, a BRIN tracks the minimum and maximum time value over a range of pages in the table. Because a time series table has a natural correlation of time values to the physical pages (because newer rows are appended to the end of the table), a BRIN is very efficient.

The following statement create indexes using the BRIN method for the time columns on the readings table. We also set the value of pages_per_range to 32 instead of the default of 128. This setting controls the range size to track the minimum and maximum values. In the case of a smaller number of devices like the 100 we’re using, 128 pages can span nearly 20 minutes’ worth of data. That larger range isn’t selective enough for the typical queries we regularly run, so scaling the range down to 32 pages helps the query performance.

CREATE INDEX readings_time_brin_idx 
    ON readings 
 USING BRIN (time) 
  WITH (pages_per_range = 32);

With using BRIN indexes instead of BTree indexes for the time columns, the size of the database reduced from 126 GB to 101 GB (-19.8%). The size of both BRIN indexes are only 24 KB. These significantly smaller index sizes translate directly into better performance for metrics ingestion. Loading the metrics using BRIN indexes reduces the time another 12 minutes, from 5,487 seconds to 4,761 seconds, with an ingestion rate of 616,002 metrics per second.

The increased ingestion rate using BRIN is great, but the purpose of indexes is to help query performance. Let’s look at a simple query that exercises the indexes on the time columns. The following query counts the number of readings in the table for a single day using a BTree index. From the EXPLAIN plan, we can see that PostgreSQL uses an Index Only Scan and counts the 777,456 rows in 222 milliseconds:

=> EXPLAIN ANALYZE 
    SELECT count(*) 
      FROM readings 
     WHERE time BETWEEN '2020-12-25' AND '2020-12-26';
                            QUERY PLAN                                                              
-----------------------------------------------------------------------------
 Aggregate  (cost=43256.51..43256.52 rows=1 width=8) 
       (actual time=222.670..222.671 rows=1 loops=1)
   ->  Index Only Scan using readings_time_idx on readings  
                     (cost=0.57..41335.85 rows=768264 width=0)
                     (actual time=0.024..174.810 rows=777456 loops=1)
         Index Cond: (("time" >= '2020-12-25 00:00:00+00'::timestamp with
                     time zone) AND ("time" <= '2020-12-26 
                     00:00:00+00'::timestamp with time zone))
         Heap Fetches: 777456
 Planning Time: 0.079 ms
 Execution Time: 222.701 ms
(6 rows)

When examining the same query with a BRIN index in place, we can see the plan is a little different. In the following plan, the index readings_time_brin_idx is first scanned, returning 9,728 candidate blocks. PostgreSQL must make a second check to ensure that all the rows in the page match the condition of the query, and in that second pass, PostgreSQL removed 19,762 rows from the result. Even with the additional processing of a second check, the query was able to count the 777,456 rows in 176 milliseconds.

=> EXPLAIN ANALYZE 
    SELECT count(*) 
      FROM readings 
     WHERE time BETWEEN '2020-12-25' AND '2020-12-26';
                            QUERY PLAN                                                              
-----------------------------------------------------------------------------
 Aggregate  (cost=6054759.35..6054759.36 rows=1 width=8) 
            (actual time=176.459..176.460 rows=1 loops=1)
   ->  Bitmap Heap Scan on readings  
            (cost=1824.41..6052838.72 rows=768252 width=0) 
            (actual time=18.612..129.882 rows=777456 loops=1)
         Recheck Cond: (("time" >= '2020-12-25 00:00:00+00'::timestamp with
               time zone) AND ("time" <= '2020-12-26 00:00:00+00'::timestamp         
               with time zone))
         Rows Removed by Index Recheck: 19762
         Heap Blocks: lossy=9728
         ->  Bitmap Index Scan on readings_time_brin_idx  
            (cost=0.00..1632.34 rows=768329 width=0) 
            (actual time=18.540..18.540 rows=97280 loops=1)
               Index Cond: (("time" >= '2020-12-25 00:00:00+00'::timestamp 
           with time zone) AND ("time" <= '2020-12-26 00:00:00+00'::timestamp 
           with time zone))
 Planning Time: 0.080 ms
 Execution Time: 176.494 ms
(9 rows)

Not all queries perform better with BRIN indexes like in this example, but if the access pattern fits the strengths of BRIN, they perform comparability to BTree indexes.

Partitioning considerations

The overall metrics ingestion rate of native PostgreSQL is impressive; however, when we look at the details over the duration of the load, we see that the overall number hides a degrading behavior. The following graph shows the ingestion rate over time throughout the duration of the load. The load initially starts at a rate of more than 1.5 million metrics per second, but ends at a rate of 400,000 metrics per second.
The load initially starts at a rate of more than 1.5 million metrics per second, but ends at a rate of 400,000 metrics per second.

Here we can see two points of behavior changes. The first at about 300 seconds, with a sharp drop in ingestion rate. The second is at about 4,200 seconds, where the amplitude of the rate swings more sharply. The initial drop correlates with PostgreSQL performing the first checkpoint after the ingestion starts. After a checkpoint, PostgreSQL writes out the full pages the first time a page is modified through a process called full-page writes. These full-page writes are used by the crash recovery process to prevent data corruption. With this behavior, as the BTree indexes grow, additional levels are added to the tree, meaning more pages are touched to get to the leaf nodes of the indexes. The additional touched pages need to be fully written to the Write-Ahead Log (WAL) after a checkpoint, which increases the overall amount of data needed to be persisted to storage affecting throughout. The second drop corresponds to the size of the database exceeding the available memory of the instance. After that point, pages need to be retrieved from storage instead of memory, which causes a larger variability in throughput.

To mitigate the two degrading behaviors, the solution is to keep the active dataset and its corresponding indexes small. The nature of time series data is antithetical to that, but PostgreSQL can partition data, which solves this. Since PostgreSQL 10, PostgreSQL supports native partitioning, which is ideally suited to a time series table. Partitioning allows a single large table to be broken up into smaller chunks, allowing for the performance characteristics of a small table while appearing as a large table to the application. Although the first implementations of partitioning in PostgreSQL 10 struggled with some performance issues, the feature matured over each subsequent release to the point where in PostgreSQL 13, it scales nicely to many thousands of partitions and performs extremely well.

Although the first implementations of partitioning in PostgreSQL 10 struggled with some performance issues

When partitioning a table, we have the flexibility to decide on the column we use to split the table. In a time series table, we naturally use the time column. We can then decide on the method of partitioning. PostgreSQL can partition tables by hash, list, or range. Like we mentioned previously in the discussion of Block Range Indexes, time series tables typically work with the data over ranges of times, so range partitioning is a natural fit. Finally, we need to determine the size of the range of data contained in each partition. The ideal range size depends on a number of factors, like the rate of data collection and the typical query accessing the data, but a good starting point is one day of data per partition.

To create a partitioned table, we use the same command as a standard table, but also define the partitioning details. We also define the indexes as if it were a standard table. The following commands creates the readings table partitioned by time:

CREATE TABLE public.readings (
    time             timestamptz,
    tags_id          integer,
    name             text,
    latitude         double precision,
    longitude        double precision,
    elevation        real,
    velocity         real,
    heading          real,
    grade            real,
    fuel_consumption real,
    additional_tags  jsonb
) PARTITION BY RANGE(time);

CREATE INDEX readings_latitude_time_idx ON public.readings 
 USING btree (latitude, "time" DESC);

CREATE INDEX readings_tags_id_time_idx ON public.readings 
 USING btree (tags_id, "time" DESC);

CREATE INDEX readings_time_brin_idx ON readings 
 USING BRIN (time) 
  WITH (pages_per_range = 32);

With the table created, we can create the daily partitions. The following command creates a single daily partition for the readings table:

CREATE TABLE readings_p2020_12_25 
  PARTITION OF public.readings
    FOR VALUES FROM ('2020-12-25 00:00:00+00') 
                 TO ('2020-12-26 00:00:00+00');

Although the command is simple, creating daily partitions going back more than a year can be very tedious. To simplify the creation of partitions, an extension, we can use the pg_partman extension. pg_partman provides functions to create and manage partitions for time series tables. The following commands use pg_partman to create daily partition for the readings table starting on January 1, 2020, through today. It also creates partitions 4 days into the future so they are available as time progresses. pg_partman also includes functions to create future partitions on an ongoing basis so partitions are always available for new data. These functions can be scheduled to run every day using the extension pg_cron.

CREATE EXTENSION pg_partman;

SELECT create_parent('public.readings', 'time', 'native', 
                     'daily', p_start_partition := '2020-01-01');

With the daily partitions in place for all of the time series tables, the ingestion process is run again. The final size of the database with partitions remains the same at 101 GB, but the load time is significantly improved. With partitioning, the ingestion finishes nearly 50 minutes faster, from 4,761 seconds to 1,774 seconds (-62.7% time reduction), with an ingestion rate of 1,653,024 metrics per second.

With partitioning, the ingestion finishes nearly 50 minutes faster

As we saw in the earlier graph, the details of ingestion can be consequential. The following graph shows the ingestion using a partitioned table compared to using a single large table. We can see that the partition line is significantly shorter because it completed much faster. We can also see that it’s stable around 1.6 million metrics per second over the duration of the load.

The following graph shows the ingestion using a partitioned table compared to using a single large table.

Summary

In this post, we covered a native PostgreSQL functionality that lends itself to time series data. With just simple planning on the data model design, a moderately sized Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL instance can easily ingest over 1 million metrics per second while still leaving resources available to query the data.

The maturity of native PostgreSQL partitioning is a game changer for handling large volumes of time series data. With Amazon RDS for PostgreSQL 13, PostgreSQL can handle thousands of partitions, meaning there can be daily or even hourly partitions on a table going back years. The extensions pg_partman and pg_cron make the management of those partitions simple and easy to add.


About the Authors

Jim Mlodgenski is a Principal Database Engineer at AWS. Prior to joining AWS, Jim was CTO at OpenSCG and CEO at StormDB, a Postgres-XC scalable public cloud solution. Jim was chief architect and part of the founding team of EnterpriseDB. Jim also very active in the PostgreSQL community, co-founding both the New York and Philadelphia PostgreSQL Meetup groups, co-organizing the largest PostgreSQL conferences, and serving on the Board of the PostgreSQL Foundation. Jim also speaks regularly at PostgreSQL and other open sources conferences.

 

Andy KatzAndy Katz is a Senior Product Manager at Amazon Web Services. Since 2016, Andy has helped AWS customers with relational databases, machine learning, and serverless workflows.