Why does a table in an Amazon Redshift provisioned cluster consume more or less disk storage space than expected?

4 minute read
0

A table is occupying more disk space than expected or a percentage of free disk space did not increase after my classic resize. I want to know how Amazon Redshift calculates disk storage and table size.

Resolution

Check the minimum table size

The minimum table size is the smallest footprint that a table has on an Amazon Redshift cluster. You can check the minimal table size when you analyze the cluster storage use or when you resize an Amazon Redshift cluster.

For tables that are created using the KEY, EVEN, or Auto (EVEN) distribution style, use the following formula:

Minimum table size = block_size (1 MB) *(number_of_user_columns + 3 system columns) * number_of_populated_slices * number_of_table_segments

For tables created using the ALL or Auto (ALL) distribution style, use the following formula:

Minimum table size = block_size (1 MB) *(number_of_user_columns + 3 system columns) * number_of_cluster_nodes * number_of_table_segments

For the table formulas, the number of segments is based on whether the table has a defined sort key. If an Amazon Redshift table has a defined sort key, then the table has two segments: one sorted segment and one unsorted segment. If an Amazon Redshift table has no sort key, then the table produces only one unsorted segment.

To calculate the number of populated slices, run the following query:

select count(distinct a.slice) asnumber_of_populated_slices, b."table" from stv_blocklist a,
svv_table_info b where a.tbl = b.table_id group by b."table" ;

Example: six dc2.large cluster

For example, you can create a six dc2.large cluster with four small tables of the same structure and number of rows. If the cluster uses three different distribution styles and one distribution style with a sort key, then different queries are used.

The following query creates a table with an ALL distribution style and an output of six populated slices:

create table testsize_all (a varchar(100),b varchar(100), cvarchar(100)) diststyle all;
insert into testsize_all values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

The following query creates a table with an EVEN distribution style and an output of six populated slices:

create table testsize_even (a varchar(100),b varchar(100), cvarchar(100)) diststyle even;
insert into testsize_even values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

The following query creates a table with a distribution key and an output of one populated slice:

create table testsize_key (a varchar(100),b varchar(100), cvarchar(100)) distkey (a);
insert into testsize_key values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

The following query creates a table with a distribution key and sort key and an output of six populated slices:

create table testsize_sort_even (a varchar(100),bvarchar(100), c varchar(100) ) diststyle even sortkey (a);
insert into testsize_sort_even values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

Calculate the minimum table size

To calculate the minimum table size for an EVEN distribution style, use the following formula:

Minimum table size = block_size (1 MB) *(number_of_user_columns + 3 system columns) * number_of_populated_slices *
number_of_table_segments
1MB * (3+3) * 6 *1 = 36MB

To calculate the minimum table size for an ALL distribution style, use the following formula:

Minimum table size = block_size (1 MB) *(number_of_user_columns + 3 system columns) * number_of_cluster_nodes *
number_of_table_segments
1MB * (3+3) * 6 *1 = 36 MB

To calculate the minimum table size for a KEY distribution style, use the following formula:

Minimum table size = block_size (1 MB) * (number_of_user_columns+ 3 system columns) * number_of_populated_slices * 
number_of_table_segments
1MB * (3+3) * 1 *1 = 6MB

To calculate the minimum table size for an even distribution with a sort key, use the following formula:

Minimum table size = block_size (1 MB) *(number_of_user_columns + 3 system columns) * number_of_populated_slices *
number_of_table_segments
1MB * (3+3) * 6 *2 = 72MB

With a small number of rows inserted, the size of your table is larger than expected. The size of the table continues to grow as the number of rows are inserted and the number of populated slices grow.

When you perform a classic resize, the number of populated slices grows without a growth in data volume for the table. As a result, the amount of free space after the resize doesn't linearly grow.

Related information

Amazon Redshift best practices for designing tables

Working with automatic table optimization

Columnar storage

Choose the best sort key

Query planning and execution workflow

AWS OFFICIAL
AWS OFFICIALUpdated 4 months ago