With Amazon Redshift, you can start small at $0.25 per hour and scale up to petabytes of data and thousands of concurrent users. Choose what is right for your business needs, with the ability to grow storage without over-provisioning compute or storage. With provisioned Amazon Redshift, you can choose On-Demand Instances and pay for your database by the hour with no long-term commitments or upfront fees, or choose Reserved Instances for additional savings. Alternatively, Amazon Redshift Serverless allows you to pay for usage by automatically starting up, shutting down, and scaling capacity up or down based on your application's needs, so you pay only for capacity consumed while processing the workload.
What to expect with provisioned Amazon Redshift:
First, learn more about node types so you can choose the best cluster configuration for your needs. You can quickly scale your cluster, pause and resume the cluster, and switch between node types with a single API call or a few clicks in the Redshift console. You’ll see on-demand pricing before making your selection, and later you can purchase reserved nodes for significant discounts.
Once you make your selection, you may wish to use Elastic Resize to easily adjust the amount of provisioned compute capacity within minutes for steady-state processing. With Resize Scheduler, you can add and remove nodes on a daily or weekly basis to optimize cost and get the best performance. For dynamic workloads, you can use Concurrency Scaling to automatically provision additional compute capacity and pay only for what you use on a per-second basis after exhausting the free credits (see Concurrency Scaling pricing).
Amazon Redshift node types
RA3 nodes with managed storage allow you to optimize your data warehouse by scaling and paying for compute and managed storage independently. With RA3, you choose the number of nodes based on your performance requirements and pay only for the managed storage you use. You should size your RA3 cluster based on the amount of data you process daily.
Redshift Managed Storage (RMS) uses large, high-performance solid-state drives (SSDs) in each RA3 node for fast local storage and Amazon Simple Storage Service (Amazon S3) for longer-term durable storage. If the data in a node grows beyond the size of the large local SSDs, RMS automatically offloads that data to Amazon S3. You pay the same low rate for RMS regardless of whether the data resides in high-performance SSDs or in Amazon S3. For workloads requiring ever-growing storage, managed storage lets you automatically scale your data warehouse storage capacity without adding and paying for additional nodes.
DC2 nodes enable compute-intensive data warehouses with local SSD storage included. Choose the number of nodes you need based on data size and performance requirements. DC2 nodes store your data locally for high performance, and as the data size grows, you can add more compute nodes to increase the storage capacity of the cluster. For datasets under 1 TB uncompressed, we recommend DC2 node types for the best performance at the lowest price. If you expect your data to grow, we recommend using RA3 nodes so you can size compute and storage independently to achieve the best price and performance.
Redshift capabilities with pay-as-you-go pricing
- Amazon Redshift node types: Choose the best cluster configuration and node type for your needs, and pay for capacity by the hour with Amazon Redshift on-demand pricing. When you choose on-demand pricing, you can use the pause and resume feature to suspend on-demand billing when a cluster is not in use. You can also choose Reserved Instances instead of on-demand instances for steady-state workloads and get significant discounts over on-demand pricing.
- Amazon Redshift Spectrum pricing: Run SQL queries directly against the data in your Amazon S3 data lake, out to exabytes—you simply pay for the number of bytes scanned.
- Concurrency Scaling pricing: Each cluster earns up to one hour of free Concurrency Scaling credits per day, which is sufficient for 97% of customers. This enables you to provide consistently fast performance, even with thousands of concurrent queries and users. You simply pay a per-second on-demand rate for usage that exceeds the free credits.
- RMS pricing: Pay only for the data you store in RA3 clusters, independent of the number of compute nodes provisioned. You simply pay hourly for the total amount of data in managed storage. RMS is also used with Amazon Redshift Serverless.
- Redshift ML: Use SQL to create, train, and deploy machine learning (ML) models. After you exhaust the free tier for Amazon SageMaker, you will incur costs for creating your model and storage. Redshift ML is also available for use with Amazon Redshift Serverless.
Amazon Redshift Free Trial
If you have never used Amazon Redshift Serverless before, you are eligible for a $300 credit with a 90-day expiration toward your compute and storage use. The consumption rate of this credit is dependent on actual usage and the compute capacity of your serverless endpoint.
In regions where Amazon Redshift Serverless is not yet available, customers can start a free trial for provisioned clusters. You’re eligible for a two-month free trial of our DC2 large node. Your organization gets 750 hours per month for free, enough to continuously run one DC2 large node with 160 GB of compressed SSD storage. Once your two-month free trial expires or your usage exceeds 750 hours per month, you can shut down your cluster to avoid any charges, or keep it running at our standard on-demand rate. Please visit the Amazon Redshift free trial page to learn more.
Amazon Redshift on-demand pricing allows you to pay for provisioned capacity by the hour with no commitments and no upfront costs for the specific node type you choose to run your data warehouse on. Simply pay an hourly rate based on the chosen type and number of nodes in your cluster and you will be billed as long as the cluster is running. Partial hours are billed in one-second increments following a billable status change such as creating, deleting, pausing, or resuming the cluster. The pause and resume feature allows you to suspend on-demand billing during the time the cluster is paused. Pause and Resume is a manual or scheduled operation on Redshift node types. During the time that a cluster is paused you pay only for backup storage. This frees you from planning and purchasing data warehouse capacity ahead of your needs, and enables you to cost-effectively manage environments for development or test purposes. For a Multi-AZ deployment, you would pay the same billing rates but for double the compute as you would pay for a single-AZ deployment.
*Total addressable storage capacity in the managed storage with each RA3 node.
Calculating your effective on-demand price per TB per year
For on-demand, the effective price per TB per year is the hourly price for the instance, times the number of hours in a year, divided by the number of TB per instance. For RA3, data stored in managed storage is billed separately based on actual data stored in the RA3 node types; effective price per TB per year is calculated for only the compute node costs.
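The formula above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function name is hypothetical, the year is assumed to have 365 days, and the sample inputs ($0.25 per hour, 0.16 TB per node) are illustrative figures borrowed from elsewhere on this page rather than a quoted price for a specific node type.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def effective_price_per_tb_year(hourly_price: float, tb_per_node: float) -> float:
    """Hourly node price x hours in a year / TB per node (hypothetical helper)."""
    if tb_per_node <= 0:
        raise ValueError("tb_per_node must be positive")
    return hourly_price * HOURS_PER_YEAR / tb_per_node


# Illustrative inputs only: $0.25 per hour and 0.16 TB of storage per node.
print(round(effective_price_per_tb_year(0.25, 0.16), 2))
```

For RA3 nodes the same function would cover compute only, because managed storage is billed separately.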
Amazon Redshift Serverless
You can start using Amazon Redshift Serverless for as low as $3 per hour and pay only for the compute capacity your data warehouse consumes when it is active. Your data warehouse capacity automatically scales up or down to meet your analytics workload demands and shuts down during periods of inactivity to save administration time and costs. Amazon Redshift measures data warehouse capacity in Redshift Processing Units (RPUs). You pay for the workloads you run in RPU-hours on a per-second basis (with a 60-second minimum charge), including queries that access data in open file formats in Amazon S3. There is no charge for data warehouse start up time. Automatic scaling and comprehensive security capabilities are included. You do not need to pay for concurrency scaling and Redshift Spectrum separately because they are both included with Amazon Redshift Serverless.
You can optionally use Base, Max RPU-Hours and MaxRPU (max capacity) settings to control data warehouse performance and costs.
- Base – This setting allows you to specify the base data warehouse capacity Amazon Redshift uses to serve queries. Base capacity is specified in RPUs. Setting higher base compute capacity can improve the query performance especially for data processing and ETL (extract, transform, load) jobs that process large amounts of data and perform transformations and enrichment. You can adjust the Base from 8 RPUs to 512 RPUs in units of 8 (8, 16, 24, 32, 40, 48, and so on, up to 512) from the Amazon Redshift management console or by invoking an Amazon Redshift API.
- Max – This setting allows you to specify usage limits, and define actions that Amazon Redshift automatically takes if those limits are reached to maintain your budget with predictability. Max is specified in RPU-hours and associated with a daily, weekly, or monthly duration. Setting higher max compute capacity can improve the overall throughput of the system, which is especially beneficial for workloads that need to handle high concurrency while maintaining consistently high performance. You can adjust the Max from the Amazon Redshift management console or by invoking an Amazon Redshift API.
- MaxRPU (Max Capacity) - This setting establishes the highest count of RPUs that Amazon Redshift Serverless can accommodate for scaling purposes. When automatic compute scaling is required, having a higher value for MaxRPU can enhance query throughput. When the MaxRPU limit is reached, the workgroup compute doesn't scale up resources any further.
Primary storage capacity is billed as Redshift Managed Storage (RMS) and storage used for user snapshots is billed at the standard backup billing rates outlined on this page. Storage is billed at the same rates as with Amazon Redshift provisioned clusters. With Amazon Redshift Serverless you can restore your data warehouse to specific points in the last 24 hours at a 30-minute granularity free of charge. Data transfer costs and ML costs apply separately, the same as provisioned clusters. Snapshot replication and data sharing across AWS Regions are billed at the transfer rates outlined on this page.
Amazon Redshift managed storage pricing
You pay for data stored in managed storage at a fixed GB-month rate for your region. Managed storage comes exclusively with RA3 node types, and you pay the same low rate for Redshift managed storage regardless of data size. Usage of managed storage is calculated hourly based on the total data present in the managed storage (see example below converting usage in GB-hours to charges in GB-months). You can monitor the amount of data in your RA3 cluster via Amazon CloudWatch or the AWS Management Console. You do not pay for any data transfer charges between RA3 nodes and managed storage. Managed storage charges do not include backup storage charges due to automated and manual snapshots (see Backup Storage). Once the cluster is terminated, you continue to be charged for the retention of your manual backups.
Pricing example for managed storage pricing
Let's convert this to GB-months: 36,900,000 GB-hours / 720 hours per month in April = 51,250 GB-months.
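As a rough sketch of the conversion above, the snippet below divides accumulated GB-hours by the number of hours in the billing month. The helper names are hypothetical, and the only figures used are the ones in the example (36,900,000 GB-hours over a 30-day April).

```python
def hours_for_days(days_in_month: int) -> int:
    """Number of hours in a month with the given number of days."""
    return days_in_month * 24


def gb_hours_to_gb_months(gb_hours: float, hours_in_month: int) -> float:
    """Convert hourly-metered storage usage into GB-months (hypothetical helper)."""
    if hours_in_month <= 0:
        raise ValueError("hours_in_month must be positive")
    return gb_hours / hours_in_month


# April has 30 days, so 720 hours.
april_gb_months = gb_hours_to_gb_months(36_900_000, hours_for_days(30))
print(april_gb_months)  # 51250.0
```

Multiplying the result by the GB-month rate for your region gives the managed storage charge for the month.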
Amazon Redshift Spectrum pricing
Amazon Redshift Spectrum allows you to directly run SQL queries against exabytes of data in Amazon S3. You are charged for the number of bytes scanned by Redshift Spectrum, rounded up to the next megabyte, with a 10 MB minimum per query. There are no charges for Data Definition Language (DDL) statements such as CREATE/ALTER/DROP TABLE for managing partitions and failed queries.
Amazon Redshift Serverless queries of external data in Amazon S3 are not billed for separately and are included in the amount billed for Amazon Redshift Serverless in RPU-hr amounts.
You can improve query performance and reduce costs by storing data in a compressed, partitioned, and columnar data format. If you compress data using one of Redshift Spectrum’s supported formats, your costs will decrease because less data is scanned. Similarly, if you store data in a columnar format, such as Apache Parquet or Optimized Row Columnar (ORC), your charges will decrease because Redshift Spectrum only scans columns required by the query.
With Redshift Spectrum, you are billed per terabyte of data scanned, rounded up to the next megabyte, with a 10 MB minimum per query. For example, if you scan 10 GB of data, you will be charged $0.05. If you scan 1 TB of data, you will be charged $5.00.
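The billing rule above (round up to the next megabyte, 10 MB minimum per query, price per terabyte) can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, units are assumed to be binary (1 TB = 1,024^4 bytes), and the $5.00 per TB default is the rate used in the examples on this page.

```python
MB = 1024 ** 2       # assumption: binary megabyte
TB = 1024 ** 4       # assumption: binary terabyte
MIN_BILLED_MB = 10   # 10 MB minimum per query, as stated in the text


def spectrum_query_cost(bytes_scanned: int, price_per_tb: float = 5.00) -> float:
    """Estimate the charge for one query (hypothetical helper)."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Ceiling division rounds the scan up to the next whole megabyte.
    billed_mb = max(-(-bytes_scanned // MB), MIN_BILLED_MB)
    return billed_mb * MB / TB * price_per_tb


print(spectrum_query_cost(1 * TB))                   # 5.0
print(round(spectrum_query_cost(10 * 1024 ** 3), 2)) # 0.05
```

A query that scans a single byte is still billed for 10 MB under this rule, which is why many tiny queries can cost more than one larger scan of the same data.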
You are charged for the Amazon Redshift cluster used to query data with Redshift Spectrum. Redshift Spectrum queries data directly in Amazon S3. You are charged standard S3 rates for storing objects in your S3 buckets, and for requests made against your S3 buckets. For details, refer to Amazon S3 rates.
If you use the AWS Glue Data Catalog with Amazon Redshift Spectrum, you are charged standard AWS Glue Data Catalog rates. For details, refer to AWS Glue pricing.
When using Amazon Redshift Spectrum to query AWS Key Management Service (KMS) encrypted data in Amazon S3, you are charged standard AWS KMS rates. For details, refer to AWS KMS pricing.
Redshift Spectrum pricing examples based on US East (N. Virginia) pricing
Consider a table with 100 equally sized columns stored in Amazon S3 as an uncompressed text file with a total size of 4 TB. Running a query to get data from a single column of the table requires Redshift Spectrum to scan the entire file, because text formats cannot be split. This query would scan 4 TB and cost $20. ($5/TB x 4 TB = $20)
If you compress your file using GZIP, you may see a 4:1 compression ratio. In this case, you would have a compressed file size of 1 TB. Redshift Spectrum has to scan the entire file, but since it is one-fourth the size, you pay one-fourth the cost, or $5. ($5/TB x 1 TB = $5)
If you compress your file and convert it to a columnar format like Apache Parquet, you may see a 4:1 compression ratio and have a compressed file size of 1 TB. Using the same query as above, Redshift Spectrum needs to scan only one column in the Parquet file. The cost of this query would be $0.05. ($5/TB x 1 TB file size x 1/100 columns, or a total of 10 GB scanned = $0.05)
Note: The above pricing examples are for illustration purposes only. The compression ratio of different files and columns may vary.
Concurrency Scaling pricing
Amazon Redshift automatically adds transient capacity to provide consistently fast performance, even with thousands of concurrent users and queries. There are no resources to manage, no upfront costs, and you are not charged for the startup or shutdown time of the transient clusters. You can accumulate one hour of Concurrency Scaling cluster credits every 24 hours while your main cluster is running. You are charged the per-second on-demand rate for a Concurrency Scaling cluster used in excess of the free credits—only when it's serving your queries—with a one-minute minimum charge each time a Concurrency Scaling cluster is activated. The per-second on-demand rate is based on the type and number of nodes in your Amazon Redshift cluster.
Amazon Redshift Serverless automatically scales resources up and down as needed to meet workload needs by default and there are no separate charges for Concurrency Scaling.
Concurrency Scaling credits
Redshift clusters earn up to one hour of free Concurrency Scaling credits per day. Credits are earned on an hourly basis for each active cluster in your AWS account, and can be consumed by the same cluster only after credits are earned. You can accumulate up to 30 hours of free Concurrency Scaling credits for each active cluster. Credits do not expire as long as your cluster is not terminated.
Pricing example for Concurrency Scaling
A 10 DC2.8XL node Redshift cluster in the US-East costs $48 per hour. Consider a scenario where two transient clusters are utilized for five minutes beyond the free Concurrency Scaling credits. The per-second on-demand rate for Concurrency Scaling is $48 x 1/3600 = $0.013 per second. The additional cost for Concurrency Scaling in this case is $0.013 per second x 300 seconds x 2 transient clusters = $8. Therefore, the total cost of the Amazon Redshift cluster and the two transient clusters in this case is $56.
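The arithmetic in this example can be reproduced with a short sketch. The function name is hypothetical, and the code assumes that all usage passed in is beyond the free credits and that the one-minute minimum applies once per transient cluster. It also keeps the unrounded per-second rate ($48 / 3,600) instead of the rounded $0.013 shown above.

```python
SECONDS_PER_HOUR = 3600


def concurrency_scaling_cost(
    cluster_hourly_rate: float,
    seconds_used: int,
    transient_clusters: int = 1,
    min_seconds: int = 60,  # one-minute minimum per activation, from the text
) -> float:
    """Charge for usage beyond the free credits (hypothetical helper)."""
    billed_seconds = max(seconds_used, min_seconds)
    per_second_rate = cluster_hourly_rate / SECONDS_PER_HOUR
    return per_second_rate * billed_seconds * transient_clusters


extra = concurrency_scaling_cost(48.0, seconds_used=300, transient_clusters=2)
print(round(extra, 2))         # 8.0
print(round(48.0 + extra, 2))  # 56.0, main cluster hour plus transient clusters
```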
Redshift ML pricing
When you get started with Redshift ML, you qualify for the Amazon SageMaker free tier if you haven’t previously used Amazon SageMaker. This includes two free CREATE MODEL requests per month for two months with up to 100,000 cells per request. Your free tier starts from the first month when you create your first model in Redshift ML.
Amazon S3 charges
The CREATE MODEL request also incurs small Amazon S3 charges. S3 costs should be less than $1 per month since the amount of S3 data generated by CREATE MODEL is on the order of a few gigabytes. Amazon S3 is used first to store the training data produced by the SELECT query of the CREATE MODEL. Then it is used to store various model-related artifacts needed for prediction. When garbage collection is on, these objects are quickly removed: the default garbage collection mode removes both the training data and the model-related artifacts at the end of CREATE MODEL.
Cost control options
You can control the training cost by setting the MAX_CELLS. If you do not, the default value of MAX_CELLS is 1 million, which in the vast majority of cases will keep your training cost below $20. When the training data set is above 1 million cells, the pricing increases as follows:
| Number of cells | Price |
|---|---|
| First 10M cells | $20 per million cells |
| Next 90M cells | $15 per million cells |
| Over 100M cells | $7 per million cells |
Note: Real pricing will often be less than the upper bounds shared above.
Examples of CREATE MODEL cost:
- 100,000 cells is $20 (= 1 x 20)
- 2,000,000 cells is $40 (= 2 x 20)
- 23,000,000 cells is $395 (= 10 x 20 + 13 x 15)
- 99,000,000 cells is $1,535 (= 10 x 20 + 89 x 15) and
- 211,000,000 cells is $2,327 (= 10 x 20 + 90 x 15 + 111 x 7)
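The tiered examples above can be reproduced with the sketch below. It is not an official calculator: the function name is hypothetical, and rounding the cell count up to a whole million is an assumption inferred from the 100,000-cell example being priced as one million cells. As noted above, these figures are upper bounds.

```python
# (tier size in millions of cells, price in USD per million cells).
# None marks the final, unbounded tier.
TIERS = [(10, 20), (90, 15), (None, 7)]


def create_model_cost_upper_bound(cells: int) -> int:
    """Upper-bound CREATE MODEL training cost in USD (hypothetical helper)."""
    if cells < 0:
        raise ValueError("cells must be non-negative")
    # Assumption: partial millions are billed as a whole million.
    millions = -(-cells // 1_000_000)
    cost = 0
    for tier_size, price in TIERS:
        if millions <= 0:
            break
        in_tier = millions if tier_size is None else min(millions, tier_size)
        cost += in_tier * price
        millions -= in_tier
    return cost


for cells in (100_000, 2_000_000, 23_000_000, 99_000_000, 211_000_000):
    print(cells, create_model_cost_upper_bound(cells))
```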
If the training data produced by the SELECT query of the CREATE MODEL request exceeds the MAX_CELLS limit you provided (or the default 1 million, if you did not provide one), the CREATE MODEL will randomly choose approximately MAX_CELLS/“number of columns” records from the training dataset and will train using these randomly chosen tuples. The random choice is designed to prevent bias in the reduced training dataset. Thus, by setting the MAX_CELLS, you can keep your cost within bounds.
Reserved Instance pricing
Reserved Instances are appropriate for steady-state production workloads, and offer significant discounts over on-demand pricing of Amazon Redshift node types. Customers typically purchase Reserved Instances after running experiments and proof-of-concepts to validate production configurations.
You can benefit from significant savings over on-demand rates by committing to use Amazon Redshift for a one- or three-year term. Reserved Instance pricing is specific to the node type purchased, and remains in effect until the reservation term ends. Prices include two additional copies of data - one on the cluster nodes and one in Amazon S3. We take care of backup, durability, availability, security, monitoring, and maintenance for you.
There are three options for Reserved Instance pricing:
No Upfront – You pay nothing upfront, and you commit to pay monthly over the course of one year.
Partial Upfront – You pay a portion of the Reserved Instance upfront, and the remainder over a one- or three-year term.
All Upfront – You pay for the entire Reserved Instance term (one or three years) with one upfront payment.
Reserved Instances are a billing concept and are not used to create data warehouse clusters. When you make a purchase, you will be charged the associated upfront and monthly fees even if you are not currently running a cluster, or if an existing cluster is paused. To purchase Reserved Instances, visit the Reserved Nodes tab in the Redshift console.
We may terminate the Reserved Instance pricing program at any time. In addition to being subject to Reserved Instance pricing, Reserved Instances are subject to all data transfer and other fees applicable under the AWS Customer Agreement or other agreement with us governing your use of our services.
*This is the average monthly payment over the course of the Reserved Instance term. For each month, the actual monthly payment will equal the actual number of hours in that month multiplied by the hourly usage rate or number of seconds in that month multiplied by the hourly usage rate divided by 3600, depending on the Redshift instance type you run. The hourly usage rate is equivalent to the total average monthly payments over the term of the Reserved Instance divided by the total number of hours (based on a 365 day year) over the term of the Reserved Instance.
** Effective hourly pricing helps you calculate the amount of money a Reserved Instance will save you over On-Demand pricing. When you purchase a Reserved Instance, you are billed for every hour during the entire Reserved Instance term you select, regardless of whether the instance is running. The effective hourly price shows the amortized hourly instance cost. This takes the total cost of the Reserved Instance over the entire term, including any upfront payment, and spreads it out over each hour of the Reserved Instance term.
*** For Reserved Instances, add the upfront payment to the hourly rate multiplied by the number of hours in the term, and divide by the number of years in the term and number of TB per node. For RA3, data stored in managed storage is billed separately based on actual data stored in the RA3 node types; effective price per TB per year is calculated only for the compute node costs.
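The two footnote formulas above (effective hourly price and effective price per TB per year) can be sketched as follows. The function names and the sample inputs are hypothetical and do not correspond to any published Reserved Instance price; the 365-day year follows the footnote on average monthly payments.

```python
HOURS_PER_YEAR = 365 * 24  # based on a 365-day year, as in the footnotes


def effective_hourly_price(upfront: float, hourly_rate: float, term_years: int) -> float:
    """Total cost over the term, spread across every hour of the term."""
    hours_in_term = term_years * HOURS_PER_YEAR
    return (upfront + hourly_rate * hours_in_term) / hours_in_term


def reserved_price_per_tb_year(
    upfront: float, hourly_rate: float, term_years: int, tb_per_node: float
) -> float:
    """(Upfront + hourly rate x hours in term) / years in term / TB per node."""
    hours_in_term = term_years * HOURS_PER_YEAR
    return (upfront + hourly_rate * hours_in_term) / term_years / tb_per_node


# Hypothetical inputs: $1,000 upfront, $0.10 per hour, 1-year term, 0.16 TB per node.
print(round(effective_hourly_price(1000.0, 0.10, 1), 4))
print(round(reserved_price_per_tb_year(1000.0, 0.10, 1, 0.16), 2))
```

Comparing the effective hourly price with the on-demand hourly rate for the same node type gives the savings from the reservation.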
Zero-ETL integration costs
Amazon Aurora zero-ETL integration with Amazon Redshift enables near real-time analytics and ML using Amazon Redshift on petabytes of transactional data from Aurora. Zero-ETL integration removes the need to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations.
AWS does not charge an additional fee for the zero-ETL integration. You pay for existing Aurora and Amazon Redshift resources used to create and process the change data generated as part of a zero-ETL integration. These resources could include additional I/O and storage used by enabling enhanced binlog, snapshot export costs for the initial data export to seed your Amazon Redshift databases, additional Amazon Redshift storage for storing replicated data, and cross-AZ data transfer costs for moving data from source to target. Ongoing processing of data changes is offered at no additional charge.
Backup storage is the storage associated with the snapshots taken for your data warehouse. Increasing your backup retention period or taking additional snapshots increases the backup storage consumed by your data warehouse. Amazon Redshift charges for manual snapshots you take using the console, application programming interface (API), or command-line interface (CLI). Automated snapshots, which are created using Amazon Redshift's snapshot scheduling feature, are offered at no charge and can be retained for a maximum of 35 days. You are not charged for Amazon Redshift Serverless recovery points that are less than 24 hours old. If you choose to keep recovery points beyond 24 hours, they will incur charges as part of RMS. Data stored on RA3 clusters is part of RMS and is billed at RMS rates, but manual snapshots taken for RA3 clusters are billed as backup storage at standard Amazon S3 rates outlined on this page.
For example, if your RA3 cluster has 10 TB of data and 30 TB of manual snapshots, you would be billed for 10 TB of RMS and 30 TB of backup storage. With dense compute (DC) and dense storage (DS) clusters, storage is included on the cluster and is not billed separately, but backups are stored externally in Amazon S3. Backup storage beyond the provisioned storage size on DC and DS clusters is billed as backup storage at standard S3 rates. Snapshots are billed until they expire or are deleted, including when the cluster is paused or deleted.
There is no charge for data transferred between Amazon Redshift and Amazon S3 within the same AWS Region for backup, restore, load, and unload operations. For all other data transfers into and out of Amazon Redshift, you will be billed at standard AWS data transfer rates. In particular, if you run your Amazon Redshift cluster in Amazon Virtual Private Cloud (VPC), you will see standard AWS data transfer charges for data transfers over JDBC/ODBC to your Amazon Redshift cluster endpoint. In addition, when you use Enhanced VPC Routing and unload data to Amazon S3 in a different region, you will incur standard AWS data transfer charges. For more information about AWS data transfer rates, see the Amazon Elastic Compute Cloud (Amazon EC2) pricing page.
Amazon Redshift charges for data sharing across regions as well as for snapshot copy across regions. Data sharing charges are billed in the consumer region where the data is being accessed. Snapshot copy across regions is billed in the source region where the cluster that created the snapshot exists.
You use four ra3.4xlarge nodes and 40 TB of RMS for a month. During the month, you also scan 20 TB of data using Redshift Spectrum. You use on-demand pricing.
Your charges would be calculated as follows:
- Redshift RA3 instance cost = 4 instances x $3.26 USD per hour x 730 hours in a month = $9,519.20 USD
- RMS cost = 40 TB x 1,024 GB per TB x $0.024 USD per GB-month = $983.04 USD
- Redshift Spectrum cost = 20 TB x $5.00 USD = $100.00 USD
Total monthly cost: $10,602.24 USD
You use a Multi-AZ cluster that is deployed in two AZs simultaneously. Your cluster has four ra3.4xlarge nodes per AZ and you use 40 TB of RMS for a month. You use on-demand pricing.
Your charges would be calculated as follows:
- Redshift RA3 instance cost for AZ1 = 4 instances x $3.26 USD per hour x 730 hours in a month = $9,519.20 USD
- Redshift RA3 instance cost for AZ2 = 4 instances x $3.26 USD per hour x 730 hours in a month = $9,519.20 USD
- RMS cost = 40 TB x 1,024 GB per TB x $0.024 USD per GB-month = $983.04 USD
Total monthly cost: $20,021.44 USD
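Both provisioned examples above follow the same pattern, which the sketch below captures in one function. The function and parameter names are hypothetical; the rates ($3.26 per node-hour, $0.024 per GB-month, $5.00 per TB scanned) and the 730-hour month are the ones used in the examples, and Multi-AZ is modeled as double the compute, as described in the on-demand pricing section.

```python
HOURS_PER_MONTH = 730
GB_PER_TB = 1024


def monthly_cost(
    nodes: int,
    node_hourly_rate: float,
    rms_tb: float,
    rms_rate_per_gb_month: float,
    spectrum_tb: float = 0.0,
    spectrum_price_per_tb: float = 5.00,
    multi_az: bool = False,
) -> float:
    """Estimated monthly on-demand bill in USD (hypothetical helper)."""
    az_count = 2 if multi_az else 1  # Multi-AZ pays for compute in both AZs
    compute = nodes * node_hourly_rate * HOURS_PER_MONTH * az_count
    storage = rms_tb * GB_PER_TB * rms_rate_per_gb_month
    spectrum = spectrum_tb * spectrum_price_per_tb
    return round(compute + storage + spectrum, 2)


print(monthly_cost(4, 3.26, 40, 0.024, spectrum_tb=20))  # 10602.24
print(monthly_cost(4, 3.26, 40, 0.024, multi_az=True))   # 20021.44
```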
Assume that you have a data processing job that needs to run every hour from 7 AM to 7 PM on your Amazon Redshift data warehouse in the US East (N. Virginia) Region. For simplicity, assume that each time the job runs it takes the same amount of time: 10 minutes and 30 seconds. Let’s say that Amazon Redshift uses 128 RPUs capacity to run the job.
The following table summarizes your total usage for the day.
The job ran 13 times between 7 AM and 7 PM, each run taking 10 minutes and 30 seconds, for a total of 136 minutes and 30 seconds = 8,190 seconds.
Total charges for the day: $109.20 = (8,190 seconds x 128 RPU x $0.375 per RPU-hour) / 3,600 seconds per hour
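The per-second RPU billing in this example can be sketched as below. The function name is hypothetical, and applying the 60-second minimum to each run separately is an assumption about how the minimum charge mentioned earlier on this page interacts with repeated jobs.

```python
SECONDS_PER_HOUR = 3600


def serverless_cost(
    run_seconds: int,
    rpus: int,
    price_per_rpu_hour: float,
    runs: int = 1,
    min_seconds: int = 60,  # 60-second minimum charge, from the text
) -> float:
    """Charge for identical runs billed per second in RPU-hours (hypothetical helper)."""
    billed_seconds = max(run_seconds, min_seconds) * runs
    return billed_seconds * rpus * price_per_rpu_hour / SECONDS_PER_HOUR


# 13 runs of 10 minutes 30 seconds (630 seconds) at 128 RPUs and $0.375 per RPU-hour.
print(round(serverless_cost(630, 128, 0.375, runs=13), 2))  # 109.2
```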
Assume that you have a dashboarding application on your Amazon Redshift data warehouse in the US East (N. Virginia) region. The application is used by a variety of users in the organization (such as data analysts, developers, and data scientists) and has peak and down periods in the day. Specifically, the application has a spike in user activity in the morning from 9 AM to 11 AM and from 2 PM to 4 PM when most of the users are performing analytics and accessing data from the data warehouse. Let’s say that the application has four 15-minute intervals from 11 AM to 2 PM when there is no user activity. There is also no user activity between 10 PM and 5 AM.
Now let us look at the resource usage on the Amazon Redshift data warehouse. Assume that to have better control on price performance, you have explicitly set the Base configuration of Amazon Redshift Serverless as 64 RPU. Assume that during the peak periods in the morning and afternoon, Amazon Redshift automatically scales and uses a total of 192 RPU and 128 RPU capacity respectively.
The following table summarizes your total usage for the day.
| Total query execution period | Usage |
|---|---|
| 5 am – 9 am | 64 RPU for 4 hours = 64 x 4 = 256 RPU-hours |
| 9 am – 11 am | 192 RPU for 2 hours = 192 x 2 = 384 RPU-hours |
| 11 am – 2 pm | Excluding the four 15-minute intervals of idleness, the activity time is 2 hours. 64 RPU for 2 hours = 64 x 2 = 128 RPU-hours |
| 2 pm – 3 pm | 128 RPU for 1 hour = 128 RPU-hours |
| 3 pm – 10 pm | 64 RPU for 7 hours = 64 x 7 = 448 RPU-hours |
| 10 pm – 5 am | No user activity |

Total charges for the day: $504 = (256 + 384 + 128 + 128 + 448) RPU-hours x $0.375 per RPU-hour
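The daily total above is a sum of RPU-hours across active intervals multiplied by the RPU-hour price. A minimal sketch, with hypothetical names and the interval figures copied from the usage summary:

```python
# (period, RPUs in use, active hours), copied from the usage summary above.
USAGE = [
    ("5 am - 9 am", 64, 4),
    ("9 am - 11 am", 192, 2),
    ("11 am - 2 pm", 64, 2),   # 3 hours minus four 15-minute idle intervals
    ("2 pm - 3 pm", 128, 1),
    ("3 pm - 10 pm", 64, 7),
    ("10 pm - 5 am", 0, 0),    # no user activity, so nothing is billed
]


def total_rpu_hours(usage) -> int:
    """Sum RPUs x active hours over all intervals."""
    return sum(rpus * hours for _, rpus, hours in usage)


def daily_cost(usage, price_per_rpu_hour: float) -> float:
    """Daily charge in USD for the given intervals (hypothetical helper)."""
    return total_rpu_hours(usage) * price_per_rpu_hour


print(total_rpu_hours(USAGE))    # 1344
print(daily_cost(USAGE, 0.375))  # 504.0
```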