Get Started with the Project

7 Steps  |  60 Minutes

Q: What is data warehousing?

Analytics is ubiquitous. We all use reports and dashboards to manage our work, report progress to stakeholders, and perform ad hoc analysis to support decision making. Under the hood, these reports, dashboards, and BI tools are powered by data warehouses, which store data efficiently to minimize I/O and deliver query results at blazing speeds to hundreds or thousands of concurrent users. Unlike transactional databases, data warehouses use specialized architectures and storage for fast query and data load performance. Data warehouses also need to be highly scalable so that you can keep adding data sources to enrich analytics and insights. Lastly, data warehouses should integrate seamlessly with third-party business intelligence tools and SQL clients, and support standard SQL so that customers can use skills they already have.

Q: Why should I run data warehousing on AWS?

Amazon Redshift, our data warehousing solution, is fast, easy to use, and fully managed. It automates infrastructure provisioning and administrative tasks such as backups, replication, and patching. It integrates seamlessly with third-party BI and ETL tools, so you can get to your first report in just a few minutes. There is no limit to the amount of data you can load and analyze. As your data grows, you don’t have to worry about expensive system upgrades or slow performance. Amazon Redshift is fast at any scale because it uses columnar storage and several optimization techniques. Amazon Redshift is also cost-effective, and you only pay for what you use. The bottom line is that you can have an unlimited number of users doing unlimited analytics on all your data for just $1,000 per terabyte per year.

Q: What is Amazon Redshift?

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Start small for $0.25 per hour with no commitments and scale to petabytes for $1,000 per terabyte per year, less than a tenth of the cost of traditional solutions. Customers typically see 3x compression, reducing their costs to $333 per uncompressed terabyte per year.
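
The $333 figure follows from dividing the price per compressed terabyte by the compression ratio. Here is a minimal sketch of that arithmetic; the function name is a hypothetical helper, and the inputs are simply the figures quoted above:

```python
def cost_per_uncompressed_tb(price_per_compressed_tb: float,
                             compression_ratio: float) -> float:
    """Effective yearly cost per terabyte of raw (uncompressed) data."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return price_per_compressed_tb / compression_ratio


# Figures quoted above: $1,000 per terabyte per year, typical 3x compression.
effective = cost_per_uncompressed_tb(1000, 3)
print(f"${effective:.0f} per uncompressed terabyte per year")
```

Your actual compression ratio depends on your data, so treat 3x as a typical value rather than a guarantee.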

Q: How does the performance of Amazon Redshift compare to most traditional databases for data warehousing and analytics?

Amazon Redshift uses a variety of innovations to achieve up to ten times higher performance than traditional databases for data warehousing and analytics workloads:

  • Columnar Data Storage: Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.
  • Advanced Compression: Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. In addition, Amazon Redshift doesn't require indexes or materialized views and so uses less space than traditional relational database systems. When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
  • Massively Parallel Processing (MPP): Amazon Redshift automatically distributes data and query load across all nodes. Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
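
The first two points above can be illustrated with a toy example. This is only a sketch of the general idea, not a description of how Amazon Redshift lays out data internally: the sample table is hypothetical, and run-length encoding is used as one simple example of a compression technique that benefits from similar values being stored together.

```python
from itertools import groupby

# Hypothetical sample table with columns (order_id, region, amount).
rows = [
    (1, "east", 10.0),
    (2, "east", 20.0),
    (3, "west", 5.0),
    (4, "west", 15.0),
]

# Row layout: each record is stored together, so a scan that sums `amount`
# still reads every field of every row.
row_values_read = sum(len(r) for r in rows)
row_total = sum(r[2] for r in rows)

# Column layout: each column is stored contiguously, so the same aggregate
# only needs to read the `amount` column.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
col_values_read = len(columns["amount"])
col_total = sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Adjacent equal values in a column collapse into short runs.
region_rle = run_length_encode(columns["region"])

print(row_values_read, col_values_read)
print(region_rle)
```

Both layouts produce the same total, but the column layout reads a third of the values for this three-column table, and the `region` column shrinks from four stored values to two runs.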

Q: How do I access my running data warehouse cluster?

Once your data warehouse cluster is available, you can retrieve its endpoint and its JDBC and ODBC connection strings from the AWS Management Console or by using the Redshift APIs. You can then use a connection string with your favorite database tool, programming language, or Business Intelligence (BI) tool. You will also need to authorize network requests to your running data warehouse cluster. For a detailed explanation, please refer to our Getting Started Guide.
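
As a rough illustration of what such a connection string looks like, the sketch below assembles a JDBC URL from a cluster endpoint. It assumes the `jdbc:redshift://host:port/database` form and the default port 5439; the endpoint and database name are hypothetical placeholders, and the console shows the exact string for your cluster.

```python
def build_jdbc_url(endpoint: str, database: str, port: int = 5439) -> str:
    """Assemble a JDBC URL, assuming the jdbc:redshift://host:port/db form."""
    if not endpoint or not database:
        raise ValueError("endpoint and database are required")
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical endpoint and database name, for illustration only.
url = build_jdbc_url(
    "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    "dev",
)
print(url)
```

In practice, copying the connection string from the console is safer than building it by hand, because it always matches the cluster's actual endpoint and port.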

Q: Is Amazon Redshift compatible with my preferred business intelligence software package and ETL tools?

Amazon Redshift uses industry-standard SQL and is accessed using standard JDBC and ODBC drivers. You can download Amazon Redshift custom JDBC and ODBC drivers from the Connect Client tab of our Console. We have validated integrations with popular BI and ETL vendors, a number of which are offering free trials to help you get started loading and analyzing your data. You can also go to the AWS Marketplace to deploy and configure solutions designed to work with Amazon Redshift in minutes.

Q: How do I get started with Amazon Redshift?

You can try Amazon Redshift for free. If you’ve never created an Amazon Redshift cluster, you’re eligible for a two-month free trial of our DC1.Large node. You get 750 hours per month for free, enough to run one DC1.Large node with 160 GB of compressed SSD storage continuously. You can also build clusters with multiple nodes to test larger data sets, which will consume your free hours more quickly. Once your two-month free trial expires or your usage exceeds 750 hours per month, you can shut down your cluster to avoid any charges, or keep it running at our standard On-Demand Rate.
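
To see how cluster size affects the free hours, here is a small sketch. It assumes free hours are consumed per node for each hour the cluster runs, which is how "consume your free hours more quickly" reads; the function names are hypothetical helpers.

```python
FREE_NODE_HOURS_PER_MONTH = 750  # monthly free allowance quoted above


def node_hours_used(nodes: int, hours_running: float) -> float:
    """Node-hours consumed by a cluster of `nodes` running for `hours_running`."""
    return nodes * hours_running


def free_runtime_hours(nodes: int) -> float:
    """Hours a cluster of `nodes` can run in a month within the free allowance."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return FREE_NODE_HOURS_PER_MONTH / nodes


# One node running around the clock in a 31-day month stays within the allowance.
single_node_month = node_hours_used(1, 31 * 24)
print(single_node_month, single_node_month <= FREE_NODE_HOURS_PER_MONTH)

# A two-node cluster exhausts the same allowance in half the time.
print(free_runtime_hours(2))
```

Under this assumption, a single node fits within the allowance even in the longest month, while a two-node cluster uses it up in 375 hours, a little over 15 days of continuous running.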
