While a row-oriented relational database is optimized for storing and retrieving whole rows of data, typically for transactional applications, a columnar database is optimized for fast retrieval of columns of data, typically in analytical applications. Column-oriented storage for database tables is an important factor in analytic query performance because a query reads only the columns it references, which drastically reduces the amount of data you need to load from disk and therefore the overall disk I/O.
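The I/O difference can be illustrated with a small sketch. The table below (an orders table with `order_id`, `customer_id`, `amount`, and `region`) and the fixed-width byte layout are assumptions made up for illustration, not how any particular engine stores data. The sketch sums one column from a row-oriented layout and from a column-oriented layout and counts the bytes each approach has to touch.

```python
import struct

# Hypothetical table: order_id (int32), customer_id (int32),
# amount (float64), region (8-byte string). Layout is illustrative only.
ROW_FORMAT = "<iid8s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 24 bytes per row, no padding
ROWS = [(i, i % 100, i * 1.5, b"us-east ") for i in range(1000)]

# Row-oriented layout: all fields of a row are stored next to each other.
row_store = b"".join(struct.pack(ROW_FORMAT, *row) for row in ROWS)

# Column-oriented layout: each column is stored contiguously on its own.
column_store = {
    "order_id": struct.pack(f"<{len(ROWS)}i", *(r[0] for r in ROWS)),
    "customer_id": struct.pack(f"<{len(ROWS)}i", *(r[1] for r in ROWS)),
    "amount": struct.pack(f"<{len(ROWS)}d", *(r[2] for r in ROWS)),
    "region": b"".join(r[3] for r in ROWS),
}


def sum_amount_row_store(data):
    """Sum the amount field; every full row must be read to reach it."""
    total, bytes_read = 0.0, 0
    for offset in range(0, len(data), ROW_SIZE):
        _, _, amount, _ = struct.unpack_from(ROW_FORMAT, data, offset)
        total += amount
        bytes_read += ROW_SIZE
    return total, bytes_read


def sum_amount_column_store(columns):
    """Sum the amount field; only the amount column is read."""
    raw = columns["amount"]
    values = struct.unpack(f"<{len(raw) // 8}d", raw)
    return sum(values), len(raw)


row_total, row_bytes = sum_amount_row_store(row_store)
col_total, col_bytes = sum_amount_column_store(column_store)
print(f"row store:    total={row_total}, bytes read={row_bytes}")
print(f"column store: total={col_total}, bytes read={col_bytes}")
```

Both scans return the same total, but in this toy layout the columnar scan reads a third of the bytes, because the three columns the query does not reference are never touched. Real engines add block storage, compression, and parallelism on top of this basic idea.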

Like other NoSQL databases, column-oriented databases are designed to scale “out” using distributed clusters of low-cost hardware to increase throughput, making them ideal for data warehousing and Big Data processing.

Amazon Web Services (AWS) provides a variety of columnar database options for developers. You can operate your own non-relational columnar data store in the cloud on Amazon EC2 and Amazon EBS, work with AWS solution providers, or take advantage of fully managed columnar database services.

Amazon Redshift is a column-oriented, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Amazon Redshift achieves efficient storage and optimal query performance through a combination of massively parallel processing, columnar data storage, and efficient, targeted data compression encoding schemes. Learn more about Amazon Redshift »
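To see why columnar storage and compression encodings work well together, consider run-length encoding, one common column compression scheme. The sketch below is a generic illustration, not Amazon Redshift's implementation; the `region_column` data and the function names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical column: values of one column are stored together, so
# repeated values tend to sit next to each other.
region_column = ["us-east"] * 4 + ["eu-west"] * 3 + ["us-east"] * 2

runs = rle_encode(region_column)
print(runs)
assert rle_decode(runs) == region_column
```

Nine stored values shrink to three runs here. In a row-oriented layout the same values would be interleaved with other fields, so long runs like these would rarely occur.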

Developers can install column-oriented databases of their choice on Amazon EC2 and Amazon EMR, avoiding the friction of infrastructure provisioning while gaining access to a variety of standard columnar database engines.

Cassandra is an open source, column-oriented database designed to handle large amounts of data across many commodity servers. Unlike a table in a relational database, different rows in the same table (column family) do not have to share the same set of columns.
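The idea that rows in one column family need not share columns can be pictured with plain Python dictionaries. This is only a conceptual model, not the Cassandra API or its storage format; the `users` table, its row keys, and its column names are invented for the example.

```python
# Conceptual model of a column family: row key -> {column name: value}.
# Each row stores only the columns it actually has.
users = {
    "alice": {"email": "alice@example.com", "city": "Seattle"},
    "bob": {"email": "bob@example.com", "phone": "555-0100"},
    "carol": {"email": "carol@example.com"},
}


def columns_of(table, row_key):
    """Return the sorted column names present in a single row."""
    return sorted(table[row_key])


def read_column(table, column):
    """Collect one column across rows, skipping rows that lack it."""
    return {key: row[column] for key, row in table.items() if column in row}


for key in users:
    print(key, columns_of(users, key))
print(read_column(users, "phone"))
```

Rows that lack a column simply store nothing for it, which is why this style of table suits sparse data.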

See a multi-region Cassandra configuration with a look inside Vidora’s globally distributed, low-latency A.I.

Consider EBS when running Cassandra workloads (learn how CrowdStrike ran dense, cheaper Cassandra clusters with EBS). For more about working with Cassandra and running Cassandra on AWS, read the Apache Cassandra on AWS whitepaper and visit the AWS Marketplace » 

Apache HBase is an open-source, column-oriented, distributed NoSQL database that runs on the Apache Hadoop framework. HBase provides a fault-tolerant, efficient way of storing large quantities of sparse data using column-based compression and storage.

You can deploy HBase on Amazon Elastic Compute Cloud (Amazon EC2) and manage it yourself, or use Apache HBase as a managed service on Amazon Elastic MapReduce (Amazon EMR). Learn more by reading the EMR Developer Guide and this post on the AWS Big Data Blog »