Scientists, developers, and other technologists across many industries are taking advantage of Amazon Web Services to perform big data analytics and meet the challenges of the increasing volume, variety, and velocity of digital information. Amazon Web Services offers a comprehensive, end-to-end portfolio of cloud computing services to help you manage big data by reducing costs, scaling to meet demand, and increasing the speed of innovation.
See the AWS Big Data solutions for every stage of the big data lifecycle:
"AWS allows us to focus on what we do best while having access to amazing computational resources for processing big data problems."
Alex Dickinson, Senior VP of Cloud Genomics, Illumina
“Using Amazon Elastic MapReduce we were able to save $55,000 in upfront hardware costs and get up and running in a matter of days, not months.”
Jim Blomo, Engineering Manager - Data-Mining, Yelp
The AWS Big Data Blog is intended for solutions architects, data scientists, and developers who want to learn big data best practices, discover which managed AWS big data services best fit their use case, and both get started and go deep with AWS big data services. The goal of the blog is to be the hub where anyone can discover new ways to collect, store, process, analyze, and visualize data at any scale. Readers will find short tutorials with code samples, case studies that demonstrate the unique benefits of working with big data on AWS, new feature announcements, partner- and customer-generated demos and tutorials, and tips and best practices for using AWS big data services.
Big data is being used to transform businesses, increase efficiency, and drive innovation. In the Big Data & HPC track, you will hear experts in the data analytics, data warehousing, big data, and high-performance computing fields share their AWS success stories. The sessions will provide best practices, architectural design patterns, and in-depth discussions of Hadoop, Amazon Elastic MapReduce, Amazon Redshift, Amazon Kinesis, AWS Data Pipeline, and Amazon S3.
It feels like everything generates data today, from your customers on social networks to the instances running your web applications. AWS makes it easy to provision the storage, computation, and database services you need to turn that data into information for your business. AWS also offers data transfer services, such as AWS Direct Connect and AWS Import/Export, that can move big data into and out of the cloud quickly. Furthermore, all inbound data traffic into AWS is free.
Amazon Kinesis is a managed service for real-time processing of streaming big data. It supports data throughput from megabytes to gigabytes per second and can scale seamlessly to handle streams from hundreds of thousands of different sources. Because it is designed for high availability and durability in a cost-effective manner, you can focus on making sense of your data, enabling you to make better decisions faster and at lower cost.
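To make the scaling model concrete: a Kinesis stream is composed of shards, and each shard accepts up to 1 MB or 1,000 records per second of incoming data. A small sketch of the sizing arithmetic (the helper name is illustrative, not part of any AWS SDK):

```python
import math

# Documented per-shard ingest limits for Amazon Kinesis:
# each shard accepts up to 1 MB/s or 1,000 records/s.
SHARD_MB_PER_SEC = 1.0
SHARD_RECORDS_PER_SEC = 1000

def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    """Return the minimum shard count that satisfies both ingest limits."""
    by_bytes = math.ceil(mb_per_sec / SHARD_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)

# A 5 MB/s stream of 2,000 records/s is byte-bound and needs 5 shards;
# a trickle of many tiny records can instead be record-bound.
```

Scaling a stream then amounts to resharding to the computed count; the constants above reflect the per-shard limits and would need updating if the service limits change.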
Whether you’re storing pharmaceutical data for analysis, financial data for computation and pricing, or multimedia files such as photos and videos, Amazon Simple Storage Service (S3) is the ideal big data cloud storage solution to store original content durably. Designed for eleven 9's of durability, with no single point of failure, Amazon S3 is your fundamental big data object store.
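Large objects are usually moved into S3 with multipart upload, which per S3's documented limits requires each part (except the last) to be at least 5 MB and allows at most 10,000 parts per object. A sketch of picking a part size under those limits (the helper is illustrative, not an SDK function):

```python
import math

MIN_PART_BYTES = 5 * 1024 * 1024   # S3 multipart minimum part size
MAX_PARTS = 10000                  # S3 multipart maximum part count

def choose_part_size(object_size: int) -> int:
    """Smallest part size (doubling from the 5 MB minimum) that keeps
    the part count within S3's 10,000-part limit."""
    part = MIN_PART_BYTES
    while math.ceil(object_size / part) > MAX_PARTS:
        part *= 2
    return part
```

Multipart upload also lets you retry individual failed parts and upload parts in parallel, which matters at big data scale.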
Amazon Elastic Block Store (EBS) provides persistent block storage volumes for virtual machines. Amazon EBS volumes offer the consistent, low-latency performance needed to run big data workloads such as your own relational or NoSQL databases, enterprise applications, and high-performance distributed network file systems.
NoSQL data stores benefit greatly from the speed of solid state drives (SSDs). Amazon DynamoDB uses them by default, but if you are using alternatives from the AWS Marketplace, such as Cassandra or MongoDB, you can accelerate your data access with on-demand terabytes of solid state storage via the High I/O instance class.
When you need a NoSQL database without the operational burden to run it, look no further than Amazon DynamoDB. It is a fast, fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.
Amazon DynamoDB's guaranteed provisioned throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile, and many other big data applications.
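Provisioned throughput is expressed in capacity units: one read capacity unit covers a strongly consistent read of up to 4 KB per second (eventually consistent reads cost half as much), and one write capacity unit covers a write of up to 1 KB per second. A sketch of the sizing arithmetic (helper names are illustrative):

```python
import math

def read_capacity_units(item_kb: float, reads_per_sec: int,
                        consistent: bool = True) -> int:
    """RCUs needed: one unit = one strongly consistent read/s of up to 4 KB;
    eventually consistent reads cost half."""
    units_per_read = math.ceil(item_kb / 4)
    rcu = units_per_read * reads_per_sec
    return rcu if consistent else math.ceil(rcu / 2)

def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """WCUs needed: one unit = one write/s of up to 1 KB."""
    return math.ceil(item_kb) * writes_per_sec

# Reading 100 six-KB items/s costs 200 RCUs strongly consistent,
# or 100 RCUs eventually consistent.
```

Because you provision these numbers up front, DynamoDB can guarantee the throughput you asked for regardless of table size.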
Big data innovation goes beyond NoSQL; it is about bringing the appropriate technology to bear on your data, depending on your business needs. Relational databases deliver fast, predictable, and consistent performance, and they are optimized for transactional workloads such as point of sale or financial history. Relational databases play a complementary role to NoSQL databases in many comprehensive big data architectures.
Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.
Amazon Redshift provides a fast, fully-managed, petabyte-scale data warehouse for less than $1,000 per terabyte per year. It delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and by parallelizing and distributing queries across multiple nodes. In just a few minutes, you can provision a fully managed data warehouse with automated backups and built-in encryption, and it plugs in easily to your existing business intelligence tools.
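Data typically lands in Redshift through the SQL COPY command, which loads files from Amazon S3 in parallel across the cluster's nodes. A minimal sketch that builds such a statement (the table, bucket, and credential values are placeholders, and the helper is illustrative rather than part of any SDK):

```python
def build_copy_command(table: str, s3_path: str,
                       access_key: str, secret_key: str) -> str:
    """Build a Redshift COPY statement that loads pipe-delimited,
    gzip-compressed files from S3 in parallel across compute nodes."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        "DELIMITER '|' GZIP;"
    )
```

Splitting the input into multiple files in S3 lets each slice of the cluster load in parallel, which is where much of Redshift's ingest speed comes from.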
Amazon Elastic MapReduce (EMR) provides the powerful Apache Hadoop framework on Amazon EC2 as an easy-to-use managed service. With Amazon EMR, you can focus on your map/reduce queries and take advantage of the broad ecosystem of Hadoop tools, while deploying to a high-scale, secure infrastructure platform. Run big data analytics jobs in the cloud with ease; let Amazon EMR do the work of managing your Hadoop clusters.
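To show what "focus on your map/reduce queries" means in practice, here is the classic word-count job that Hadoop tutorials start with, expressed as a pure-Python sketch of the map and reduce phases (function names are illustrative; on EMR this logic would run as a streaming or native Hadoop job):

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: sum the counts emitted for each word."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

EMR's value is that the same two functions scale from one laptop to hundreds of nodes: the service handles cluster provisioning, data shuffling between the phases, and failure recovery.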
How fast could your project go with another 1,000 virtual machines? How about 10,000? The Amazon EC2 Spot market, integrated with Amazon Elastic MapReduce, lets you choose your own price for the computing resources you need for cloud-based analytics. That means you can choose your own balance of cost and performance, overclocking your analytics when you need to or reducing costs significantly.
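To see how that cost/performance trade-off plays out, consider a hypothetical EMR cluster that keeps its core nodes on On-Demand pricing and runs extra task nodes on Spot; the node counts and prices below are placeholders for illustration, not actual quotes:

```python
def cluster_cost(core_nodes: int, task_nodes: int,
                 on_demand_price: float, spot_price: float) -> float:
    """Hourly cost with core nodes On-Demand and task nodes on Spot."""
    return core_nodes * on_demand_price + task_nodes * spot_price

def spot_savings(core_nodes: int, task_nodes: int,
                 on_demand_price: float, spot_price: float) -> float:
    """Fractional savings versus running every node On-Demand."""
    all_on_demand = (core_nodes + task_nodes) * on_demand_price
    mixed = cluster_cost(core_nodes, task_nodes, on_demand_price, spot_price)
    return 1 - mixed / all_on_demand
```

Keeping core nodes On-Demand protects the HDFS data if Spot capacity is reclaimed, while the interruptible task nodes provide the cheap burst of compute.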
Amazon Glacier allows you to offload the administrative burdens of operating and scaling archival storage to AWS, and makes retaining data for long periods, whether measured in years or decades, especially simple. Amazon Glacier is an extremely low-cost cold storage service starting at $0.01 per GB per month. There are no upfront capital commitments, and all ongoing operational expenses are included in the price.
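At the quoted rate of $0.01 per GB per month, the cost of retaining an archive for years or decades is simple arithmetic; this small helper (hypothetical, for illustration only) makes the math explicit:

```python
def glacier_monthly_cost(gb: float, price_per_gb: float = 0.01) -> float:
    """Monthly cold-storage cost at the quoted $0.01/GB/month rate."""
    return gb * price_per_gb

def glacier_total_cost(gb: float, months: int,
                       price_per_gb: float = 0.01) -> float:
    """Total retention cost over an archive's lifetime."""
    return glacier_monthly_cost(gb, price_per_gb) * months

# Retaining 500 GB for a year costs $60 at the quoted rate;
# retrieval fees, if any, are extra and not modeled here.
```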