Scientists, developers, and many other technologists from many different industries are taking advantage of Amazon Web Services to meet the challenges of the increasing volume, variety, and velocity of digital information. Amazon Web Services offers an end-to-end portfolio of cloud computing resources to help you manage big data by reducing costs, gaining a competitive advantage, and increasing the speed of innovation.
It feels like everything generates data today, from your customers on social networks to the instances running your web applications. AWS makes it easy to provision the storage, computation, and database services you need to turn that data into information for your business. AWS also offers data transfer services, such as AWS Direct Connect and our Import/Export service, that can move big data into and out of the cloud quickly. Furthermore, all inbound data traffic into AWS is free.
Amazon Kinesis is a managed service for real-time processing of streaming big data. Amazon Kinesis supports data throughput from megabytes to gigabytes of data per second and can scale seamlessly to handle streams from hundreds of thousands of different sources. Because Amazon Kinesis is designed to provide high availability and durability in a cost-effective manner, you can focus on making sense of your data, which enables you to make better decisions faster and at lower cost.
Whether you’re storing pharmaceutical data for analysis, financial data for computation and pricing, or multimedia files such as photos and videos, Amazon Simple Storage Service (S3) is the ideal location to store original content durably. Designed for eleven 9's of durability, with no single point of failure, Amazon S3 is your fundamental big data object store.
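To make the durability figure concrete, the short sketch below converts "eleven 9's" into an expected number of lost objects per year. It assumes the figure means an annual, per-object durability of 99.999999999% and that losses are independent of each other. The function name and the object count are illustrative and are not taken from AWS documentation.

```python
from fractions import Fraction

# Assumption: "eleven 9's" is an annual per-object durability of 99.999999999%,
# so the chance of losing any one object in a year is 1 in 10**11.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_objects_lost_per_year(object_count: int) -> Fraction:
    """Expected number of objects lost in one year, assuming independent losses."""
    return object_count * ANNUAL_LOSS_PROBABILITY


if __name__ == "__main__":
    stored = 10_000_000  # hypothetical number of stored objects
    lost = expected_objects_lost_per_year(stored)
    print(f"Expected objects lost per year: {float(lost)}")
    print(f"Average years between single-object losses: {1 / lost}")
```

Under these assumptions, a store of ten million objects would expect to lose a single object about once every ten thousand years.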
Amazon Elastic Compute Cloud (EC2) provides virtual machines as a service. Amazon Elastic Block Store (EBS) provides persistent storage volumes for those virtual machines. Amazon EBS volumes offer the consistent and low-latency performance needed to run big data workloads such as your own relational or NoSQL databases, enterprise applications, and high performance distributed network file systems.
NoSQL data stores benefit greatly from the speed of solid state drives. DynamoDB uses them by default, but if you are using alternatives from the AWS Marketplace, such as Cassandra or MongoDB, you can speed up data access with on-demand terabytes of solid state storage through the High I/O instance class.
When you need a NoSQL database without the operational burden to run it, look no further than Amazon DynamoDB. It is a fast, fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.
Amazon DynamoDB's provisioned, guaranteed throughput and single-digit millisecond latency make it a great fit for gaming, ad tech, mobile, and many other big data applications.
Big data innovation goes beyond NoSQL; it is about bringing the appropriate technology to bear on your data, depending on your business needs. Relational databases deliver fast, predictable, and consistent performance, and they are optimized for transactional workloads such as point-of-sale or financial history. Relational databases play a complementary role to NoSQL databases in any comprehensive big data architecture.
Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.
Amazon Redshift provides a fast, fully-managed, petabyte-scale data warehouse for less than $1,000 per terabyte per year. Amazon Redshift delivers fast query and I/O performance for virtually any size dataset by using columnar storage technology and by parallelizing and distributing queries across multiple nodes. In just a few minutes, you can easily provision a fully managed data warehouse with automated backups and built-in encryption. Plug in easily with your existing business intelligence tools.
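The two ideas named above, columnar storage and distributing a query across nodes, can be illustrated with a toy sketch. This is plain Python, not Amazon Redshift's implementation or API. The sample records, the function names, and the way values are split across "nodes" are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical sales records in row-oriented form.
ROWS: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.5},
    {"order_id": 3, "region": "EU", "amount": 42.0},
    {"order_id": 4, "region": "AP", "amount": 15.5},
]


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot rows into one list per column, so a query reads only what it needs."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def partial_sums(values: List[float], node_count: int) -> List[float]:
    """Split one column into contiguous slices and sum each slice separately,
    standing in for nodes that each aggregate their own share of the data."""
    chunk = max(1, -(-len(values) // node_count))  # ceiling division, at least 1
    return [sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]


if __name__ == "__main__":
    columns = to_columns(ROWS)
    # Only the "amount" column is touched; "order_id" and "region" are never read.
    per_node = partial_sums(columns["amount"], node_count=2)
    print("Per-node partial sums:", per_node)
    print("Total:", sum(per_node))
```

An aggregate over one column reads only that column's values, and each slice can be summed independently before the partial results are combined.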
Amazon Elastic MapReduce (EMR) provides the powerful Hadoop framework on Amazon EC2 as an easy-to-use managed service. With Amazon EMR, you can focus on your map/reduce queries and take advantage of the broad ecosystem of Hadoop tools, while deploying to a high-scale, secure infrastructure platform.
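For readers new to the map/reduce model, here is a minimal word-count sketch in plain Python. It runs in a single process and does not use Hadoop or any Amazon EMR API. The function names are illustrative, and the shuffle step is simplified to an in-memory grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data on AWS", "Big data in the cloud"]  # hypothetical input
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on many machines, while the framework handles the grouping and data movement between them.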
How fast could your project go with another 1,000 virtual machines? How about 10,000? The Amazon Spot Market, integrated into Amazon Elastic MapReduce, lets you choose your own price for the computing resources you need. That means you can choose your own balance of cost and performance, overclocking your analytics when you need to, or reducing costs significantly.
Amazon Glacier allows you to offload the administrative burdens of operating and scaling archival storage to AWS, and makes retaining data for long periods, whether measured in years or decades, especially simple. Amazon Glacier is an extremely low-cost cold storage service starting at $0.01 per GB per month. There are no upfront capital commitments, and all ongoing operational expenses are included in the price.
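As a rough worked example of the storage price quoted above, the sketch below estimates an archive's storage charge over time. It assumes the $0.01 per GB per month starting price applies to the whole archive, treats 1 TB as 1,000 GB, and covers the storage charge only. The function name and the archive size are illustrative.

```python
from decimal import Decimal

# Starting price quoted in the text, in US dollars per GB per month.
PRICE_PER_GB_MONTH = Decimal("0.01")

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1000


def storage_cost(gigabytes: int, months: int,
                 price: Decimal = PRICE_PER_GB_MONTH) -> Decimal:
    """Storage charge in dollars for keeping `gigabytes` archived for `months`."""
    return Decimal(gigabytes) * Decimal(months) * price


if __name__ == "__main__":
    archive_gb = 10 * GB_PER_TB  # hypothetical 10 TB archive
    print("Per month:", storage_cost(archive_gb, 1))
    print("Over five years:", storage_cost(archive_gb, 60))
```

Under these assumptions, a 10 TB archive costs about $100 per month to store, or about $6,000 over five years.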
Many of the datasets you need are already available on the AWS Cloud, as part of the Amazon Public Datasets program. So whether you’re looking to mine the Common Crawl open web corpus, align some genomes, or explore images from NASA, AWS provides the data, the services, and the infrastructure you need to get up and running.
AWS Marketplace is an online store that provides an easy way for developers and IT professionals to discover and use software that runs in the AWS Cloud. You can now find software in AWS Marketplace to collect, analyze, and collaborate on large amounts of data in support of your big data projects. AWS Marketplace has software to support the range of big data solutions including NoSQL, Hadoop, and Cassandra.