AWS Big Data Blog

Scale Your Amazon Kinesis Stream Capacity with UpdateShardCount

Allan MacInnis is a Kinesis Solution Architect for Amazon Web Services. Starting today, you can easily scale your Amazon Kinesis streams to respond in real time to changes in your streaming data needs. Customers use Amazon Kinesis to capture, store, and analyze terabytes of data per hour from clickstreams, financial transactions, social media feeds, and […]

re:Invent 2016: AWS Big Data & Machine Learning Sessions

by Roy Ben-Alta

Roy Ben-Alta is Sr. Business Development Manager at AWS – Big Data & Machine Learning. Updated December 9, 2016 with links to session videos. We can’t believe that there are just a couple of weeks left before re:Invent 2016. If you are attending this year, you will want to check out our Big Data sessions! […]

Use Apache Flink on Amazon EMR

Craig Foster is a Big Data Engineer with Amazon EMR. Apache Flink is a parallel data processing engine that customers are using to build real-time big data applications. Flink enables you to perform transformations on many different data sources, such as Amazon Kinesis Streams or the Apache Cassandra database. It provides both batch and […]

Month in Review: October 2016

by Derek Young

Another month of big data solutions on the Big Data Blog. Take a look at our summaries below and learn, comment, and share. Thanks for reading! Building Event-Driven Batch Analytics on AWS: Modern businesses typically collect data from internal and external sources at various frequencies throughout the day. In this post, you learn an elastic […]

Using pgpool and Amazon ElastiCache for Query Caching with Amazon Redshift

Felipe Garcia and Hugo Rozestraten are Solutions Architects for Amazon Web Services. In this blog post, we’ll use a real customer scenario to show you how to create a caching layer in front of Amazon Redshift using pgpool and Amazon ElastiCache. Almost every application, no matter how simple, uses some kind of database. With SQL […]

Fact or Fiction: Google BigQuery Outperforms Amazon Redshift as an Enterprise Data Warehouse?

Randall Hunt is a Technical Evangelist for Amazon Web Services. A few weeks ago, 2nd Watch (a leading cloud-native systems integrator) wrote the Benchmarking Amazon Aurora post, analyzing Google’s benchmark of their Cloud SQL database service against AWS’s Amazon Aurora. In that analysis, 2nd Watch found that Aurora outperforms Cloud SQL consistently and that […]

Running sparklyr – RStudio’s R Interface to Spark on Amazon EMR

Tom Zeng is a Solutions Architect for Amazon EMR. The recently released sparklyr package by RStudio has made processing big data in R a lot easier. sparklyr is an R interface to Spark; it allows using Spark as the backend for dplyr – one of the most popular data manipulation packages. sparklyr also allows user […]

Optimizing Amazon S3 for High Concurrency in Distributed Workloads

Aaron Friedman is a Healthcare and Life Sciences Solution Architect with Amazon Web Services. The healthcare and life sciences landscape is being transformed rapidly by big data. By intersecting petabytes of genomic data with clinical information, AWS customers and partners are already changing healthcare as we know it. One of the most important things in […]

How Eliza Corporation Moved Healthcare Data to the Cloud

This is a guest post by Laxmikanth Malladi, Chief Architect at NorthBay. NorthBay is an AWS Advanced Consulting Partner and an AWS Big Data Competency Partner. “Pay-for-performance” in healthcare pays providers more to keep the people under their care healthier. This is a departure from fee-for-service, where payments are made for each service used. Pay-for-performance arrangements provide […]

Building Event-Driven Batch Analytics on AWS

Karthik Sonti is a Senior Big Data Architect with AWS Professional Services. Modern businesses typically collect data from internal and external sources at various frequencies throughout the day. These data sources could be franchise stores, subsidiaries, or new systems integrated as a result of mergers and acquisitions. For example, a retail chain might collect point-of-sale […]
