AWS Big Data Blog

Category: Analytics

Dynamically Scale Applications on Amazon EMR with Auto Scaling

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Customers running Apache Spark, Presto, and the Apache Hadoop ecosystem take advantage of Amazon EMR’s elasticity to save costs by terminating clusters after workflows are complete and by resizing clusters with low-cost Amazon EC2 Spot Instances. For instance, customers can create clusters for daily ETL or machine learning […]
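As a rough sketch of how this can be automated, the boto3 call below attaches an automatic scaling policy to an EMR instance group; the cluster ID, instance group ID, capacity limits, and CloudWatch thresholds are placeholder assumptions rather than values from the post.

```python
import boto3

emr = boto3.client('emr', region_name='us-east-1')

# Attach a policy that adds two instances when available YARN memory drops
# below 15 percent. All identifiers and thresholds below are placeholders.
emr.put_auto_scaling_policy(
    ClusterId='j-XXXXXXXXXXXXX',          # hypothetical cluster ID
    InstanceGroupId='ig-XXXXXXXXXXXX',    # hypothetical core/task instance group
    AutoScalingPolicy={
        'Constraints': {'MinCapacity': 2, 'MaxCapacity': 10},
        'Rules': [{
            'Name': 'ScaleOutOnLowYarnMemory',
            'Action': {
                'SimpleScalingPolicyConfiguration': {
                    'AdjustmentType': 'CHANGE_IN_CAPACITY',
                    'ScalingAdjustment': 2,
                    'CoolDown': 300,
                }
            },
            'Trigger': {
                'CloudWatchAlarmDefinition': {
                    'ComparisonOperator': 'LESS_THAN',
                    'EvaluationPeriods': 1,
                    'MetricName': 'YARNMemoryAvailablePercentage',
                    'Namespace': 'AWS/ElasticMapReduce',
                    'Period': 300,
                    'Statistic': 'AVERAGE',
                    'Threshold': 15.0,
                    'Unit': 'PERCENT',
                }
            },
        }],
    },
)
```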

Read More

Build a Community of Analysts with Amazon QuickSight

Imagine you’ve just landed your dream job. You’ve always liked tackling the hardest problems and you’ve got one now: You’ll work for a chain of coffee shops that’s struggling against fierce competition, tight budgets, and low morale. But there’s a new management team in place. As head of business intelligence (BI), you think you can […]

Read More

Scale Your Amazon Kinesis Stream Capacity with UpdateShardCount

Allan MacInnis is a Kinesis Solution Architect for Amazon Web Services. Starting today, you can easily scale your Amazon Kinesis streams to respond in real time to changes in your streaming data needs. Customers use Amazon Kinesis to capture, store, and analyze terabytes of data per hour from clickstreams, financial transactions, social media feeds, and […]
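As a minimal sketch of the new API, the snippet below doubles a stream’s shard count with boto3 and then waits for the stream to return to ACTIVE; the stream name and shard counts are placeholder assumptions.

```python
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')

# Scale a hypothetical two-shard stream up to four shards in a single call.
kinesis.update_shard_count(
    StreamName='my-clickstream',      # placeholder stream name
    TargetShardCount=4,
    ScalingType='UNIFORM_SCALING',
)

# Reads and writes continue during resharding, but wait for the stream to
# become ACTIVE again before depending on the new capacity.
kinesis.get_waiter('stream_exists').wait(StreamName='my-clickstream')
```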

Read More

Use Apache Flink on Amazon EMR

Craig Foster is a Big Data Engineer with Amazon EMR. Apache Flink is a parallel data processing engine that customers are using to build real-time, big data applications. Flink enables you to perform transformations on many different data sources, such as Amazon Kinesis Streams or the Apache Cassandra database. It provides both batch and […]
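As a hedged sketch of getting started, the boto3 call below launches an EMR cluster with the Flink application selected; the release label, instance types, and counts are assumptions for illustration, not values from the post.

```python
import boto3

emr = boto3.client('emr', region_name='us-east-1')

# Launch a small, long-running cluster with Flink installed.
response = emr.run_job_flow(
    Name='flink-example',                 # placeholder cluster name
    ReleaseLabel='emr-5.1.0',             # assumed release that includes Flink
    Applications=[{'Name': 'Flink'}],
    Instances={
        'InstanceGroups': [
            {'Name': 'Master', 'InstanceRole': 'MASTER',
             'InstanceType': 'm4.large', 'InstanceCount': 1},
            {'Name': 'Core', 'InstanceRole': 'CORE',
             'InstanceType': 'm4.large', 'InstanceCount': 2},
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)
print(response['JobFlowId'])
```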

Read More

Running sparklyr – RStudio’s R Interface to Spark on Amazon EMR

Tom Zeng is a Solutions Architect for Amazon EMR. The recently released sparklyr package by RStudio has made processing big data in R a lot easier. sparklyr is an R interface to Spark; it allows using Spark as the backend for dplyr, one of the most popular data manipulation packages. sparklyr also allows user […]

Read More

How Eliza Corporation Moved Healthcare Data to the Cloud

This is a guest post by Laxmikanth Malladi, Chief Architect at NorthBay. NorthBay is an AWS Advanced Consulting Partner and an AWS Big Data Competency Partner. “Pay-for-performance” in healthcare pays providers more to keep the people under their care healthier. This is a departure from fee-for-service, where payments are made for each service used. Pay-for-performance arrangements provide […]

Read More

Building Event-Driven Batch Analytics on AWS

Karthik Sonti is a Senior Big Data Architect with AWS Professional Services. Modern businesses typically collect data from internal and external sources at various frequencies throughout the day. These data sources could be franchise stores, subsidiaries, or new systems integrated as a result of mergers and acquisitions. For example, a retail chain might collect point-of-sale […]

Read More

Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS

Prasad Alle is a consultant with AWS Professional Services. Intuit, a creator of business and financial management solutions, is a leading enterprise customer for AWS. The Intuit Data team (IDEA) is responsible for building platforms and products that enable a data-driven, personalized experience across Intuit products and services. One dimension of this platform […]
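As a minimal sketch of the pattern the post covers, the PySpark Streaming snippet below consumes a Kafka topic with a direct stream; the broker addresses, topic name, and batch interval are placeholder assumptions, and the Kafka integration package must be supplied at submit time.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Submit with the Kafka integration on the classpath, for example:
#   spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 app.py
sc = SparkContext(appName='kafka-stream-sketch')
ssc = StreamingContext(sc, 10)  # assumed 10-second batch interval

stream = KafkaUtils.createDirectStream(
    ssc,
    topics=['clickstream'],                                     # placeholder topic
    kafkaParams={'metadata.broker.list': 'broker1:9092,broker2:9092'},
)

# Each record arrives as a (key, value) pair; count records per micro-batch.
stream.map(lambda kv: kv[1]).count().pprint()

ssc.start()
ssc.awaitTermination()
```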

Read More

Amazon EMR-DynamoDB Connector Repository on AWSLabs GitHub

Mike Grimes is a Software Development Engineer with Amazon EMR. Amazon Web Services is excited to announce that the Amazon EMR-DynamoDB Connector is now open source. The EMR-DynamoDB Connector is a set of libraries that lets you access data stored in DynamoDB with Spark, Hadoop MapReduce, and Hive jobs. These libraries are currently shipped with EMR […]

Read More

Encrypt Data At-Rest and In-Flight on Amazon EMR with Security Configurations

Customers running analytics, stream processing, machine learning, and ETL workloads on personally identifiable information, health information, and financial data have strict requirements for encryption of data at rest and in transit. The Apache Spark and Hadoop ecosystems lend themselves to these big data use cases, and customers have asked us to provide a quick and easy way […]
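As a hedged sketch of what a security configuration can look like, the snippet below creates one with boto3 that enables S3 and local-disk at-rest encryption plus in-transit TLS; the configuration name, bucket, certificate archive, and KMS key are placeholder assumptions.

```python
import json
import boto3

emr = boto3.client('emr', region_name='us-east-1')

# A security configuration is a named JSON document that clusters reference
# at launch time. Every identifier below is a placeholder.
security_config = {
    'EncryptionConfiguration': {
        'EnableAtRestEncryption': True,
        'EnableInTransitEncryption': True,
        'AtRestEncryptionConfiguration': {
            'S3EncryptionConfiguration': {'EncryptionMode': 'SSE-S3'},
            'LocalDiskEncryptionConfiguration': {
                'EncryptionKeyProviderType': 'AwsKms',
                'AwsKmsKey': 'arn:aws:kms:us-east-1:123456789012:key/example',
            },
        },
        'InTransitEncryptionConfiguration': {
            'TLSCertificateConfiguration': {
                'CertificateProviderType': 'PEM',
                'S3Object': 's3://my-bucket/emr-certs.zip',
            },
        },
    },
}

emr.create_security_configuration(
    Name='example-emr-security-config',
    SecurityConfiguration=json.dumps(security_config),
)
```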

Read More