AWS Big Data Blog

Category: Analytics

Encrypt Your Amazon Redshift Loads with Amazon S3 and AWS KMS

Russell Nash is a Solutions Architect with AWS. Have you been looking for a straightforward way to encrypt your Amazon Redshift data loads? Have you wondered how to safely manage the keys and where to perform the encryption? In this post, I will walk through a solution that meets these requirements by showing you how […]

Will Spark Power the Data behind Precision Medicine?

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post was co-authored by Ujjwal Ratan, a Solutions Architect with Amazon Web Services. “And that’s the promise of precision medicine – delivering the right treatments, at the right time, every time to the right person.” (President Obama, 2015 State […]

Crunching Statistics at Scale with SparkR on Amazon EMR

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post is co-authored by Gopal Wunnava, a Senior Consultant with AWS Professional Services. SparkR is an R package that allows you to integrate complex statistical analysis with large datasets. In this blog post, we introduce you to running R with the […]

Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR

Veronika Megler, Ph.D., is a Senior Consultant with AWS Professional Services. We are surrounded by more and more sensors – some of which we’re not even consciously aware of. As sensors become cheaper and easier to connect, they create an increasing flood of data that’s getting cheaper and easier to store and process. However, sensor readings […]

Import Zeppelin notes from GitHub or JSON in Zeppelin 0.5.6 on Amazon EMR

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Many Amazon EMR customers use Zeppelin to create interactive notebooks to run workloads with Spark using Scala, Python, and SQL. These customers have found Amazon EMR to be a great platform for running Zeppelin because of strong integration with other AWS services and the ability […]

Analyze a Time Series in Real Time with AWS Lambda, Amazon Kinesis and Amazon DynamoDB Streams

This is a guest post by Richard Freeman, Ph.D., a solutions architect and data scientist at JustGiving. JustGiving in their own words: “We are one of the world’s largest social platforms for giving that’s helped 26.1 million registered users in 196 countries raise $3.8 billion for over 27,000 good causes.” Introduction: As more devices, sensors […]

Analyze Your Data on Amazon DynamoDB with Apache Spark

Manjeet Chayel is a Solutions Architect with AWS. Every day, tons of customer data is generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]

Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams

Rahul Bhartia is a Solutions Architect with AWS. Martin Schade, a Solutions Architect with AWS, also contributed to this post. Do you use real-time analytics on AWS to quickly extract value from large volumes of data streams? For example, have you built a recommendation engine on clickstream data to personalize content suggestions in real time […]

Introducing On-Demand Pipeline Execution in AWS Data Pipeline

Marc Beitchman is a Software Development Engineer in the AWS Database Services team. You can now trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. You can access this functionality through the existing AWS Data Pipeline activation API. On-demand schedules make it easy to integrate pipelines in AWS […]

Process Amazon Kinesis Aggregated Data with AWS Lambda

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Last year, we introduced the Amazon Kinesis Producer Library (KPL) to simplify the development of applications that need to send data to Amazon Kinesis Streams. Many customers use aggregation, which allows you to combine multiple records into a single Amazon Kinesis Streams record. Although the […]
