AWS Big Data Blog

Import Zeppelin notes from GitHub or JSON in Zeppelin 0.5.6 on Amazon EMR

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Many Amazon EMR customers use Zeppelin to create interactive notebooks that run Spark workloads in Scala, Python, and SQL. These customers have found Amazon EMR to be a great platform for running Zeppelin because of its strong integration with other AWS services and the ability […]

Join us at Strata + Hadoop World Conference in San Jose, March 29-31

by Jorge A. Lopez

Jorge A. Lopez is responsible for Big Data Solutions Marketing at AWS. Visit us: come see the AWS Big Data team at Booth #736, where big data experts will be happy to answer your questions, hear about your specific requirements, and help you with your big data initiatives. Click to reserve a consultation slot with […]

AWS Big Data Meetup March 22 in Seattle: Intro to SparkR and breakout discussions

by Steve McPherson

Join and RSVP! AWS speaker: Christopher Crosbie, Healthcare and Life Sciences Partner Solutions Architect for Amazon Web Services. For a long time, R users have sliced and diced their computational problems into smaller pieces so they could run them in smaller chunks. But what if you want to compute on a huge dataframe with […]

Analyze a Time Series in Real Time with AWS Lambda, Amazon Kinesis and Amazon DynamoDB Streams

This is a guest post by Richard Freeman, Ph.D., a solutions architect and data scientist at JustGiving. JustGiving in their own words: “We are one of the world’s largest social platforms for giving that’s helped 26.1 million registered users in 196 countries raise $3.8 billion for over 27,000 good causes.”

Introduction: As more devices, sensors […]

AWS Partner Post Spotlight: Attunity

by Andy Werth

Partners are a vital part of the AWS ecosystem, and AWS Partners have made important contributions to the AWS Big Data Blog. This month’s Partner Post Spotlight is on Attunity, who co-authored the post “Using Attunity CloudBeam at UMUC to Replicate Data to Amazon RDS and Amazon Redshift.” Their post explains how UMUC used Attunity […]

Big Data Website Gets a Big Makeover at AWS

by Jorge A. Lopez

Jorge A. Lopez is responsible for Big Data Solutions Marketing at AWS. The big data ecosystem is evolving at a tremendous pace, giving rise to a plethora of tools, use cases, and applications. The new AWS Big Data website is now the ideal starting point to learn about new and existing capabilities, and the services […]

Analyze Your Data on Amazon DynamoDB with Apache Spark

Manjeet Chayel is a Solutions Architect with AWS. Every day, large volumes of customer data are generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]

Month in Review: February 2016

by Andy Werth

There was lots for big data enthusiasts in February on the AWS Big Data Blog. Take a look!

Submitting User Applications with spark-submit: Learn how to set spark-submit flags to control the memory and compute resources available to your application submitted to Spark running on EMR. Learn when to use the maximizeResourceAllocation configuration option and dynamic allocation […]

Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams

Rahul Bhartia is a Solutions Architect with AWS. Martin Schade, a Solutions Architect with AWS, also contributed to this post. Do you use real-time analytics on AWS to quickly extract value from large volumes of data streams? For example, have you built a recommendation engine on clickstream data to personalize content suggestions in real time […]

Introducing On-Demand Pipeline Execution in AWS Data Pipeline

Marc Beitchman is a Software Development Engineer on the AWS Database Services team. You can now trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. This functionality is available through the existing AWS Data Pipeline activation API. On-demand schedules make it easy to integrate pipelines in AWS […]