AWS Big Data Blog

Converging Data Silos to Amazon Redshift Using AWS DMS

Organizations often grow organically, and so does their data in individual silos. Such systems are often powered by traditional RDBMSs, and they grow orthogonally in size and features. To gain intelligence across heterogeneous data sources, you have to join the data sets. However, this poses new challenges, as joining data over dblinks or into a […]

Call for Papers! DEEM: 1st Workshop on Data Management for End-to-End Machine Learning

Amazon and Matroid will hold the first workshop on Data Management for End-to-End Machine Learning (DEEM) on May 14th, 2017 in conjunction with the premier systems conference SIGMOD/PODS 2017 in Raleigh, North Carolina. For more details about the workshop focus, see Challenges and opportunities in machine learning below. DEEM brings together researchers and practitioners at […]

Decreasing Game Churn: How Upopa used ironSource Atom and Amazon ML to Engage Users

This is a guest post by Tom Talpir, Software Developer at ironSource. ironSource is an Advanced AWS Partner Network (APN) Technology Partner and an AWS Big Data Competency Partner. Ever wondered what it takes to keep a user from leaving your game or application after all the hard work you put in? Wouldn’t it be great […]

Month in Review: December 2016

Another month of big data solutions on the Big Data Blog. Take a look at our summaries below and learn, comment, and share. Thank you for reading!

Implementing Authorization and Auditing using Apache Ranger on Amazon EMR

Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Features […]

Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning

Air travel can be stressful due to the many factors that are simply out of airline passengers’ control. As passengers, we want to minimize this stress as much as we can. We can do this by using past data to make predictions about how likely a flight is to be delayed based on the time of […]

Serving Real-Time Machine Learning Predictions on Amazon EMR

The typical progression for creating and using a trained model for recommendations falls into two general areas: training the model and hosting the model. Model training has become a well-known standard practice. We want to highlight one of many ways to host those recommendations (for example, see the Analyzing Genomics Data at Scale using R, […]

Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

Ben Snively is a Solutions Architect with AWS.

Speed and agility are essential with today’s analytics tools. The quicker you can get from idea to first results, the more you can experiment and innovate with your data, perform ad-hoc analysis, and drive answers to new business questions. Serverless architectures help in this respect by taking […]

Run Jupyter Notebook and JupyterHub on Amazon EMR

NOTE: The content in this post may need periodic updates as newer versions become available. Please leave a comment if you have any trouble implementing this solution.

Tom Zeng is a Solutions Architect for Amazon EMR.

Jupyter Notebook (formerly IPython) is one of the most popular user interfaces for running Python, R, Julia, Scala, and […]

Respond to State Changes on Amazon EMR Clusters with Amazon CloudWatch Events

Jonathan Fritz is a Senior Product Manager for Amazon EMR.

Customers can take advantage of the Amazon EMR API to create and terminate EMR clusters, scale clusters using Auto Scaling or manual resizing, and submit and run Apache Spark, Apache Hive, or Apache Pig workloads. These decisions are often triggered from cluster state-related information. Previously, […]
