AWS Big Data Blog

Category: Analytics

Using BlueTalon with Amazon EMR

This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR is used for log, financial, fraud, and […]

Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR

This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis, developers can quickly store, collate, and access large, distributed data streams such as access logs, click streams, and IoT data in real time. The question then becomes, how can we access and leverage this […]

Extending Seven Bridges Genomics with Amazon Redshift and R

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. The article was co-authored by Zeynep Onder, Scientist, Seven Bridges Genomics, an AWS Advanced Technology Partner. “ACTGCTTCGACTCGGGTCCA.” That is probably not a coding language readily understood by many reading this blog post, but it is a programming framework that defines all […]

Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library

Kevin Deng is an SDE with the Amazon Kinesis team and is the lead author of the Amazon Kinesis Producer Library. How do you vertically scale an Amazon Kinesis producer application by 100x? While it’s easy to get started with streaming data into Amazon Kinesis, streaming large volumes of data efficiently and reliably presents some […]

Connecting R with Amazon Redshift

Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services. Amazon Redshift is a fast, fully managed, scalable data warehouse (DWH) for petabytes of data. AWS customers are moving huge amounts of structured data into Amazon Redshift to offload analytics workloads or to operate their DWH fully in the cloud. Business intelligence and […]

Running R on AWS

Many AWS customers already use the popular open-source statistical software R for big data analytics and data science. Other customers have asked for instructions and best practices for running R on AWS. Several months ago, I (Markus) wrote a post showing you how to connect R with Amazon EMR, install RStudio on the Hadoop master node, and use R […]

Presto-Amazon Kinesis Connector for Interactively Querying Streaming Data

This is a guest post by Sivaramakrishnan Narayanan, Member of Technical Staff at Qubole, and Xing Quan, Director of Product Management at Qubole. Qubole is an AWS Advanced Technology Partner. Amazon Kinesis is a scalable and fully managed service for streaming large, distributed data sets. As applications (particularly on mobile and wearable devices) start to […]

How Expedia Implemented Near Real-time Analysis of Interdependent Datasets

This is a guest post by Stephen Verstraete, a manager at Pariveda Solutions. Pariveda Solutions is an AWS Premier Consulting Partner. Common patterns exist for batch processing and real-time processing of Big Data. However, we haven’t seen patterns that allow us to process batches of dependent data in real-time. Expedia’s marketing group needed to analyze […]

Large-Scale Machine Learning with Spark on Amazon EMR

This is a guest post by Jeff Smith, Data Engineer at Intent Media. Intent Media, in their own words: “Intent Media operates a platform for advertising on commerce sites. We help online travel companies optimize revenue on their websites and apps through sophisticated data science capabilities. On the data team at Intent Media, we are […]

Test drive two big data scenarios from the ‘Building a Big Data Platform on AWS’ bootcamp

Matt Yanchyshyn is a Sr. Manager for AWS Solutions Architecture. AWS offers a number of events during the year, such as our annual AWS re:Invent conference, the AWS Summit series, the AWS Pop-up Loft, and a variety of roadshows. All of these provide opportunities for AWS customers to attend talks focused on big data and […]