AWS Big Data Blog

Scaling Writes on Amazon DynamoDB Tables with Global Secondary Indexes

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon DynamoDB is a fast, flexible, and fully managed NoSQL database service that supports both document and key-value store models that need consistent, single-digit millisecond latency at any scale. In this post, we discuss a technique that can be used with DynamoDB to ensure virtually […]

Read More

Introduction to Python UDFs in Amazon Redshift

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. When your doctor takes out a prescription pad at your yearly checkup, do you ever stop to wonder what goes into her thought process as she decides on which drug to scribble down? We assume that journals of scientific evidence coupled […]

Read More

Using BlueTalon with Amazon EMR

This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR gets used for log, financial, fraud, and […]

Read More

Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR

This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis, developers can quickly store, collate and access large, distributed data streams such as access logs, click streams and IoT data in real time. The question then becomes, how can we access and leverage this […]

Read More

Extending Seven Bridges Genomics with Amazon Redshift and R

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. The article was co-authored by Zeynep Onder, Scientist, Seven Bridges Genomics, an AWS Advanced Technology Partner. “ACTGCTTCGACTCGGGTCCA” That is probably not a coding language readily understood by many reading this blog post, but it is a programming framework that defines all […]

Read More

Building and Maintaining an Amazon S3 Metadata Index without Servers

Mike Deck is a Solutions Architect with AWS. Amazon S3 is a simple key-based object store whose scalability and low cost make it ideal for storing large datasets. Its design enables S3 to provide excellent performance for storing and retrieving objects based on a known key. Finding objects based on other attributes, however, requires doing […]

Read More

Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library

Kevin Deng is an SDE with the Amazon Kinesis team and is the lead author of the Amazon Kinesis Producer Library. How do you vertically scale an Amazon Kinesis producer application by 100x? While it’s easy to get started with streaming data into Amazon Kinesis, streaming large volumes of data efficiently and reliably presents some […]

Read More

Connecting R with Amazon Redshift

Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services. Amazon Redshift is a fast, fully managed, scalable data warehouse (DWH) for PB of data. AWS customers are moving huge amounts of structured data into Amazon Redshift to offload analytics workloads or to operate their DWH fully in the cloud. Business intelligence and […]

Read More

Running R on AWS

by Markus Schmidberger and Aaron Friedman | in Analytics

Many AWS customers already use the popular open-source statistical software R for big data analytics and data science. Other customers have asked for instructions and best practices for running R on AWS. Several months ago, I (Markus) wrote a post showing you how to connect R with Amazon EMR, install RStudio on the Hadoop master node, and use R […]

Read More

Presto-Amazon Kinesis Connector for Interactively Querying Streaming Data

This is a guest post by Sivaramakrishnan Narayanan, Member of Technical Staff at Qubole, and Xing Quan, Director of Product Management at Qubole. Qubole is an AWS Advanced Technology Partner. Amazon Kinesis is a scalable and fully managed service for streaming large, distributed data sets. As applications (particularly on mobile and wearable devices) start to […]

Read More