AWS Big Data Blog

Category: Analytics

Run Common Data Science Packages on Anaconda and Oozie with Amazon EMR

In the world of data science, users must often sacrifice cluster set-up time to allow for complex usability scenarios. Amazon EMR allows data scientists to spin up complex cluster configurations easily, and to be up and running with complex queries in a matter of minutes. Data scientists often use scheduling applications such as Oozie to […]

Read More

Setting up Read Replica Clusters with HBase on Amazon S3

Many customers have taken advantage of the numerous benefits of running Apache HBase on Amazon S3 for data storage, including lower costs, data durability, and easier scalability. Customers such as FINRA have lowered their costs by 60% by moving to an HBase on S3 architecture along with the numerous operational benefits that come with decoupling […]

Read More

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with […]

Read More

Perform Near Real-time Analytics on Streaming Data with Amazon Kinesis and Amazon Elasticsearch Service

Nowadays, streaming data is seen and used everywhere—from social networks, to mobile and web applications, IoT devices, instrumentation in data centers, and many other sources. As the speed and volume of this type of data increases, the need to perform data analysis in real time with machine learning algorithms and extract a deeper understanding from […]

Read More

Visualize Amazon S3 Analytics Data with Amazon QuickSight

When Amazon S3 analytics was released in November 2016, it gave you the ability to analyze storage access patterns and transition the right data to the right storage class. You could also manually export the data to an S3 bucket to analyze, using the business intelligence tool of your choice, and gather deeper insights on usage and growth patterns. This helped you reduce storage costs while optimizing performance based on usage patterns. With today’s update, you can quickly and easily gain those deeper insights and benefits by analyzing and visualizing S3 analytics data in Amazon QuickSight. It takes just a single click from the S3 console, without the need for manual exports or additional data preparation.

Read More

Under the Hood of Server-Side Encryption for Amazon Kinesis Streams

Customers are using Amazon Kinesis Streams to ingest, process, and deliver data in real time from millions of devices or applications. Use cases for Kinesis Streams vary, but a few common ones include IoT data ingestion and analytics, log processing, clickstream analytics, and enterprise data bus architectures. Within milliseconds of data arrival, applications (KCL, Apache […]

Read More

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what […]

Read More

Visualize and Monitor Amazon EC2 Events with Amazon CloudWatch Events and Amazon Kinesis Firehose

Monitoring your AWS environment is important for security, performance, and cost control purposes. For example, by monitoring and analyzing API calls made to your Amazon EC2 instances, you can trace security incidents and gain insights into administrative behaviors and access patterns. The kinds of events you might monitor include console logins, Amazon EBS snapshot creation/deletion/modification, […]

Read More