AWS Big Data Blog
Category: AWS Big Data
Analyze and visualize nested JSON data with Amazon Athena and Amazon QuickSight
April 2024: This post was reviewed for accuracy. Although structured data remains the backbone for many data platforms, increasingly unstructured or semi-structured data is used to enrich existing information or create new insights. Amazon Athena enables you to analyze a wide variety of data. This includes tabular data in CSV or Apache Parquet files, data […]
Run Amazon payments analytics with 750 TB of data on Amazon Redshift
The Amazon Payments Data Engineering team is responsible for data ingestion, transformation, and storage of a growing dataset of more than 750 TB. The team makes these services available to more than 300 business customers around the globe. These customers include managers from the product, marketing, and programs domains; as well as data scientists, business analysts, […]
Analyze your Amazon CloudFront access logs at scale
This blog post focuses on two measures to restructure Amazon CloudFront access logs for optimization: partitioning and conversion to columnar formats. For more details on performance tuning read the blog post about the top 10 performance tuning tips for Amazon Athena.
Reduce costs by migrating Apache Spark and Hadoop to Amazon EMR
Apache Spark and Hadoop are popular frameworks to process data for analytics, often at a fraction of the cost of legacy approaches, yet at scale they may still become expensive propositions. This blog post discusses ways to reduce your total costs of ownership, while also improving staff productivity at the same time. This can be […]
Best Practices for Securing Amazon EMR
This post walks you through some of the principles of Amazon EMR security. It also describes features that you can use in Amazon EMR to help you meet the security and compliance objectives for your business. We cover some common security best practices that we see used. We also show some sample configurations to get you started.
Manage centralized Microsoft Exchange Server logs using Amazon Kinesis Agent for Windows
This blog post discusses an efficient architecture to stream, analyze, and store Microsoft Exchange Server logs. For frequent queries and operational analytics, we use Amazon Elasticsearch Service (Amazon ES) and Kibana for real-time visualization.
Embed interactive dashboards in your application with Amazon QuickSight
Starting today, you can embed Amazon QuickSight dashboards in your applications. This means you can now quickly and efficiently enhance your applications with advanced interactive data visualizations and analytics capabilities without any custom development.
Amazon QuickSight announces ML Insights in preview
Amazon QuickSight is a fast, cloud-powered BI service that makes it easy for everyone in an organization to get business insights from their data through rich, interactive dashboards. With pay-per-session pricing and embedded dashboard, we made BI even more cost-effective and accessible to everyone. However, as the volume of data that customers generate continues to […]
Scale your Amazon Redshift clusters up and down in minutes to get the performance you need, when you need it
Amazon Redshift is the cloud data warehouse of choice for organizations of all sizes—from fast-growing technology companies such as Turo and Yelp to Fortune 500 companies such as 21st Century Fox and Johnson & Johnson. With quickly expanding use cases, data sizes, and analyst populations, these customers have a critical need for scalable data warehouses. […]
Scale Amazon Kinesis Data Streams with AWS Application Auto Scaling
Recently, AWS launched a new feature of AWS Application Auto Scaling that let you define scaling policies that automatically add and remove shards to an Amazon Kinesis Data Stream. For more detailed information about this feature, see the Application Auto Scaling GitHub repository. As your streaming information increases, you require a scaling solution to accommodate […]