AWS Big Data Blog

Integrate Power BI with Amazon Redshift for insights and analytics

Amazon Redshift is a fast, fully managed, cloud-native data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Microsoft Power BI is a business analytics service that delivers insights to enable fast, informed decisions. With Power BI, you can perform ad-hoc query […]

Read More

Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics

Most businesses generate data continuously in real time and at ever-increasing volumes. Data is generated as users play mobile games, load balancers log requests, customers shop on your website, and temperature changes on IoT sensors. You can capitalize on time-sensitive events, improve customer experiences, increase efficiency, and drive innovation by analyzing this data quickly. The […]

Read More

How Siemens built a fully managed scheduling mechanism for updates on Amazon S3 data lakes

Siemens is a global technology leader with more than 370,000 employees and 170 years of experience. To protect Siemens from cybercrime, the Siemens Cyber Defense Center (CDC) continuously monitors Siemens’ networks and assets. To handle the resulting enormous data load, the CDC built a next-generation threat detection and analysis platform called ARGOS. ARGOS is a […]

Read More

Load data incrementally and optimized Parquet writer with AWS Glue

AWS Glue provides a serverless environment to prepare (extract and transform) and load large amounts of datasets from a variety of sources for analytics and data processing with Apache Spark ETL jobs. The first post of the series, Best practices to scale Apache Spark jobs and partition data with AWS Glue, discusses best practices to […]

Read More

Build machine learning-powered business intelligence analyses using Amazon QuickSight

Imagine you can see the future—to know how many customers will order your product months ahead of time so you can make adequate provisions, or to know how many of your employees will leave your organization several months in advance so you can take preemptive actions to encourage staff retention. For an organization that sees […]

Read More

Analyze your Amazon S3 spend using AWS Glue and Amazon Redshift

The AWS Cost & Usage Report (CUR) tracks your AWS usage and provides estimated charges associated with that usage. You can configure this report to present the data at hourly or daily intervals, and it is updated at least one time per day until it is finalized at the end of the billing period. The […]

Read More

Cross-account AWS Glue Data Catalog access with Amazon Athena

Many AWS customers use a multi-account strategy. A centralized AWS Glue Data Catalog is important to minimize the amount of administration related to sharing metadata across different accounts. This post introduces capability that allows Amazon Athena to query a centralized Data Catalog across different AWS accounts. Overview of solution In late 2019, AWS introduced the […]

Read More

How FactSet automated exporting data from Amazon DynamoDB to Amazon S3 Parquet to build a data analytics platform

This is a guest post by Arvind Godbole, Lead Software Engineer with FactSet and Tarik Makota, AWS Principal Solutions Architect. In their own words “FactSet creates flexible, open data and software solutions for tens of thousands of investment professionals around the world, which provides instant access to financial data and analytics that investors use to […]

Read More

Amazon Redshift at re:Invent 2019

The annual AWS re:Invent learning conference is an exciting time full of new product and program launches. At the first re:Invent conference in 2012, AWS announced Amazon Redshift. Since then, tens of thousands of customers have started using Amazon Redshift as their cloud data warehouse. In 2019, AWS shared several significant launches and dozens of […]

Read More

How Verizon Media Group migrated from on-premises Apache Hadoop and Spark to Amazon EMR

This is a guest post by Verizon Media Group. At Verizon Media Group (VMG), one of the major problems we faced was the inability to scale out computing capacity in a required amount of time—hardware acquisitions often took months to complete. Scaling and upgrading hardware to accommodate workload changes was not economically viable, and upgrading […]

Read More