AWS Big Data Blog

Category: Analytics

Simplify data pipelines with AWS Glue automatic code generation and Workflows

In this post, we discuss how to use the automatic code generation process in AWS Glue ETL to simplify common data manipulation tasks, such as data type conversion and flattening complex structures. We also explore using AWS Glue Workflows to build and orchestrate data pipelines of varying complexity. Lastly, we look at how you can use SQL, through AWS Glue ETL and the AWS Glue Data Catalog, to query and transform your data.

Exploring the public AWS COVID-19 data lake

This post walks you through accessing the AWS COVID-19 data lake through the AWS Glue Data Catalog via Amazon SageMaker or Jupyter, using the open-source AWS Data Wrangler library. AWS Data Wrangler is a Python package that extends the pandas library to AWS, connecting DataFrames to AWS data-related services (such as Amazon Redshift, Amazon S3, AWS Glue, Amazon Athena, and Amazon EMR). For more information about what you can build by using this data lake, see the associated public Jupyter notebook on GitHub.

Federate Amazon Redshift access with Microsoft Azure AD single sign-on

Recently, we helped a large enterprise customer who was building their data warehouse on Amazon Redshift, using Azure AD as a corporate directory. Their requirement was to enable data warehouse users to use their corporate credentials to query data in Redshift. Doing so not only provided a better user experience — users can utilize their […]

Ingest streaming data into Amazon Elasticsearch Service within the privacy of your VPC with Amazon Kinesis Data Firehose

Today we are adding a new Amazon Kinesis Data Firehose feature that lets you set up VPC delivery from Kinesis Data Firehose to your Amazon Elasticsearch Service domain. If you have been managing a custom application on Amazon Kinesis Data Streams to keep traffic private, you can now use Kinesis Data Firehose and load your data […]

Achieve finer-grained data security with column-level access control in Amazon Redshift

Amazon Redshift is the most popular cloud data warehouse because it provides fast insights at a low cost. Customers can confidently run mission-critical workloads, even in highly regulated industries, because Amazon Redshift comes with out-of-the-box security and compliance. The security features, combined with the ability to easily analyze data in-place and […]

Speed up your ELT and BI queries with Amazon Redshift materialized views

The Amazon Redshift materialized views feature helps you achieve significantly faster query performance on repeated or predictable workloads, such as dashboard queries from business intelligence (BI) tools like Amazon QuickSight. It also speeds up and simplifies extract, load, and transform (ELT) data processing. You can use materialized views to store frequently used precomputations and […]
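
As a rough illustration of storing a precomputation, the sketch below renders the two SQL statements typically involved: one to create a materialized view and one to refresh it. The helper `materialized_view_statements`, the view name `daily_sales_mv`, and the `sales` table are hypothetical names chosen for this example; they do not come from the post.

```python
def materialized_view_statements(view_name: str, select_sql: str) -> tuple[str, str]:
    """Render CREATE and REFRESH statements for a materialized view.

    Hypothetical helper for illustration only. It performs no escaping,
    so it assumes trusted, hard-coded inputs.
    """
    # Drop surrounding whitespace and any trailing semicolon from the query.
    query = select_sql.strip().rstrip(";")
    create = f"CREATE MATERIALIZED VIEW {view_name} AS\n{query};"
    refresh = f"REFRESH MATERIALIZED VIEW {view_name};"
    return create, refresh


# Hypothetical dashboard aggregate over an assumed "sales" table.
create_sql, refresh_sql = materialized_view_statements(
    "daily_sales_mv",
    "SELECT sale_date, SUM(amount) AS total FROM sales GROUP BY sale_date",
)
print(create_sql)
print(refresh_sql)
```

A dashboard would then query `daily_sales_mv` directly instead of recomputing the aggregate, and the refresh statement would be run after the base table changes.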

Query, visualize, and forecast TruFactor web session intelligence with AWS Data Exchange

This post showcases TruFactor Intelligence-as-a-Service data on AWS Data Exchange. TruFactor’s anonymization platform and proprietary AI ingests, filters, and transforms more than 85 billion high-quality raw signals daily from wireless carriers, OEMs, and mobile apps into a unified phygital consumer graph across physical and digital dimensions. TruFactor intelligence is application-ready for use within any AWS analytics or ML service to power your models and applications running on AWS, with no additional processing required.

Build a Simplified ETL and Live Data Query Solution using Redshift Federated Query

You may have heard the saying that the best ETL is no ETL. Amazon Redshift now makes this possible with Federated Query. In its initial release, this feature lets you query data in Amazon Aurora PostgreSQL or Amazon RDS for PostgreSQL using Amazon Redshift external schemas. Federated Query also exposes the metadata from these source databases through system views and driver APIs, which allows business intelligence tools like Tableau and Amazon QuickSight to connect to Amazon Redshift and query data in PostgreSQL without having to make local copies.
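
To make the external-schema idea concrete, here is a minimal sketch that renders a `CREATE EXTERNAL SCHEMA ... FROM POSTGRES` statement as a string. The helper `federated_schema_ddl` and every value passed to it (schema name, endpoint, role ARN, secret ARN) are placeholders assumed for illustration, not details from the post.

```python
def federated_schema_ddl(
    local_schema: str,
    database: str,
    uri: str,
    iam_role: str,
    secret_arn: str,
    source_schema: str = "public",
    port: int = 5432,
) -> str:
    """Render a Redshift external schema definition for a PostgreSQL source.

    Hypothetical helper for illustration only. It performs no escaping,
    so it assumes trusted, hard-coded inputs.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        f"FROM POSTGRES\n"
        f"DATABASE '{database}' SCHEMA '{source_schema}'\n"
        f"URI '{uri}' PORT {port}\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# All identifiers below are placeholders, not real resources.
ddl = federated_schema_ddl(
    local_schema="apg",
    database="orders_db",
    uri="example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    iam_role="arn:aws:iam::123456789012:role/example-redshift-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
)
print(ddl)
```

Once such a schema exists, tables in the PostgreSQL source can be referenced from Amazon Redshift as `apg.<table_name>`, alongside local tables.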

Build a cloud-native network performance analytics solution on AWS for wireless service providers

This post demonstrates a serverless, cloud-based approach to building a network performance analytics solution on pay-per-use AWS services, which provide flexibility and performance while keeping costs under control. Without good network performance, you may struggle to face the challenges of real-time and low latency services and the increase of the total […]
