AWS Big Data Blog

Category: Analytics

Build secure encrypted data lakes with AWS Lake Formation

Maintaining customer data privacy, protection against intellectual property loss, and compliance with data protection laws are essential objectives of today’s organizations. To protect data against security threats, vulnerabilities within the organization, malicious software, or cyber criminality, organizations are increasingly encrypting their data. Although you can enable server-side encryption in Amazon Simple Storage Service (Amazon S3), […]

Read More

Improve query performance using AWS Glue partition indexes

While creating data lakes on the cloud, the data catalog is crucial to centralize metadata and make the data visible, searchable, and queryable for users. With the recent exponential growth of data volume, it becomes much more important to optimize data layout and maintain the metadata on cloud storage to keep the value of data […]

Read More

Build a data quality score card using AWS Glue DataBrew, Amazon Athena, and Amazon QuickSight

Data quality plays an important role while building an extract, transform, and load (ETL) pipeline for sending data to downstream analytical applications and machine learning (ML) models. The analogy “garbage in, garbage out” is apt at describing why it’s important to filter out bad data before further processing. Continuously monitoring data quality and comparing it […]

Read More

How Optus improves broadband and mobile customer experience using the Network Data Analytics platform on AWS

This is a guest blog post co-written by Rajagopal Mahendran, Development Manager at the Optus IT Innovation Team. Optus is part of The Singtel group, which operates in one of the world’s fastest growing and most dynamic regions, with a presence in 21 countries. Optus provides not only core telecom services, but also an extensive […]

Read More

Create threshold-based alerts in Amazon QuickSight

Every business has a set of key metrics that stakeholders focus on to make the most accurate, data-driven decisions, such as sales per week, inventory turnover rate, daily website visitors, and so on. With threshold-based alerts in Amazon QuickSight, we’re making it simpler than ever for consumers of QuickSight dashboards to stay informed about their […]

Read More

Securely analyze your data with AWS Lake Formation and Amazon QuickSight

Many useful business insights can arise from analyzing customer preferences, behavior, and usage patterns. With this information, businesses can innovate faster and improve the customer experience, leading to better engagement and accelerating product adoption. More and more businesses are looking for ways to securely store and restrict access to customer data, which may include personally […]

Read More

Simplify incoming data ingestion with dynamic parameterized datasets in AWS Glue DataBrew

When data analysts and data scientists prepare data for analysis, they often rely on periodically generated data produced by upstream services, such as labeling datasets from Amazon SageMaker Ground Truth or Cost and Usage Reports from AWS Billing and Cost Management. Alternatively, they can regularly upload such data to Amazon Simple Storage Service (Amazon S3) […]

Read More

Set up CI/CD pipelines for AWS Glue DataBrew using AWS Developer Tools

An integral part of DevOps is adopting the culture of continuous integration and continuous delivery (CI/CD). This enables teams to securely store and version code, maintain parity between development and production environments, and achieve end-to-end automation of the release cycle, including building, testing, and deploying to production. In essence, development teams follow CI/CD processes to […]

Read More

How Amazon Customer Service lowered Amazon Redshift costs and improved performance using RA3 nodes

Amazon Customer Service solves exciting and challenging customer care problems for Amazon.com, the world’s largest online retailer. In 2021, the Amazon Customer Service Technology team upgraded its dense-compute nodes (dc2.8xlarge) to the Amazon Redshift RA3 instance family (ra3.16xlarge). Moving to the most advanced Amazon Redshift architecture enabled the team to reduce its infrastructure costs, improve […]

Read More

Simplify Amazon Redshift RA3 migration evaluation with Simple Replay utility

Amazon Redshift is a fast, fully managed, widely popular cloud data warehouse that allows you to process exabytes of data across your data warehouse, operational database, and data lake using standard SQL. It offers different node types to accommodate various workloads; you can choose from RA3, DC2, and DS2 depending on your requirements. RA3 is […]

Read More