AWS Big Data Blog

Introducing the Cloud Shuffle Storage Plugin for Apache Spark

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. In AWS Glue, you can use Apache Spark, an open-source, distributed processing system for your data integration tasks and big data workloads. Apache Spark utilizes in-memory caching and optimized […]

Centrally manage access and permissions for Amazon Redshift data sharing with AWS Lake Formation

Today’s global, data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. Amazon Redshift data sharing allows you to securely share live, transactionally consistent data in one Amazon Redshift data warehouse with another Amazon Redshift data warehouse within the same AWS […]

Data: The genesis for modern invention

It only takes one groundbreaking invention—one iconic idea that solves a widespread pain point for customers—to create or transform an industry forever. From the invention of the telegraph, to the discovery of GPS, to the earliest cloud computing services, history is filled with examples of these “eureka” moments that continue to have long-lasting impacts on […]

Log analytics the easy way with Amazon OpenSearch Serverless

We recently announced the preview release of Amazon OpenSearch Serverless, a new serverless option for Amazon OpenSearch Service, which makes it easy for you to run large-scale search and analytics workloads without having to configure, manage, or scale OpenSearch clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query […]

New analytical questions available in Amazon QuickSight Q: “Why” and “Forecast”

Amazon QuickSight Q uses machine learning (ML) to enable any user to ask questions about business data in natural language and receive accurate answers with relevant visualizations in seconds. Today, Amazon QuickSight announces support for two new question types that simplify and scale complex analytical tasks using natural language: “forecast” and “why.” In this post, […]

Simplify data loading on the Amazon Redshift console with Informatica Data Loader

Amazon Redshift is the fastest, most widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Data engineers, data analysts, and data scientists want to use this data to power analytics workloads such as business intelligence (BI), […]

Run queries concurrently and see query history using Amazon Redshift Query Editor v2

Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse. You have the flexibility to choose from provisioned and serverless compute modes. You can start loading and querying large datasets conveniently in Amazon Redshift using Amazon Redshift Query Editor v2, a web-based SQL client application. Query Editor v2 empowers your technical and business teams […]

Create advanced insights using level-aware calculations in Amazon QuickSight

Calculation at the right granularity always needs to be handled carefully when performing data analytics. Especially when data is generated through joining across multiple tables, the denormalization of datasets can add a lot of complications to make accurate calculations challenging. Amazon QuickSight recently launched a new functionality called level-aware calculations (LAC), which enables you to […]

Scale AWS SDK for pandas workloads with AWS Glue for Ray

AWS SDK for pandas is an open-source library that extends the popular Python pandas library, enabling you to connect to AWS data and analytics services using pandas data frames. We’ve seen customers use the library in combination with pandas for both data engineering and AI workloads. Although pandas data frames are simple to use, they […]

Introducing AWS Glue for Ray: Scaling your data integration workloads using Python

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Today, AWS Glue processes customer jobs using either Apache Spark’s distributed processing engine for large workloads or Python’s single-node processing engine for smaller workloads. Customers […]