AWS Big Data Blog

Run queries concurrently and see query history using Amazon Redshift Query Editor v2

Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse. You have the flexibility to choose from provisioned and serverless compute modes. You can start loading and querying large datasets conveniently in Amazon Redshift using Amazon Redshift Query Editor v2, a web-based SQL client application. Query Editor v2 empowers your technical and business teams […]

Create advanced insights using level-aware calculations in Amazon QuickSight

Calculating at the right granularity requires care in data analytics. When a dataset is produced by joining multiple tables, the resulting denormalization can introduce complications that make accurate calculations challenging. Amazon QuickSight recently launched a new feature called level-aware calculations (LAC), which enables you to […]

Scale AWS SDK for pandas workloads with AWS Glue for Ray

September 2023: This post was reviewed and updated with a new dataset and related code blocks and images. AWS SDK for pandas is an open-source library that extends the popular Python pandas library, enabling you to connect to AWS data and analytics services using pandas data frames. We’ve seen customers use the library in combination […]

Introducing AWS Glue for Ray: Scaling your data integration workloads using Python

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. Today, AWS Glue processes customer jobs using either Apache Spark’s distributed processing engine for large workloads or Python’s single-node processing engine for smaller workloads. Customers […]

Lower your Amazon OpenSearch Service storage cost with gp3 Amazon EBS volumes

Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open-source, distributed search and analytics suite comprising OpenSearch, a distributed search and analytics engine, and OpenSearch Dashboards, a UI and visualization tool. When you use Amazon OpenSearch Service, you configure a set […]

Create small multiples in Amazon QuickSight

We’re excited to announce the launch of small multiples in Amazon QuickSight at AWS re:Invent 2022! Small multiples is one of the most powerful data visualization features when it comes to comparative analysis. Previously, you had to either use a filter or create multiple visuals side by side to analyze multiple slices of the same […]

Add text boxes to your Amazon QuickSight analysis

We are excited to announce the launch of text boxes in Amazon QuickSight. Adding text for common use cases, including but not limited to titles, subtitles, annotations, and additional information for KPIs, is now simpler than ever with the new text box. You can reposition, resize, and make your text […]

New line chart customization options in Amazon QuickSight

Amazon QuickSight is a serverless, cloud-based business intelligence (BI) service that brings data insights to your teams and end-users through machine learning (ML)-powered dashboards and data visualizations that can be accessed via QuickSight or embedded in apps and portals that your users access. Line charts in QuickSight have undergone a major overhaul this year, starting […]

Implement row-level access control in a multi-tenant environment with Amazon Redshift

This is a guest post co-written with Siva Bangaru and Leon Liu from ADP. ADP helps organizations of all types and sizes by providing human capital management (HCM) solutions that unite HR, payroll, talent, time, tax, and benefits administration. ADP is a leader in business outsourcing services, analytics, and compliance expertise. ADP’s unmatched experience, deep […]

Build your Apache Hudi data lake on AWS using Amazon EMR – Part 1

Apache Hudi is an open-source transactional data lake framework that greatly simplifies incremental data processing and data pipeline development. It does this by bringing core warehouse and database functionality directly to a data lake on Amazon Simple Storage Service (Amazon S3) or Apache HDFS. Hudi provides table management, instantaneous views, efficient upserts/deletes, advanced indexes, streaming […]