AWS News Blog

Category: Analytics

New — Amazon SageMaker Data Wrangler Supports SaaS Applications as Data Sources

Data fuels machine learning. In machine learning, data preparation is the process of transforming raw data into a format that is suitable for further processing and analysis. The common process for data preparation starts with collecting data, then cleaning it, labeling it, and finally validating and visualizing it. Getting the data right with high quality […]

New — Amazon Athena for Apache Spark

When Jeff Barr first announced Amazon Athena in 2016, it changed my perspective on interacting with data. With Amazon Athena, I can interact with my data in just a few steps—starting from creating a table in Athena, loading data using connectors, and querying using the ANSI SQL standard. Over time, various industries, such as financial […]

New – Announcing Automated Data Preparation for Amazon QuickSight Q

In this post that was published in September 2021, Jeff Barr announced general availability of Amazon QuickSight Q. To recap, Amazon QuickSight Q is a natural language query capability that lets business users ask simple questions of their data. QuickSight Q is powered by machine learning (ML), providing self-service analytics by allowing you to query […]

Architecture diagram.

New for Amazon Redshift – General Availability of Streaming Ingestion for Kinesis Data Streams and Managed Streaming for Apache Kafka

Ten years ago, just a few months after I joined AWS, Amazon Redshift was launched. Over the years, many features have been added to improve performance and make it easier to use. Amazon Redshift now allows you to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. More recently, Amazon Redshift […]

Preview: Amazon Security Lake – A Purpose-Built Customer-Owned Data Lake Service

To identify potential security threats and vulnerabilities, customers should enable logging across their various resources and centralize these logs for easy access and use within analytics tools. Some of these data sources include logs from on-premises infrastructure, firewalls, and endpoint security solutions, and when utilizing the cloud, services such as Amazon Route 53, AWS CloudTrail, […]

New – Amazon Redshift Integration with Apache Spark

Apache Spark is an open-source, distributed processing system commonly used for big data workloads. Spark application developers working in Amazon EMR, Amazon SageMaker, and AWS Glue often use third-party Apache Spark connectors that allow them to read and write the data with Amazon Redshift. These third-party connectors are not regularly maintained, supported, or tested with […]

Preview: Amazon OpenSearch Serverless – Run Search and Analytics Workloads without Managing Clusters

Most AWS analytics services have compelling serverless offerings that make it even easier for customers to analyze vast amounts of data without having to configure, scale, or manage the underlying infrastructure. Along with other serverless analytics, such as Amazon QuickSight for business intelligence and AWS Glue for data integration, we have introduced Amazon EMR Serverless, […]