AWS Big Data Blog
Category: Analytics
Best practices for running production workloads using Amazon MSK tiered storage
In the second post of the series, we discussed some core concepts of the Amazon Managed Streaming for Apache Kafka (Amazon MSK) tiered storage feature and explained how read and write operations work in a tiered storage enabled cluster. This post focuses on how to properly size your MSK tiered storage cluster, which metrics to […]
How Klarna Bank AB built real-time decision-making with Amazon Kinesis Data Analytics for Apache Flink
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This is a joint post co-authored with Nir Tsruya from Klarna Bank AB. Klarna is a leading global payments and shopping service, providing smarter and more flexible […]
Federate Amazon QuickSight access with open-source identity provider Keycloak
Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML) powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Organizations are working toward centralizing their identity and access strategy across all their applications, including on-premises and third-party. Many organizations use Keycloak as their identity provider […]
Improve table readability and identify outliers with data bars in Amazon QuickSight
Amazon QuickSight is a scalable, serverless, machine learning (ML)-powered business intelligence (BI) solution that makes it simple to connect to your data, create interactive dashboards, get access to ML-enabled insights, enable natural language querying of your data, and share visuals and dashboards with tens of thousands of internal and external users, either within QuickSight itself […]
Joulica unifies real-time and historical customer experience analytics with Amazon QuickSight
This is a guest post by Tony McCormack from Joulica. Joulica is an Ireland-based startup in the contact center industry. Our founders previously led contact center research and development for a global contact center technology provider, and we founded Joulica because we saw that the shift to the cloud and growing demand for data and […]
AWS Glue streaming application to process Amazon MSK data using AWS Glue Schema Registry
Organizations across the world are increasingly relying on streaming data, and there is a growing need for real-time data analytics, given the increasing velocity and volume of data being collected. This data can come from a diverse range of sources, including Internet of Things (IoT) devices, user applications, and logging and telemetry information from applications, […]
Cost monitoring for Amazon EMR on Amazon EKS
Amazon EMR is the industry-leading cloud big data solution, providing a collection of open-source frameworks such as Spark, Hive, Hudi, and Presto, fully managed and with per-second billing. Amazon EMR on Amazon EKS is a deployment option allowing you to deploy Amazon EMR on the same Amazon Elastic Kubernetes Service (Amazon EKS) clusters that are […]
Choosing an open table format for your transactional data lake on AWS
August 2023: This post was updated to include Apache Iceberg support in Amazon Redshift. Disclaimer: Due to rapid advancements in AWS service support for open table formats, recent developments might not yet be reflected in this post. For the latest information on AWS service support for open table formats, refer to the official AWS service […]
Implement alerts in Amazon OpenSearch Service with PagerDuty
In today’s fast-paced digital world, businesses rely heavily on their data to make informed decisions. This data is often stored and analyzed using various tools, such as Amazon OpenSearch Service, a powerful search and analytics service offered by AWS. OpenSearch Service provides real-time insights into your data to support use cases like interactive log analytics, […]
Automate and accelerate your Amazon QuickSight asset deployments using the new APIs
Business intelligence (BI) and IT operations (BIOps) teams often need to automate and accelerate the deployment of BI assets to ensure business continuity. We heard that you wanted an automated and scalable way to deploy, back up, and replicate Amazon QuickSight assets so that BIOps teams within your organization can work in an […]