AWS Big Data Blog
Category: Analytics
Amazon OpenSearch Service: Managed and community driven
Today the Linux Foundation announced the OpenSearch Software Foundation. As part of the creation of the OpenSearch Foundation, AWS has transferred ownership of OpenSearch to the Linux Foundation. At the launch of the project in April of 2021, in introducing OpenSearch, we spoke of our desire to “ensure users continue to have a secure, high-quality, fully open source search and analytics suite with a rich roadmap of new and innovative functionality.” We’ve maintained that desire and commitment, and with this transfer, are deepening that commitment, and bringing in the broader community with open governance to help with that goal.
Use Batch Processing Gateway to automate job management in multi-cluster Amazon EMR on EKS environments
AWS customers often process petabytes of data using Amazon EMR on EKS. In enterprise environments with diverse workloads or varying operational requirements, customers frequently choose a multi-cluster setup due to the following advantages: Better resiliency and no single point of failure – If one cluster fails, other clusters can continue processing critical workloads, maintaining business […]
How CFM built a well-governed and scalable data-engineering platform using Amazon EMR for financial features generation
Capital Fund Management (CFM) is an alternative investment management company based in Paris with staff in New York City and London. CFM takes a scientific approach to finance, using quantitative and systematic techniques to develop the best investment strategies. In this post, we share how we built a well-governed and scalable data engineering platform using Amazon EMR for financial features generation.
Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2
Salesforce and Amazon have collaborated to help customers unlock value from unified data and accelerate time to insights with bidirectional Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift. In the Part 1 of this series, we discussed how to configure data sharing between Salesforce Data Cloud and customers’ AWS accounts in the same AWS Region. In this post, we discuss the architecture and implementation details of cross-Region data sharing between Salesforce Data Cloud and customers’ AWS accounts.
The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables
The AWS Glue Data Catalog now enhances managed table optimization of Apache Iceberg tables by automatically removing data files that are no longer needed. Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance. Iceberg creates a new version called […]
Differentiate generative AI applications with your data using AWS analytics and managed databases
While the potential of generative artificial intelligence (AI) is increasingly under evaluation, organizations are at different stages in defining their generative AI vision. In many organizations, the focus is on large language models (LLMs), and foundation models (FMs) more broadly. This is just the tip of the iceberg, because what enables you to obtain differential […]
How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune
In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]
Developer guidance on how to do local testing with Amazon MSK Serverless
In this post, I present you with guidance on how developers can connect to Amazon MSK Serverless from local environments. The connection is done using an Amazon MSK endpoint through an SSH tunnel and a bastion host. This enables developers to experiment and test locally, without needing to setup a separate Kafka cluster.
How HPE Aruba Supply Chain optimized cost and performance by migrating to an AWS modern data architecture
This post describes how HPE Aruba automated their Supply Chain management pipeline, and re-architected and deployed their data solution by adopting a modern data architecture on AWS.
Migrate Delta tables from Azure Data Lake Storage to Amazon S3 using AWS Glue
Organizations are increasingly using a multi-cloud strategy to run their production workloads. We often see requests from customers who have started their data journey by building data lakes on Microsoft Azure, to extend access to the data to AWS services. Customers want to use a variety of AWS analytics, data, AI, and machine learning (ML) […]