AWS Big Data Blog
Category: Industries
How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1
This blog post introduces Amazon DataZone and explores how VW used it to build their data mesh to enable streamlined data access across multiple data lakes. It focuses on the key aspect of the solution, which was enabling data providers to automatically publish data assets to Amazon DataZone, which served as the central data mesh for enhanced data discoverability. Additionally, the post provides code to guide you through the implementation.
How Zurich Insurance Group built a log management solution on AWS
This post is written in collaboration with Clarisa Tavolieri, Austin Rappeport and Samantha Gignac from Zurich Insurance Group. The growth in volume and number of logging sources has been increasing exponentially over the last few years, and will continue to increase in the coming years. As a result, customers across all industries are facing multiple […]
Protein similarity search using ProtT5-XL-UniRef50 and Amazon OpenSearch Service
A protein is a sequence of amino acids that, when chained together, creates a 3D structure. This 3D structure allows the protein to bind to other structures within the body and initiate changes. This binding is core to the working of many drugs. A common workflow within drug discovery is searching for similar proteins, because […]
Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics
Healthcare providers have an opportunity to improve the patient experience by collecting and analyzing broader and more diverse datasets. This includes patient medical history, allergies, immunizations, family disease history, and individuals’ lifestyle data such as workout habits. Having access to those datasets and forming a 360-degree view of patients allows healthcare providers such as claim […]
Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success
This post is co-written with Toney Thomas and Ben Vengerovsky from Bluestone. In the ever-evolving world of finance and lending, the need for real-time, reliable, and centralized data has become paramount. Bluestone, a leading financial institution, embarked on a transformative journey to modernize its data infrastructure and transition to a data-driven organization. In this post, […]
Simplify data streaming ingestion for analytics using Amazon MSK and Amazon Redshift
Towards the end of 2022, AWS announced the general availability of real-time streaming ingestion to Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), eliminating the need to stage streaming data in Amazon Simple Storage Service (Amazon S3) before ingesting it into Amazon Redshift. Streaming ingestion from Amazon […]
Combine transactional, streaming, and third-party data on Amazon Redshift for financial services
Financial services customers are using data from different sources that originate at different frequencies, which includes real time, batch, and archived datasets. Additionally, they need streaming architectures to handle growing trade volumes, market volatility, and regulatory demands. The following are some of the key business use cases that highlight this need: Trade reporting – Since […]
Mastering market dynamics: Transforming transaction cost analytics with ultra-precise Tick History – PCAP and Amazon Athena for Apache Spark
This post is cowritten with Pramod Nayak, LakshmiKanth Mannem and Vivek Aggarwal from the Low Latency Group of LSEG. Transaction cost analysis (TCA) is widely used by traders, portfolio managers, and brokers for pre-trade and post-trade analysis, and helps them measure and optimize transaction costs and the effectiveness of their trading strategies. In this post, […]
How healthcare organizations can analyze and create insights using price transparency data
In recent years, there has been a growing emphasis on price transparency in the healthcare industry. Under the Transparency in Coverage (TCR) rule, hospitals and payors to publish their pricing data in a machine-readable format. With this move, patients can compare prices between different hospitals and make informed healthcare decisions. For more information, refer to […]
Using Experian identity resolution with AWS Clean Rooms to achieve higher audience activation match rates
This is a guest post co-written with Tyler Middleton, Experian Senior Partner Marketing Manager, and Jay Rakhe, Experian Group Product Manager. As the data privacy landscape continues to evolve, companies are increasingly seeking ways to collect and manage data while protecting privacy and intellectual property. First party data is more important than ever for companies […]