AWS Storage Blog
Streamlining access to tabular datasets stored in Amazon S3 Tables with DuckDB
As businesses continue to rely on data-driven decision-making, there’s an increasing demand for tools that streamline and accelerate the process of data analysis. Efficiency and simplicity in application architecture can serve as a competitive edge when driving high-stakes decisions. Developers are seeking lightweight, flexible tools that seamlessly integrate with their existing application stack, specifically solutions […]
Seamless streaming to Amazon S3 Tables with StreamNative Ursa Engine
Organizations are modernizing data platforms to use generative AI by centralizing data from various sources and streaming real-time data into data lakes. A strong data foundation, such as scalable storage, reliable ingestion pipelines, and interoperable formats, is critical for businesses to discover, explore, and consume data. As organizations modernize their platforms, they often turn to […]
Connect Snowflake to S3 Tables using the SageMaker Lakehouse Iceberg REST endpoint
Organizations today seek data analytics solutions that provide maximum flexibility and accessibility. Customers need their data to be readily available using their preferred query engines, and they need to break down barriers across different computing environments. At the same time, they want a single copy of data to be used across these solutions, to track lineage, be cost […]
Build a managed Apache Iceberg data lake using Starburst and Amazon S3 Tables
Managing large-scale data analytics across diverse data sources has long been a challenge for enterprises. Data teams often struggle with complex data lake configurations, performance bottlenecks, and the need to maintain consistent data governance while enabling broad access to analytics capabilities. Today, Starburst announces a powerful solution to these challenges by extending their Apache Iceberg […]
Build a data lake for streaming data with Amazon S3 Tables and Amazon Data Firehose
Businesses are increasingly adopting real-time data processing to stay ahead of user expectations and market changes. Industries such as retail, finance, manufacturing, and smart cities are using streaming data for everything from optimizing supply chains to detecting fraud and improving urban planning. The ability to use data as it is generated has become a critical […]
Event-driven framework to integrate AWS Backup service with CSPM tools
Many organizations use third-party Cloud Security Posture Management (CSPM) tools (for example, Wiz.io) to continuously detect and remediate misconfigurations from build time to runtime across hybrid clouds such as AWS. CSPM tools often use AWS resource tags to enhance their security and compliance monitoring capabilities. Tags are key-value pairs that you can assign to AWS resources […]
Optimizing Amazon FSx for Lustre storage consumption using automatic data tiering with Amazon S3
Managing high-performance file storage can be a significant operational and cost challenge for many organizations, especially those running compute-intensive workloads such as high-performance computing (HPC) or data analytics. This is particularly true for organizations with existing data lakes on Amazon S3 who need POSIX-compliant, high-performance file system access. Amazon FSx for Lustre provides a scalable, […]
Unlock higher performance for file system workloads with scalable metadata performance on Amazon FSx for Lustre
Imagine a company like a movie studio, one that works with enormous volumes of video files, scripts, and animation assets. They store these files on a high-performance file system such as Amazon FSx for Lustre, a fully managed shared storage built on the world’s most popular high-performance file system. Each file has metadata, such as […]
Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint
Modern data lakes integrate with multiple engines to meet a wide range of analytics needs, from SQL querying to stream processing. A key enabler of this approach is the adoption of Apache Iceberg as the open table format for building transactional data lakes. However, as the Iceberg ecosystem expands, the growing variety of engines and languages has […]
Protect Oracle Databases on Amazon EC2 using NetApp SnapCenter with Amazon FSx for NetApp ONTAP
Oracle databases typically see significant data growth, which in turn increases backup, restore, and database refresh times. The ability to quickly back up, restore, and refresh large-scale databases is important for ensuring data consistency, business continuity, and accelerating testing and development processes. As more businesses migrate their Oracle databases to Amazon Elastic Compute Cloud (EC2) instances, […]