AWS Startups

    Startup Data Analytics: Transactional Data Lakes with Apache Iceberg

    Date:

    Monday, March 3, 2025 - Tuesday, March 4, 2025

    Time:

    21:00 - 03:00 GMT

    Type:

    IN PERSON

    Language:

    English

    Level(s):

    300 - Advanced

    Apache Iceberg is an open table format for storing large datasets on Amazon S3. It is a foundational tool for building transactional data lakes on S3. Iceberg supports ACID transactions, compaction, and schema evolution. Query engines such as Athena, Spark, and Trino can query Iceberg tables natively using SQL.

    Apache Iceberg is popular with startups because it takes away the undifferentiated heavy lifting of managing data in S3. In particular, it handles partitioning and compaction, which are common sources of pain (and cost). This is especially relevant for event stream data (either CDC from databases or IoT event streams): records can be added to an Iceberg table quickly and automatically, while the cost impact of ‘lots of small files’ on S3 is mitigated.
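
    To make the ‘lots of small files’ point concrete, here is a minimal sketch of the bin-packing idea behind compaction. It is a toy model in plain Python, not Iceberg's actual implementation: the `DataFile` class, the `plan_compaction` function, the file names, and the 128 MB target size are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str      # hypothetical object key, for illustration only
    size_mb: int   # file size in megabytes


def plan_compaction(files, target_mb=128):
    """Group small files, in order, into batches of at most target_mb each.

    Toy first-fit planner: files already at or above the target are left
    alone, and a batch is only kept if it merges more than one file.
    """
    small = [f for f in files if f.size_mb < target_mb]
    groups, current, current_size = [], [], 0
    for f in small:
        # Close the current batch when adding this file would exceed the target.
        if current and current_size + f.size_mb > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_mb
    if current:
        groups.append(current)
    return [g for g in groups if len(g) > 1]


# 100 small event files of 4 MB each, plus one large file that is left untouched.
files = [DataFile(f"events/part-{i:04d}.parquet", 4) for i in range(100)]
files.append(DataFile("events/part-big.parquet", 256))

plan = plan_compaction(files, target_mb=128)
# Each batch is rewritten as a single file, so the file count shrinks accordingly.
files_after = len(files) - sum(len(g) for g in plan) + len(plan)
print(len(files), "files before,", files_after, "after")
```

    In this toy run, 101 files become 5. Fewer, larger files generally mean fewer S3 requests per query, which is the cost effect described above.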

    In this Immersion Day we will dive deep into Iceberg and how it can be used on AWS to run analytics on your data scalably and cost-effectively. The day will combine presentations with hands-on workshops.

    This Immersion Day will be useful if you are running analytics workloads on large columnar-style datasets or streaming events for analytics (e.g. IoT).

    Please bring your laptop to participate, and remember to bring photo ID for entry.

    Agenda (subject to change)

    9:00 PM UTC

    Iceberg introduction: transactional data lake concepts and use cases

    9:30 PM UTC

    Workshop: Creating a transactional Iceberg data lake with AWS Glue and Amazon Athena

    10:15 PM UTC

    Transformation with AWS Glue, and Iceberg maintenance

    11:30 PM UTC

    Workshop: Transformation with AWS Glue, and AWS Schema features

    1:00 AM UTC

    CDC/Eventing to Iceberg with Amazon Kinesis Firehose

    1:15 AM UTC

    Workshop: Amazon Kinesis Firehose as a source for Iceberg

    2:00 AM UTC

    Demo and wrap up