AWS Storage Blog
Query Amazon S3 Tables from open source Trino using Apache Iceberg REST endpoint
Organizations are increasingly focused on addressing the growing challenge of managing and analyzing vast data volumes, while making sure that their data teams have timely access to this data to enable rapid insights and decision-making. Data analysts and scientists need self-service analytics capabilities to build and maintain data products, often involving complex transformations and frequent updates. However, this creates significant operational overhead—from managing small files and delete markers, to handling expanding metadata and rising storage costs for historical versions. These challenges can impact query performance and increase infrastructure costs if left unaddressed.
Modern data architectures have evolved to address these challenges through powerful open source technologies such as Trino and Apache Iceberg, combined with managed services such as Amazon S3 Tables. Trino excels at providing distributed query capabilities across diverse data sources, while Apache Iceberg brings robust table format features such as ACID transactions, schema evolution, and time travel. S3 Tables complement this stack by offering automated optimization and maintenance of Iceberg tables, making sure of consistent performance without manual intervention.
In this post, we demonstrate how to integrate Trino with S3 Tables to create a robust analytics platform that combines the best of both worlds. We walk through the setup process using the S3 Tables table management APIs, which are compatible with the Apache Iceberg REST Catalog standard, demonstrate how to configure the integration with Trino, and enable key benefits such as automated compaction and snapshot management. By the end of this post, you will know how to use this powerful combination to build and maintain high-performing data products at scale.
This post provides the configuration for the S3 Tables Iceberg REST endpoint. You can also use S3 Tables with the AWS Glue Iceberg REST endpoint. For unified data management across all of your tabular data, data governance, and fine-grained access controls, we recommend using Amazon SageMaker Lakehouse, which supports S3 Tables as a federated catalog.
Solution overview
The S3 Tables Iceberg REST endpoint provides a standards-based API that allows Trino to communicate directly with S3 Tables without needing any proprietary connectors or other middleware. This means your Trino deployments can immediately use the S3 Tables built-in optimizations for Iceberg workloads.
This guide shows you how to do the following:
- Deploy a single-node Trino environment using an AWS CloudFormation template
- Configure the Iceberg REST connector to communicate with S3 Tables
- Create schemas and tables directly in S3 Tables from Trino
- Perform read and write operations with full Iceberg functionality
- Use advanced features such as time travel and schema evolution
Deployment architecture: Although Trino is typically deployed in a distributed, multi-node architecture for production workloads to take advantage of its parallel processing capabilities, this post uses a single Amazon Elastic Compute Cloud (Amazon EC2) instance deployment that enables you to quickly explore the integration between S3 Tables and Trino. The Iceberg catalog configuration is the same when deploying on a Trino cluster for production use cases.
Prerequisites
You need the following prerequisites to complete this solution:
- An AWS account with permissions to create resources such as EC2 instances, AWS Identity and Access Management (IAM) roles, and S3 table buckets
- Amazon Virtual Private Cloud (Amazon VPC) with a public subnet (or private subnet with VPN access)
- An Amazon EC2 key pair for SSH access
- AWS Command Line Interface (AWS CLI) installed and configured (optional, for manual S3 table bucket creation)
Walkthrough
The following steps walk you through this solution.
Part A: Deploying Trino with S3 Tables integration
This section provides step-by-step instructions for launching the Trino environment using the provided CloudFormation template. It also describes the resources needed to build this solution. To streamline the integration between Trino and S3 Tables, we’ve created a CloudFormation template that automates the entire deployment process.
1. CloudFormation template overview
The CloudFormation template provisions all the necessary components for a complete Trino environment that can read and write data to S3 Tables through the Iceberg REST endpoint:
- S3 table bucket: Creates a dedicated bucket for storing your data
- EC2 instance: Deploys a single-node Trino server running version 475
- Security group: Configures network access for SSH and Trino web UI
- IAM instance profile: Provides the EC2 instance with appropriate permissions
2. Deployment process
- Download the CloudFormation template.
- Navigate to the CloudFormation console and choose Create stack > With new resources (standard), as shown in the following figure.
- Upload the template file and choose Next.
- Provide values for the template parameters:
- Stack name: A name for your CloudFormation stack
- KeyName: An existing EC2 key pair for SSH access
- VpcId: The VPC for deploying the EC2 instance
- SubnetId: A public subnet in your VPC
- S3TablesBucketName: Name for your S3 table bucket
- AwsRegion: An AWS Region for S3 Tables and Glue services
- TrinoInstanceType: EC2 instance size (default: t3.xlarge)
- Choose Next on the Configure stack options page.
- Review the details, acknowledge that CloudFormation might create IAM resources, and choose Create stack.
- Wait for the stack creation to complete (approximately 10-15 minutes).
When the stack is created, you find useful information in the Outputs tab:
- PublicDNS: The public DNS name of your Trino instance
- SSHCommand: Command to SSH into the instance
- TrinoURL: URL to access the Trino web UI
- TableBucketName: Name of your S3 table bucket
3. Understanding the deployment
The CloudFormation template performs several key tasks:
- Infrastructure provisioning: Sets up the EC2 instance, security group, and S3 table bucket
- Software installation: Installs Amazon Corretto Java 23 and Trino 475
- Configuration: Creates necessary Trino configuration files
- Integration configuration: Sets up the Iceberg REST connector for S3 Tables
Part B: Connecting Trino to Amazon S3 Tables with Iceberg REST endpoint
The CloudFormation template automatically configures the S3 Tables catalog in Trino. In the next section we examine the configuration that enables this integration.
1. Catalog configuration details
A catalog in Trino is the configuration that enables access to a specific data source. Each Trino cluster can have multiple catalogs configured, allowing access to different data sources simultaneously.
As part of this setup, the CloudFormation template creates a catalog properties file at /home/ec2-user/trino-server-475/etc/catalog/s3tables_irc.properties with the following configuration:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://s3tables.${AwsRegion}.amazonaws.com/iceberg
iceberg.rest-catalog.warehouse=arn:aws:s3tables:${AwsRegion}:${AWS::AccountId}:bucket/${S3TablesBucketName}
iceberg.rest-catalog.sigv4-enabled=true
iceberg.rest-catalog.signing-name=s3tables
iceberg.rest-catalog.view-endpoints-enabled=false
fs.hadoop.enabled=false
fs.native-s3.enabled=true
s3.iam-role=<ARN of the IAM role with permissions to S3 Tables>
s3.region=${AwsRegion}
2. S3 Tables Iceberg REST endpoint configuration properties
The following table lists the key properties in the catalog configuration on Trino:
| Property name | Description |
| --- | --- |
| iceberg.rest-catalog.uri | REST server API endpoint URI (required). Value: https://s3tables.${AwsRegion}.amazonaws.com/iceberg |
| iceberg.rest-catalog.warehouse | Warehouse identifier/location for the catalog (required). For S3 Tables, this is the ARN of the S3 table bucket, as shown in the preceding properties example. |
| iceberg.rest-catalog.sigv4-enabled | Must be set to true (required) |
| iceberg.rest-catalog.signing-name | Must be set to s3tables (required) |
| iceberg.rest-catalog.view-endpoints-enabled | Must be set to false (required) |
| fs.hadoop.enabled | Must be set to false |
| fs.native-s3.enabled | Must be set to true |
| s3.iam-role | Amazon Resource Name (ARN) of the IAM role with permissions to S3 Tables. In this post, we use the same role attached to the EC2 instance. |
| s3.region | AWS Region, for example us-east-1 |
This configuration establishes a connection between Trino and the S3 Tables REST endpoint. You can register multiple catalogs, one per S3 table bucket, as determined by the iceberg.rest-catalog.warehouse property.
3. Working with S3 Tables in Trino
Now that you have Trino set up and configured to work with S3 Tables, you can explore how to work with this integration.
3.1. Connecting to Trino
Add your desired IP/subnet source into the inbound rule of the Trino security group for SSH access (port 22). Connect to the EC2 instance using the SSH command provided in the CloudFormation outputs:
ssh -i your-key.pem ec2-user@your-instance-public-dns
When you’re connected, you can use the Trino CLI which was automatically installed by the CloudFormation template:
cd /home/ec2-user
./trino-cli --catalog s3tables_irc
This connects you to the Trino server using the S3 Tables integration you configured.
3.2. Examples: Creating and querying tables
In this section you run through some example queries to demonstrate the functionality.
3.2.1. Creating a namespace
First, you create a namespace (schema) in S3 Tables. A namespace in S3 Tables is a logical container or organizational unit that helps group related tables and objects together.
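The original code listing is not reproduced here; a minimal sketch, assuming the s3tables_irc catalog from the preceding configuration and a hypothetical schema name testdb, might look like:

```sql
-- Create a namespace (schema) in the S3 table bucket;
-- "testdb" is a hypothetical name used for illustration.
CREATE SCHEMA s3tables_irc.testdb;

-- Verify the namespace now exists
SHOW SCHEMAS FROM s3tables_irc;
```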
3.2.2. Creating a table
Create a table with various data types. You don't need to specify the table type as Iceberg explicitly, because you are connecting to an Iceberg catalog. You can use all standard Iceberg capabilities, such as partitioning and sorting. Furthermore, some of the important Iceberg table properties supporting table maintenance operations are configured with default values. You also have the option to edit these configurations using the S3 Tables maintenance APIs.
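A sketch of such a statement, using the hypothetical names testdb and customer (invented for illustration) and an Iceberg partitioning property:

```sql
-- Hypothetical example table with several data types;
-- no table format needs to be specified in an Iceberg catalog.
CREATE TABLE s3tables_irc.testdb.customer (
    customer_id    BIGINT,
    name           VARCHAR,
    signup_date    DATE,
    lifetime_value DOUBLE,
    is_active      BOOLEAN
)
WITH (
    -- Standard Iceberg hidden partitioning by month of signup_date
    partitioning = ARRAY['month(signup_date)']
);
```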
3.2.3. Inserting data
You can insert some sample data into the table. You can also read data from an existing table in any of the catalogs configured in Trino and write it into the S3 table with an INSERT INTO ... SELECT statement.
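The original listing is omitted; a representative example, assuming a hypothetical table s3tables_irc.testdb.customer with columns (customer_id, name, signup_date, lifetime_value, is_active):

```sql
-- Insert sample rows into the hypothetical customer table.
INSERT INTO s3tables_irc.testdb.customer
VALUES
    (1, 'Ana Silva',  DATE '2025-01-15', 1250.50, true),
    (2, 'Ben Carter', DATE '2025-02-03',  310.00, true),
    (3, 'Chen Wei',   DATE '2025-02-20',    0.00, false);

-- Reading from another configured catalog works the same way, for example:
-- INSERT INTO s3tables_irc.testdb.customer
-- SELECT ... FROM other_catalog.other_schema.other_table;
```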
3.2.4. Querying data
You can query the data you just inserted:
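A representative query, assuming the hypothetical testdb.customer table used for illustration in this post:

```sql
-- Query the most valuable active customers.
SELECT customer_id, name, lifetime_value
FROM s3tables_irc.testdb.customer
WHERE is_active = true
ORDER BY lifetime_value DESC;
```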
Create Table As Select (CTAS): Trino also supports creating tables from the results of a query. In this case, you create a copy of the previously created table to demonstrate this.
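A CTAS sketch, using the hypothetical testdb.customer table name:

```sql
-- Create a new table populated from a query result.
CREATE TABLE s3tables_irc.testdb.customer_copy AS
SELECT * FROM s3tables_irc.testdb.customer;
```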
Altering tables: One of the benefits of using Iceberg is schema evolution. You can add a new column and then update the values of the newly introduced column. Each transaction from Trino, such as an ALTER TABLE DDL statement, creates a new snapshot of the Iceberg table.
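A schema evolution sketch against the hypothetical testdb.customer table; the column name loyalty_tier is invented for illustration:

```sql
-- Add a new column; existing rows read it as NULL.
ALTER TABLE s3tables_irc.testdb.customer ADD COLUMN loyalty_tier VARCHAR;

-- Backfill values for the new column; this UPDATE creates
-- another snapshot of the Iceberg table.
UPDATE s3tables_irc.testdb.customer
SET loyalty_tier = 'gold'
WHERE lifetime_value > 1000;
```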
3.4. Advanced Iceberg features
Trino provides a view of the Iceberg metadata that can be used to view the information of the objects stored in S3 Tables. This provides an easy way to understand how the data is organized, and relevant information associated with each transaction performed on the table.
3.4.1. Querying table metadata
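The original queries are not included here; the Trino Iceberg connector exposes metadata through hidden tables such as $snapshots, $files, and $history. A sketch against the hypothetical testdb.customer table:

```sql
-- Snapshots created by each transaction on the table
SELECT * FROM s3tables_irc.testdb."customer$snapshots";

-- Data files backing the table, with sizes and record counts
SELECT * FROM s3tables_irc.testdb."customer$files";

-- Full operation history of the table
SELECT * FROM s3tables_irc.testdb."customer$history";
```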
3.4.2. Time travel
Iceberg time travel is a powerful feature of the Apache Iceberg table format that allows users to query data as it existed at a specific point in time. This capability is particularly useful for data analysis, auditing, and reproducing historical results. Trino supports time travel using the FOR VERSION AS OF syntax with a snapshot ID, or the FOR TIMESTAMP AS OF syntax with a timestamp, as shown in the following example:
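A time travel sketch against the hypothetical testdb.customer table; the snapshot ID and timestamp values are illustrative:

```sql
-- Query the table as of a specific snapshot ID
-- (take a real ID from the "customer$snapshots" metadata table).
SELECT * FROM s3tables_irc.testdb.customer
FOR VERSION AS OF 8954597067493422955;

-- Or query the table as of a point in time.
SELECT * FROM s3tables_irc.testdb.customer
FOR TIMESTAMP AS OF TIMESTAMP '2025-03-01 00:00:00 UTC';
```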
3.4.3. Viewing history and rolling back
Apache Iceberg’s history and rollback features provide robust data versioning capabilities. Users can view complete table operation history and easily revert to previous states using timestamp or snapshot IDs, making sure of data recovery and maintaining audit compliance in data lakes.
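A sketch of viewing history and rolling back in Trino, using the hypothetical testdb.customer table and an illustrative snapshot ID:

```sql
-- View the operation history of the table.
SELECT made_current_at, snapshot_id, is_current_ancestor
FROM s3tables_irc.testdb."customer$history";

-- Revert the table to an earlier snapshot
-- (substitute a real snapshot ID from the history query).
ALTER TABLE s3tables_irc.testdb.customer
EXECUTE rollback_to_snapshot(8954597067493422955);
```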
Now for the best part: we ran a few transactions with the preceding example queries and saw how Iceberg creates a snapshot for each transaction (or table operation). You might have use cases where data is updated frequently, or scenarios where small chunks of data are ingested into the table every few seconds or minutes. These operations typically lead to the creation of small files, which can have performance implications, as well as data duplicated across multiple snapshots and expired data (previous versions of records that were updated or deleted).
S3 Tables offers maintenance operations such as compaction, snapshot management, and unreferenced file removal to keep the table optimized and to lower storage costs by removing files that are no longer needed. These options are enabled by default for all tables with preconfigured properties, which you can also modify at the individual table level based on your specific requirements. You can learn more about S3 Tables maintenance in the documentation.
Cleaning up
To clean up the resources, complete the following steps:
- Delete the schema from the Trino CLI (this avoids needing the AWS CLI).
- In the AWS Management Console, navigate to CloudFormation and delete the stack you created.
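The first cleanup step can be done from the Trino CLI; a sketch, using the hypothetical names from the examples in this post (tables must be dropped before their schema):

```sql
-- Drop the tables first, then the now-empty schema.
DROP TABLE s3tables_irc.testdb.customer_copy;
DROP TABLE s3tables_irc.testdb.customer;
DROP SCHEMA s3tables_irc.testdb;
```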
Conclusion
In this post, we've demonstrated the seamless integration between Trino and Amazon S3 Tables using the Iceberg REST endpoint. This powerful combination allows you to benefit from Trino's distributed query capabilities for interactive analytics over data stored in S3 Tables and from Iceberg's advanced features such as transactional support, schema evolution, and time travel, while using the built-in Iceberg optimizations of S3 Tables. The integration provides a flexible, high-performance solution for modern data analytics on AWS. By automating the deployment with CloudFormation and using the standardized Iceberg REST interface, you can quickly set up this integration and start gaining insights from your data. Whether you're building a new data platform or enhancing an existing one, the combination of Trino and S3 Tables offers a robust foundation for your analytical workloads.
To learn more about S3 Tables, Trino, and Iceberg, visit the Amazon S3 User Guide, the Trino documentation, and the Apache Iceberg documentation.
Check out more posts about S3 Tables on the AWS Storage Blog.