AWS Storage Blog
Query Amazon S3 Tables from open source Trino using Apache Iceberg REST endpoint
Organizations are increasingly focused on addressing the growing challenge of managing and analyzing vast data volumes, while making sure that their data teams have timely access to this data to enable rapid insights and decision-making. Data analysts and scientists need self-service analytics capabilities to build and maintain data products, often involving complex transformations and frequent updates. However, this creates significant operational overhead—from managing small files and delete markers, to handling expanding metadata and rising storage costs for historical versions. These challenges can impact query performance and increase infrastructure costs if left unaddressed.
Modern data architectures have evolved to address these challenges through powerful open source technologies such as Trino and Apache Iceberg, combined with managed services such as Amazon S3 Tables. Trino excels at providing distributed query capabilities across diverse data sources, while Apache Iceberg brings robust table format features such as ACID transactions, schema evolution, and time travel. S3 Tables complement this stack by offering automated optimization and maintenance of Iceberg tables, making sure of consistent performance without manual intervention.
In this post, we demonstrate how to integrate Trino with S3 Tables to create a robust analytics platform that combines the best of both worlds. We walk through the setup process using the S3 Tables table management APIs, which are compatible with the Apache Iceberg REST Catalog standard, demonstrate how to configure the integration with Trino, and enable key benefits such as automated compaction and snapshot management. By the end of this post, you will know how to use this powerful combination to build and maintain high-performing data products at scale.
This post provides the configuration for the S3 Tables Iceberg REST endpoint. You can also use S3 Tables with the AWS Glue Iceberg REST endpoint. For unified data management across all of your tabular data, data governance, and fine-grained access controls, we recommend using Amazon SageMaker Lakehouse, which supports S3 Tables as a federated catalog.
Solution overview
The S3 Tables Iceberg REST endpoint provides a standards-based API that allows Trino to communicate directly with S3 Tables without needing any proprietary connectors or other middleware. This means your Trino deployments can immediately use the S3 Tables built-in optimizations for Iceberg workloads.
This guide shows you how to do the following:
- Deploy a single-node Trino environment using an AWS CloudFormation template
- Configure the Iceberg REST connector to communicate with S3 Tables
- Create schemas and tables directly in S3 Tables from Trino
- Perform read and write operations with full Iceberg functionality
- Use advanced features such as time travel and schema evolution
Deployment architecture: Although Trino is typically deployed in a distributed, multi-node architecture for production workloads to take advantage of its parallel processing capabilities, this post uses a single Amazon Elastic Compute Cloud (Amazon EC2) instance deployment that enables you to quickly explore the integration between S3 Tables and Trino. The Iceberg catalog configuration is the same when deploying on a Trino cluster for production use cases.
Prerequisites
You need the following prerequisites to complete this solution:
- An AWS account with permissions to create resources such as EC2 instances, AWS Identity and Access Management (IAM) roles, and S3 table buckets
- Amazon Virtual Private Cloud (Amazon VPC) with a public subnet (or private subnet with VPN access)
- An Amazon EC2 key pair for SSH access
- AWS Command Line Interface (AWS CLI) installed and configured (optional, for manual S3 table bucket creation)
Walkthrough
The following steps walk you through this solution.
Part A: Deploying Trino with S3 Tables integration
This section provides step-by-step instructions for launching the Trino environment using the provided CloudFormation template. It also describes the resources needed to build this solution. To streamline the integration between Trino and S3 Tables, we’ve created a CloudFormation template that automates the entire deployment process.
1. CloudFormation template overview
The CloudFormation template provisions all the necessary components for a complete Trino environment that can read and write data to S3 Tables through the Iceberg REST endpoint:
- S3 table bucket: Creates a dedicated bucket for storing your data
- EC2 instance: Deploys a single-node Trino server running version 475
- Security group: Configures network access for SSH and Trino web UI
- IAM instance profile: Provides the EC2 instance with appropriate permissions
2. Deployment process
- Download the CloudFormation template.
- Navigate to the CloudFormation console and choose Create stack > With new resources (standard), as shown in the following figure.
- Upload the template file and choose Next.
- Provide values for the template parameters:
- Stack name: A name for your CloudFormation stack
- KeyName: An existing EC2 key pair for SSH access
- VpcId: The VPC for deploying the EC2 instance
- SubnetId: A public subnet in your VPC
- S3TablesBucketName: Name for your S3 table bucket
- AwsRegion: An AWS Region for S3 Tables and Glue services
- TrinoInstanceType: EC2 instance size (default: t3.xlarge)
- Choose Next on the Configure stack options page.
- Review the details, acknowledge that CloudFormation might create IAM resources, and choose Create stack.
- Wait for the stack creation to complete (approximately 10-15 minutes).
When the stack is created, you find useful information in the Outputs tab:
- PublicDNS: The public DNS name of your Trino instance
- SSHCommand: Command to SSH into the instance
- TrinoURL: URL to access the Trino web UI
- TableBucketName: Name of your S3 table bucket
3. Understanding the deployment
The CloudFormation template performs several key tasks:
- Infrastructure provisioning: Sets up the EC2 instance, security group, and S3 table bucket
- Software installation: Installs Amazon Corretto Java 23 and Trino 475
- Configuration: Creates necessary Trino configuration files
- Integration configuration: Sets up the Iceberg REST connector for S3 Tables
Part B: Connecting Trino to Amazon S3 Tables with Iceberg REST endpoint
The CloudFormation template automatically configures the S3 Tables catalog in Trino. In the next section we examine the configuration that enables this integration.
1. Catalog configuration details
A catalog in Trino is the configuration that enables access to a specific data source. Each Trino cluster can have multiple catalogs configured, allowing access to different data sources simultaneously.
As part of this setup, the CloudFormation template creates a catalog properties file at /home/ec2-user/trino-server-475/etc/catalog/s3tables_irc.properties with the following configuration:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=https://s3tables.${AwsRegion}.amazonaws.com/iceberg
iceberg.rest-catalog.warehouse=arn:aws:s3tables:${AwsRegion}:${AWS::AccountId}:bucket/${S3TablesBucketName}
iceberg.rest-catalog.sigv4-enabled=true
iceberg.rest-catalog.signing-name=s3tables
iceberg.rest-catalog.view-endpoints-enabled=false
fs.hadoop.enabled=false
fs.native-s3.enabled=true
s3.iam-role=<ARN of the IAM role with permissions to S3 Tables>
s3.region=${AwsRegion}
2. S3 Tables Iceberg REST endpoint configuration properties
The following table lists the key properties in the catalog configuration on Trino:
| Property name | Description |
| --- | --- |
| iceberg.rest-catalog.uri | REST server API endpoint URI (required). Value: https://s3tables.${AwsRegion}.amazonaws.com/iceberg |
| iceberg.rest-catalog.warehouse | Warehouse identifier/location for the catalog (required). For S3 Tables, this is the ARN of the S3 table bucket, as shown in the preceding properties example. |
| iceberg.rest-catalog.sigv4-enabled | Must be set to true (required) |
| iceberg.rest-catalog.signing-name | Must be set to s3tables (required) |
| iceberg.rest-catalog.view-endpoints-enabled | Must be set to false (required) |
| fs.hadoop.enabled | Must be set to false |
| fs.native-s3.enabled | Must be set to true |
| s3.iam-role | Amazon Resource Name (ARN) of the IAM role with permissions to S3 Tables. In this post, we use the same role attached to the EC2 instance. |
| s3.region | AWS Region, for example us-east-1 |
This configuration establishes a connection between Trino and the S3 Tables REST endpoint. You can register multiple catalogs, one per S3 table bucket, as determined by the iceberg.rest-catalog.warehouse property.
3. Working with S3 Tables in Trino
Now that you have Trino set up and configured to work with S3 Tables, you can explore how to work with this integration.
3.1. Connecting to Trino
Add your desired IP/subnet source into the inbound rule of the Trino security group for SSH access (port 22). Connect to the EC2 instance using the SSH command provided in the CloudFormation outputs:
ssh -i your-key.pem ec2-user@your-instance-public-dns
When you’re connected, you can use the Trino CLI which was automatically installed by the CloudFormation template:
cd /home/ec2-user
./trino-cli --catalog s3tables_irc
This connects you to the Trino server using the S3 Tables integration you configured.
3.2. Examples: Creating and querying tables
In this section you run through some example queries to demonstrate the functionality.
3.2.1. Creating a namespace
First, you create a namespace (schema) in S3 Tables. A namespace in S3 Tables is a logical container or organizational unit that helps group related tables and objects together.
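The original code listing is not reproduced here; a minimal sketch, assuming the s3tables_irc catalog from the preceding configuration and a hypothetical schema name testdb, might look like:

```sql
-- Create a namespace (schema) in the S3 table bucket;
-- "testdb" is a hypothetical name used for illustration.
CREATE SCHEMA s3tables_irc.testdb;

-- Verify the namespace now exists
SHOW SCHEMAS FROM s3tables_irc;
```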
3.2.2. Creating a table
Create a table with various data types. You don't need to specify the table type as Iceberg explicitly, because you are connecting to an Iceberg catalog. You can use all standard Iceberg capabilities, such as partitioning and sorting. Furthermore, some of the important Iceberg table properties supporting table maintenance operations are configured with default values. You also have the option to edit these configurations using the S3 Tables maintenance APIs.
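A sketch of such a statement, using the hypothetical names testdb and customer (invented for illustration) and an Iceberg partitioning property:

```sql
-- Hypothetical example table with several data types;
-- no table format needs to be specified in an Iceberg catalog.
CREATE TABLE s3tables_irc.testdb.customer (
    customer_id    BIGINT,
    name           VARCHAR,
    signup_date    DATE,
    lifetime_value DOUBLE,
    is_active      BOOLEAN
)
WITH (
    -- Standard Iceberg hidden partitioning by month of signup_date
    partitioning = ARRAY['month(signup_date)']
);
```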
3.2.3. Inserting data
You can insert some sample data into the table. You can also read data from an existing table in any of the catalogs configured in Trino and write it into the S3 table with an INSERT INTO ... SELECT statement.
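The original listing is omitted; a representative example, assuming a hypothetical table s3tables_irc.testdb.customer with columns (customer_id, name, signup_date, lifetime_value, is_active):

```sql
-- Insert sample rows into the hypothetical customer table.
INSERT INTO s3tables_irc.testdb.customer
VALUES
    (1, 'Ana Silva',  DATE '2025-01-15', 1250.50, true),
    (2, 'Ben Carter', DATE '2025-02-03',  310.00, true),
    (3, 'Chen Wei',   DATE '2025-02-20',    0.00, false);

-- Reading from another configured catalog works the same way, for example:
-- INSERT INTO s3tables_irc.testdb.customer
-- SELECT ... FROM other_catalog.other_schema.other_table;
```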
3.2.4. Querying data
You can query the data you just inserted:
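A representative query, assuming the hypothetical testdb.customer table used for illustration in this post:

```sql
-- Query the most valuable active customers.
SELECT customer_id, name, lifetime_value
FROM s3tables_irc.testdb.customer
WHERE is_active = true
ORDER BY lifetime_value DESC;
```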
Create Table As Select (CTAS): Trino also supports creating tables from the results of a query. In this case, you create a copy of the previously created table to demonstrate this.
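A CTAS sketch, using the hypothetical testdb.customer table name:

```sql
-- Create a new table populated from a query result.
CREATE TABLE s3tables_irc.testdb.customer_copy AS
SELECT * FROM s3tables_irc.testdb.customer;
```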
Altering tables: One of the benefits of using Iceberg is schema evolution. You can add a new column and then update the values of the newly introduced column. Each transaction from Trino, such as an ALTER TABLE DDL statement, creates a new snapshot of the Iceberg table.
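A schema evolution sketch against the hypothetical testdb.customer table; the column name loyalty_tier is invented for illustration:

```sql
-- Add a new column; existing rows read it as NULL.
ALTER TABLE s3tables_irc.testdb.customer ADD COLUMN loyalty_tier VARCHAR;

-- Backfill values for the new column; this UPDATE creates
-- another snapshot of the Iceberg table.
UPDATE s3tables_irc.testdb.customer
SET loyalty_tier = 'gold'
WHERE lifetime_value > 1000;
```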
3.4. Advanced Iceberg features
Trino provides a view of the Iceberg metadata that can be used to view the information of the objects stored in S3 Tables. This provides an easy way to understand how the data is organized, and relevant information associated with each transaction performed on the table.
3.4.1. Querying table metadata
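The original queries are not included here; the Trino Iceberg connector exposes metadata through hidden tables such as $snapshots, $files, and $history. A sketch against the hypothetical testdb.customer table:

```sql
-- Snapshots created by each transaction on the table
SELECT * FROM s3tables_irc.testdb."customer$snapshots";

-- Data files backing the table, with sizes and record counts
SELECT * FROM s3tables_irc.testdb."customer$files";

-- Full operation history of the table
SELECT * FROM s3tables_irc.testdb."customer$history";
```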
3.4.2. Time travel
Iceberg time travel is a powerful feature of the Apache Iceberg table format that allows users to query data as it existed at a specific point in time. This capability is particularly useful for data analysis, auditing, and reproducing historical results. Trino supports time travel using the FOR VERSION AS OF syntax with a snapshot ID, or the FOR TIMESTAMP AS OF syntax with a timestamp, as shown in the following example:
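A time travel sketch against the hypothetical testdb.customer table; the snapshot ID and timestamp values are illustrative:

```sql
-- Query the table as of a specific snapshot ID
-- (take a real ID from the "customer$snapshots" metadata table).
SELECT * FROM s3tables_irc.testdb.customer
FOR VERSION AS OF 8954597067493422955;

-- Or query the table as of a point in time.
SELECT * FROM s3tables_irc.testdb.customer
FOR TIMESTAMP AS OF TIMESTAMP '2025-03-01 00:00:00 UTC';
```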
3.4.3. Viewing history and rolling back
Apache Iceberg’s history and rollback features provide robust data versioning capabilities. Users can view complete table operation history and easily revert to previous states using timestamp or snapshot IDs, making sure of data recovery and maintaining audit compliance in data lakes.
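A sketch of viewing history and rolling back in Trino, using the hypothetical testdb.customer table and an illustrative snapshot ID:

```sql
-- View the operation history of the table.
SELECT made_current_at, snapshot_id, is_current_ancestor
FROM s3tables_irc.testdb."customer$history";

-- Revert the table to an earlier snapshot
-- (substitute a real snapshot ID from the history query).
ALTER TABLE s3tables_irc.testdb.customer
EXECUTE rollback_to_snapshot(8954597067493422955);
```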
Now for the best part: we ran a few transactions with the preceding example queries and saw how Iceberg creates a snapshot for each transaction (or table operation). You might have use cases where data is updated frequently, or scenarios where small chunks of data are ingested into the table every few seconds or minutes. These operations typically lead to the creation of small files, which can have performance implications, as well as data duplicated across multiple snapshots and expired data (previous versions of records that were updated or deleted).
S3 Tables offers maintenance operations such as compaction, snapshot management, and unreferenced file removal to keep the table optimized and to lower storage costs by removing files that are no longer needed. These options are enabled by default for all tables with preconfigured properties, which you can also modify at the individual table level based on your specific requirements. You can learn more about S3 Tables maintenance in the documentation.
Cleaning up
To clean up the resources, complete the following steps:
- Delete the schema from the Trino CLI (this avoids needing the AWS CLI).
- In the AWS Management Console, navigate to CloudFormation and delete the stack you created.
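The first cleanup step can be done from the Trino CLI; a sketch, using the hypothetical names from the examples in this post (tables must be dropped before their schema):

```sql
-- Drop the tables first, then the now-empty schema.
DROP TABLE s3tables_irc.testdb.customer_copy;
DROP TABLE s3tables_irc.testdb.customer;
DROP SCHEMA s3tables_irc.testdb;
```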
Conclusion
In this post, we've demonstrated the seamless integration between Trino and Amazon S3 Tables using the Iceberg REST endpoint. This powerful combination allows you to benefit from Trino's distributed query capabilities for interactive analytics over data stored in S3 Tables and from Iceberg's advanced features such as transactional support, schema evolution, and time travel, while using the built-in Iceberg optimizations of S3 Tables. The integration provides a flexible, high-performance solution for modern data analytics on AWS. By automating the deployment with CloudFormation and using the standardized Iceberg REST interface, you can quickly set up this integration and start gaining insights from your data. Whether you're building a new data platform or enhancing an existing one, the combination of Trino and S3 Tables offers a robust foundation for your analytical workloads.
To learn more about S3 Tables, Trino, and Iceberg, visit the Amazon S3 User Guide, the Trino documentation, and the Apache Iceberg documentation.
Check out more posts about S3 Tables on the AWS Storage Blog.