Create custom PostgreSQL data types using Trusted Language Extensions
PostgreSQL ships with many robust data types that accommodate most customer workloads in a performant manner. Although PostgreSQL has the capabilities to deploy custom data types natively, introducing new data types at scale in architectures spanning multiple AWS accounts and Regions poses a unique challenge for builders. With Trusted Language Extensions (TLE), you can create and manage your custom data types, allowing the quick and easy deployment of PostgreSQL data types across your infrastructures in a secure and efficient manner.
In this post, we demonstrate how to create custom PostgreSQL data types using TLE.
Trusted Language Extensions (pg_tle) Overview
PostgreSQL is known for its extensibility and support for a large breadth of database extensions. Trusted Language Extension for PostgreSQL (pg_tle) is an open-source framework that allows developers to build and package PostgreSQL extensions, including into Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL. The TLE framework enhances these capabilities by providing a simple and safe way to add database extensions to Amazon Aurora and Amazon RDS, written in trusted languages in a way that can be easily repackaged and deployed as soon as they’re ready. TLE provides a controlled environment for running user-defined database extensions, allowing for quick and iterative expansion of your PostgreSQL database functionality.
PostgreSQL custom data types overview
Applications may demand specialized attributes, validation rules, or behaviors that can’t be accurately expressed with standard PostgreSQL data types. For example, if your application deals with geographic coordinates, sensor readings with complex units, or intricate financial calculations, creating custom data types can help maintain data integrity, enhance query efficiency, and improve overall code clarity.
Custom data types can also contribute to a more organized and semantically meaningful database design. When your data exhibits distinct characteristics that don’t align neatly with the general-purpose data types, custom types offer a way to encapsulate the complexity and provide a more intuitive interface for data manipulation. By creating custom data types, you not only streamline your application’s logic, but also establish a clear representation of your domain’s concepts within the database schema, making the code base more comprehensible for developers and maintainers alike.
Custom data types and TLE in Amazon Aurora or Amazon RDS
Trusted Language Extension’s latest functionality allows you to create new data types, complete with their own rules and properties. Although PostgreSQL contains the capabilities to create custom data types, using TLE allows you to easily wrap custom data types as a TLE extension such that it can be easily redeployed across AWS accounts and Regions. The additional benefit of using TLE to create and manage custom data types lies in the ability to use trusted PostgreSQL languages other than SQL. In this post’s example, we create an unsigned int2 data type (written as a TLE using the PL/pgSQL trusted language). This functionality will be included in pg_tle v1.1.1, and is available on database instances in Amazon RDS running PostgreSQL 15.4 and higher, 14.9 and higher, and 13.12 and higher in all applicable AWS Regions, including the AWS GovCloud (US) Regions. Please see the related What’s new post for more information: Create Custom PostgreSQL Data types using Trusted Language Extensions
Although signed integer data types can support both positive and negative numbers, unsigned integer data types can only store positive numbers (or zero). Using unsigned integer data types can be advantageous, especially in vector workloads where it can be important to reduce storage space. In the following sections, we explore the requirements and show a basic example of how to create an unsigned int2 data type, which contains 16 bits per instance, allowing for values from 0–65,535 for PostgreSQL using TLE.
You must have an Amazon Aurora or Amazon RDS instance with the pg_tle extension installed. For more information about setting up TLE, refer to New – Trusted Language Extensions for PostgreSQL on Amazon Aurora and Amazon RDS.
Create an unsigned int2 data type using TLE
Complete the following steps:
- Connect to a database and check that “pg_tle” is installed. You do this with the following SQL command:
- On Amazon Aurora and Amazon RDS, you need to explicitly grant the
pgtle_adminpermission to your user. If you’re using the postgres user, you can do this with the following command:
- Now you’re ready to create the uint2 data type. You can do this with the following code, which installs a TLE called
If the preceding commands are successful, you’ll see the following output:
- Now, enable your new TLE extension:
If successful, you can observe the following message:
- To verify that the new data type is installed, use the following command to list all user-managed data types:
- Now that you have observed that your custom data type is installed, create a table that utilizes it:
- Insert some data into your test table:
- Select the contents of the table and observe what you inserted:
Let’s see what happens if we try to set invalid data in the uint2 data type. An unsigned int2 data type should only allow values between 0–65535, so attempting negative values or positive values beyond 65535 should fail:
As we can see, values larger then 65535 are disallowed, validating that our
pg_uint2 data type is functioning correctly.
Now that we’ve validated that our new unsigned two-byte integer data type is functioning correctly, we can use the preceding code to redeploy this data type across both managed and unmanaged PostgreSQL instances as a PostgreSQL extension. Note that this is provided as a basic example of how to implement a custom data type using TLE. Additional production hardening and testing with your specific workload is highly recommended. Operators can be used for this data type when authoring TLEs. For details, see Trusted Language Base types.
In this section, we share some best practices when creating custom PostgreSQL data types using TLE.
Check base PostgreSQL data types against your use case to confirm a custom data type needs to be written
PostgreSQL contains (as of version 15) over 43 base data types. These data types should accommodate most workloads, but in some circumstances, you may need additional check or validation constraints that scale more effectively when written into a custom data type for your specific use case. Writing custom data types (via TLE or otherwise) means more aspects of your data model that must be maintained and deployed over the lifetime of your infrastructure, and the technical debt should be understood beforehand.
Carefully evaluate the performance of your TLE custom data type prior to production deployment
Test your custom data types and benchmark them against similar base PostgreSQL data types to ensure your custom data types meet your needs in function and when scaled out to your production use cases.
You can use tools like pgbench to simulate your workload first using default data types available in PostgreSQL. If benchmarking results are not as performant as expected (or default data types complicate writing queries around your data model), then consider authoring a custom data type using TLE.
Write your TLE custom data type to closely match your needs
If the decision is made to write (and maintain) custom data types using TLE, ensure that the scope of any custom data types closely matches your business needs.
Much can be accomplished using the default PostgreSQL data types, but it depends on the needs of a given workload. For example, check constraints can be used to imitate unsigned integer data types in some cases, but redistributing TLE custom data types across multiple clusters can add to the technical overhead of maintaining those resources over time. Additionally, customized data types are one more layer in the schema layer that must be maintained over time, and minimizing of technical debt is always recommended where possible.
Although PostgreSQL contains many data types to accommodate most customer workloads, sometimes it’s necessary to create custom data types to accommodate the specialized semantics of new workloads. Although the ability to create custom data types is available natively in PostgreSQL, TLE enables you to easily redeploy custom data types across your AWS relational database infrastructure, and provides both self-managed and AWS managed services. In this post, we covered the how and why behind custom PostgreSQL data types and provided a tutorial demonstrating the creation of a TLE custom data type for storing uint2 in PostgreSQL. Finally, we covered several best practices around custom TLE data types. Custom data type functionality is included in pg_tle v1.1.1, and is available on database instances in Amazon RDS running PostgreSQL 15.4 and higher, 14.9 and higher, and 13.12 and higher in all applicable AWS Regions, including the AWS GovCloud (US) Regions.
We welcome your comments and feedback in the comments section.
About the Authors
Peter Celentano is a Senor Specialist Solutions Architect with Amazon Web Services, focusing on managed PostgreSQL. He works with AWS customers to design scalable, secure, performant, and robust database architectures on the cloud.
Sukhpreet Kaur Bedi is a Database Specialist Solutions Architect with AWS focusing on Amazon RDS/Aurora PostgreSQL engines. She helps customers innovate on the AWS platform by building highly available, scalable, and secure database architectures.