PostgreSQL as a JSON database: Advanced patterns and best practices
Modern applications need adaptable data models. For example, e-commerce product catalogs have diverse attributes and constantly changing requirements. How do you design a database that handles this without painful migrations?
JSON adapts to your data’s shape, eliminating the need for rigid schemas. As your application grows, your data model evolves alongside it.
PostgreSQL provides JSON capabilities with ACID compliance and advanced indexing, making it a compelling alternative to dedicated NoSQL solutions for JSON workloads. When deployed on AWS through fully managed services such as Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition, it gains enterprise-grade capabilities and seamless integration with other AWS services.
JSON capabilities of PostgreSQL
PostgreSQL provides robust JSON support through two primary data types:
- JSON – The JSON data type stores an exact copy of the input text. It must be parsed on every operation, and it offers only limited (functional) indexing.
- JSONB – The JSONB data type stores data in an optimized binary format that is much more efficient to process, at the cost of slightly slower inserts. JSONB also provides a far richer set of operators and index types.
The following table compares the two data types.
| Feature | JSON | JSONB |
| --- | --- | --- |
| Storage Format | Text-based, preserves whitespace and key order | Binary format, optimized for processing; can be slightly larger on disk than JSON |
| Insertion Speed | Faster (no conversion overhead) | Slightly slower (requires binary conversion) |
| Query Performance | Slower (requires parsing each time) | Significantly faster for most operations |
| Indexing Support | You can add functional indexes | Supports GIN indexing for efficient searches |
| Duplicate Keys | Preserves duplicate keys | Removes duplicate keys (keeps last value) |
| Memory Usage | Generally higher for complex operations | More efficient memory utilization |
| Nested Queries | Less efficient for deep nesting | Optimized for nested structure queries |
| Use Case | When input preservation is critical | Most production applications needing performance |
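One practical difference is easy to demonstrate: casting the same document with duplicate keys to each type yields different results.

```sql
-- JSON preserves the input text verbatim, including duplicate keys
SELECT '{"a": 1, "a": 2}'::json;
-- Result: {"a": 1, "a": 2}

-- JSONB normalizes the document and keeps only the last value
SELECT '{"a": 1, "a": 2}'::jsonb;
-- Result: {"a": 2}
```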
Real-world use cases
PostgreSQL is valuable for systems requiring a transactional, object-relational database, offering extended and robust data types, rich SQL interface, and advanced search capabilities. The following are example use cases:
- Web and mobile applications – Thanks to its scalability and reliability, PostgreSQL is a popular backend choice for dynamic websites and mobile applications that handle high traffic and require strong transactional integrity.
- Content management systems (CMS) – Many CMS platforms use PostgreSQL for its ability to handle complex content structures and large databases.
- Scientific research and data analysis – PostgreSQL is suitable for scientific research and data analysis due to its support for complex data types and advanced analytics.
- Geospatial applications (GIS) – With the PostGIS extension, it is a powerful geospatial database for applications like mapping and location services.
- Applications requiring advanced search features – PostgreSQL excels for applications demanding sophisticated search capabilities, including full-text search, geospatial search, and semantic search. Examples are job search engines, content recommendation, and product discovery systems.
To keep the focus on PostgreSQL’s JSON capabilities for a CMS application, this discussion excludes vector (semantic) and geospatial search. Purpose-built services, such as Amazon OpenSearch Service, provide a more efficient platform for these specialized search techniques, as we discuss later in this post.
Solution overview
Let’s build a digital news magazine CMS application. This is a perfect use case for PostgreSQL’s JSON capabilities because news articles have both structured components (titles, publication dates) and unstructured, variable components (different types of metadata depending on the article type).
Prerequisites
To try our examples, you need an AWS account, PostgreSQL 17 or later (earlier versions work for everything except JSON_TABLE), and familiarity with SQL. For instructions on setting up a cluster, see Creating an Amazon Aurora DB cluster in the Amazon Aurora User Guide.
Configure data
We start by setting up our database structure. The following table definition demonstrates how PostgreSQL can store both structured and semi-structured data:
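The following is a minimal sketch of such a table definition; the column and index names are illustrative rather than prescriptive:

```sql
-- Structured columns for well-known attributes, JSONB for variable metadata
CREATE TABLE articles (
    id             BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title          TEXT NOT NULL,
    content        TEXT,
    published_date DATE NOT NULL DEFAULT CURRENT_DATE,
    metadata       JSONB
);

-- GIN index (default jsonb_ops operator class) to speed up JSON filters
-- such as tag and category lookups
CREATE INDEX idx_articles_metadata ON articles USING GIN (metadata);
```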
PostgreSQL can combine the results from multiple independent index scans (including GIN indexes) by creating bitmaps of potential row locations.
In this example, the metadata field is of type JSONB. This is helpful for performance in a CMS where editors frequently search and filter articles by various metadata attributes. The GIN index we created speeds up queries that filter JSON data by tags, which is a common operation in a news platform.
Next, we insert some sample articles that might appear in our digital magazine:
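The following sample rows are illustrative; note how the metadata shape varies by article type:

```sql
INSERT INTO articles (title, content, published_date, metadata) VALUES
(
    'Warming Oceans Reshape Coastal Economies',
    'Rising sea temperatures linked to climate change are transforming ...',
    DATE '2025-03-14',
    '{
       "author": {"name": "Maria Alvarez", "role": "Environment Correspondent"},
       "categories": ["environment", "economy"],
       "tags": ["climate change", "oceans"],
       "metrics": {"views": 18250, "shares": 420}
     }'
),
(
    'Interview: The Future of Urban Farming',
    'We sat down with an urban agriculture pioneer to discuss ...',
    DATE '2025-04-02',
    '{
       "author": {"name": "Dev Patel", "role": "Features Editor"},
       "categories": ["lifestyle"],
       "tags": ["farming", "sustainability"],
       "interview_subject": "Dr. Lena Hoff",
       "metrics": {"views": 9400, "shares": 130}
     }'
);
```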
In this example, if we were using a purely relational approach, we would need separate tables for authors, categories, tags, and metrics, with foreign keys linking everything together. With JSON, we keep related data together, making it straightforward to retrieve complete article information in a single query. Furthermore, if we later decide to add new metadata fields (like “video_duration” for video articles or “interview_subject” for interviews), we can do so without altering the database schema. This flexibility is invaluable for a CMS that needs to adapt to evolving content types.
Query patterns and optimization
In this section, we explore the search capabilities that PostgreSQL provides for querying JSON data.
Query with JSON path operators
In our CMS, editors might need to find all environment-related articles that mention “climate change” to create a special feature. This query combines traditional SQL with JSON path operators:
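Assuming the articles table and sample data sketched earlier, the query might look like this:

```sql
SELECT title,
       metadata->>'author' AS author
FROM articles
WHERE metadata->'categories' ? 'environment'
  AND metadata->'tags' ? 'climate change';
```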
In this query:
- `metadata->'categories' ? 'environment'` uses the existence operator `?` to check whether the categories array contains “environment”
- `metadata->'tags' ? 'climate change'` similarly checks whether “climate change” is in the tags array
- `metadata->>'author'` extracts the author field as text (note the `->>`); because author is itself a JSON object, we get the entire object as a string
Transforming JSON into tabular data
CMS applications often need to extract structured data from JSON metadata for analytics purposes. JSON_TABLE, an ANSI SQL standard function available in PostgreSQL 17, presents JSON data in tabular format:
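A sketch of such a query against our sample metadata (the paths assume the author, metrics, and tags fields shown earlier):

```sql
SELECT a.title, jt.author_name, jt.views, jt.tag
FROM articles a,
     JSON_TABLE(
         a.metadata,
         '$'
         COLUMNS (
             author_name TEXT PATH '$.author.name',
             views       INT  PATH '$.metrics.views',
             NESTED PATH '$.tags[*]'
                 COLUMNS (tag TEXT PATH '$')
         )
     ) AS jt;
```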
This query uses the JSON_TABLE function with PATH and NESTED PATH expressions to flatten the JSON, producing a relational view that we can use for reporting or data export.
If you’re using an older PostgreSQL version, you can achieve similar results with lateral joins:
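For example, a roughly equivalent query using jsonb_array_elements_text in a lateral join:

```sql
SELECT a.title,
       a.metadata->'author'->>'name' AS author_name,
       (a.metadata->'metrics'->>'views')::int AS views,
       t.tag
FROM articles a
CROSS JOIN LATERAL
     jsonb_array_elements_text(a.metadata->'tags') AS t(tag);
```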
Full-text search with JSON data
CMS applications often need advanced techniques to search their JSON data. In this section, we show how our CMS application can use PostgreSQL’s full-text search capabilities.
For our CMS application, readers need to search article content efficiently. PostgreSQL’s full-text search capabilities work seamlessly with JSON content. For example, see the following query:
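A sketch of such a search (the search terms and category filter are illustrative):

```sql
SELECT title,
       -- Highlight matching terms in an excerpt of the article body
       ts_headline('english', content,
                   to_tsquery('english', 'climate & change')) AS excerpt,
       -- Rank results by relevance to the search query
       ts_rank(to_tsvector('english', content),
               to_tsquery('english', 'climate & change')) AS relevance
FROM articles
WHERE to_tsvector('english', content)
      @@ to_tsquery('english', 'climate & change')
  -- Combine full-text search with a JSON containment filter
  AND metadata @> '{"categories": ["environment"]}'
ORDER BY relevance DESC;
```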
This query showcases several complex search features:
- Highlighting search terms in results
- Ranking articles by relevance
- Filtering by metadata attributes
- Combining full-text search with structured data conditions
The query uses the following components:
- Full-text search operators – The `@@` operator checks whether a `tsvector` matches a `tsquery`
- Text search functions – It uses the following functions:
  - `to_tsvector()` converts text to a searchable vector, handling stemming and stop words
  - `to_tsquery()` parses a search query with Boolean operators
  - `ts_headline()` generates highlighted excerpts with matching terms in bold
  - `ts_rank()` ranks results by relevance
- JSON containment – The `@>` operator checks whether the left JSON value contains the right JSON structure
More GIN indexing options
Our GIN examples so far use the default `jsonb_ops` operator class, which supports key-existence checks like `?` as well as containment. The alternative `jsonb_path_ops` operator class doesn’t support the key-existence operators or composite key operations; it supports only containment (`@>`) and JSON path match operators. In exchange, its indexes are smaller and faster for containment queries.
The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, and the latter creates index items only for each value in the data. As a rule, use each index type in the following scenarios:
- Use `jsonb_path_ops` for simple containment queries (smaller, faster)
- Use `jsonb_ops` for complex path queries and composite operations
For more details on jsonb_path_ops and jsonb_ops, refer to JSON Types.
Now you can have SQL statements like the following:
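For example, reusing our articles table:

```sql
-- Smaller, containment-only GIN index using jsonb_path_ops
CREATE INDEX idx_articles_metadata_path
    ON articles USING GIN (metadata jsonb_path_ops);

-- Containment queries like this one can use the jsonb_path_ops index
SELECT title
FROM articles
WHERE metadata @> '{"tags": ["climate change"]}';
```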
Optimizing GIN indexes
In addition to choosing the right GIN operator class, as discussed in the previous section, consider these general GIN index optimization strategies (see the sketch after this list):
- Use expression and partial indexes:
  - If you frequently query a specific key, create an expression index on just that key. This uses a standard B-tree index, which is much faster for simple equality checks.
  - Combine an expression index with a `WHERE` clause to index only a subset of your data. This is useful for large tables where only a fraction of rows needs to be indexed.
- Optimize bulk data loads – For large data loads, it’s more efficient to load the data first and create the index afterward. Building the index in a single bulk operation is often faster than updating it incrementally with each `INSERT`.
- Tune `gin_pending_list_limit` – This configuration parameter controls the size of a GIN index’s pending list and is crucial for tables with high write traffic. Increase it to favor fast, frequent writes (entries accumulate and are merged in bulk), or decrease it to keep search performance more predictable.
- Monitor and tune – Track index size and usage over time, and adjust these settings as your workload evolves.
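The following sketch illustrates these strategies against our articles table; the index names, the partial-index predicate, and the pending-list value are illustrative:

```sql
-- Expression index on a frequently queried key (a standard B-tree index)
CREATE INDEX idx_articles_author_name
    ON articles ((metadata->'author'->>'name'));

-- Partial GIN index covering only a subset of rows
CREATE INDEX idx_articles_recent_metadata
    ON articles USING GIN (metadata)
    WHERE published_date >= DATE '2025-01-01';

-- Tune the pending list size for a write-heavy table (value in kilobytes)
ALTER INDEX idx_articles_recent_metadata
    SET (gin_pending_list_limit = 8192);
```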
Semantic and geospatial search capabilities
When your application requires semantic (vector) or geospatial search over its JSON data, AWS recommends complementary architectural patterns alongside PostgreSQL’s strengths. A common approach is to integrate PostgreSQL with Amazon OpenSearch Service, creating purpose-built search solutions that use each service’s strengths. PostgreSQL’s integration with OpenSearch, implemented as a pull or push model through an Amazon OpenSearch Ingestion pipeline, offers a fully managed, no-code experience for ingesting data into OpenSearch.
Using separate, optimized services like Amazon OpenSearch Service for vector and geospatial search frees PostgreSQL to focus on the tasks where it excels, such as complex queries and transactional integrity. Amazon OpenSearch Service is designed to handle high-dimensional vector embeddings and complex geospatial queries, making it a more performant choice for those specific workloads.
Clean up
When you’re done experimenting with this solution, clean up the resources you created to avoid incurring additional charges.
AWS-specific benefits
Although PostgreSQL has built-in JSON capabilities, Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL provide infrastructure and management features that significantly enhance how effectively you can work with JSON data:
- Distributed storage architecture – Aurora’s storage layer reduces I/O bottlenecks when processing large JSON documents, allowing for more efficient read/write operations. Storage automatically grows without manual intervention, which is ideal for applications with unpredictable JSON data growth.
- Aurora cluster cache management – The shared buffer cache across instances improves performance for frequently accessed JSON documents. Buffer cache persistence across instance restarts prevents performance degradation after maintenance events.
- Parallel query processing – PostgreSQL’s parallel query processing can improve performance for some JSON operations by distributing work across multiple CPU cores. This feature is most effective for analytical queries on large datasets, though complex JSONB path operations and GIN index lookups might not always benefit from parallelization.
- Read replicas with minimal lag – Aurora replication enables up to 15 read replicas with minimal lag, helping you scale JSON read operations effectively. Amazon Aurora Serverless v2 dynamically scales to handle varying JSON read operations, automatically adjusting compute capacity as your workload changes.
- Instance type flexibility – You can choose the following instance types:
  - Memory-optimized instances for applications with large JSON documents that benefit from in-memory processing
  - Burstable-performance instances for cost-effective handling of workloads with moderate CPU usage and occasional spikes
  - Read-optimized instances for read-intensive, high-throughput JSON workloads
- CloudWatch Database Insights – Aurora PostgreSQL-Compatible and Amazon RDS for PostgreSQL provide detailed monitoring through Amazon CloudWatch Database Insights, which is especially helpful for identifying bottlenecks in complex JSON operations.
Best practices
By following these best practices, you can effectively use PostgreSQL’s JSON capabilities while maintaining performance and data integrity in your AWS-hosted databases.
- Strategic data modeling:
  - Use JSONB for most use cases unless exact text preservation is required
  - Normalize frequently queried fields into regular columns for better performance
  - Consider partial normalization where some structured data lives in columns while variable data stays in JSON
- Indexing strategies:
  - Create GIN indexes for frequently queried JSON paths
  - Use expression indexes for specific JSON properties
  - Monitor index usage and remove unused indexes that slow down writes
- Query optimization:
  - Use containment operators `@>` and `<@` with GIN indexes for best performance
  - Use path operators `->` and `->>` to avoid extracting entire documents
  - Extract values to appropriate types when performing comparisons:
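For example, a hypothetical numeric comparison on the views metric from our sample data:

```sql
-- Cast the extracted text to an integer so the comparison is numeric
SELECT title
FROM articles
WHERE (metadata->'metrics'->>'views')::int > 10000;
```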
- Performance considerations:
  - Don’t overuse JSON for data that would be better in relational columns
  - Use `jsonb_path_ops` for GIN indexes when you only use the containment operator
  - Batch update JSON documents rather than updating individual fields frequently
  - Perform targeted path updates rather than rewriting the full document
- Avoid query anti-patterns:
  - Scanning all documents when more selective filters could be used
  - Extracting entire JSON documents when only specific fields are needed
  - Joining on JSON fields without proper indexing or type casting
- Schema evolution:
  - Validate your JSON structure at the database level. You can use check constraints to enforce basic shape requirements. For example, consider the following table definition:
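A minimal sketch, with illustrative constraint names and required keys:

```sql
CREATE TABLE articles_validated (
    id       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    metadata JSONB,
    -- Require metadata to be a JSON object (not an array or scalar)
    CONSTRAINT metadata_is_object CHECK (jsonb_typeof(metadata) = 'object'),
    -- Require an author key to be present
    CONSTRAINT metadata_has_author CHECK (metadata ? 'author')
);
```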
  - Recent versions of PostgreSQL offer predicates such as `IS JSON`, `IS JSON ARRAY`, and `IS JSON OBJECT` for simple validity checks
  - You can also do application-level validation using JSON Schema or similar tools
- Storage considerations:
  - Avoid document bloat by removing obsolete fields
  - Avoid storing the same data in both JSON and regular columns
- AWS-specific considerations:
  - Use CloudWatch Database Insights to identify bottlenecks in JSON operations
  - Account for JSON storage size because it can be less compact than normalized tables
Conclusion
PostgreSQL’s JSON capabilities, combined with AWS managed services, offer a compelling alternative to NoSQL databases. You get schema flexibility without sacrificing transactions, powerful querying without learning new languages, and enterprise features without operational overhead.
To migrate your JSON workloads to a managed PostgreSQL platform, start with Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition, depending on your performance and cost requirements.
Try PostgreSQL’s JSON capabilities discussed in this post and leave your comments here. Refer to other posts in this blog series to learn about AWS JSON database solutions.