AWS Database Blog

Support JSON data using Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL and Java Spring Boot on AWS

Many customers choose Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition to store JSON business objects along with relational transactional data. You can also use the Spring Boot Java framework to implement microservice applications. The Spring Boot framework provides rich features, including auto configuration, dependency injection, metrics measurement, health checks, and more, which expedite application development.

JSON is a common data exchange format used in REST APIs, web applications, and mobile applications. Unlike a flat relational structure, JSON supports a hierarchical structure of basic types, nested objects, and nested arrays. It’s widely used in schema-less document databases which allow documents to evolve with application’s needs.

In this post, we walk you through how to support business data captured in JSON format in microservices developed in the Spring Boot Java framework, Amazon RDS for PostgreSQL, or Aurora PostgreSQL.

Overview of solution

PostgreSQL provides robust features to support JSON business data through the jsonb data types. This post uses Amazon RDS for PostgreSQL to show how you can create a database schema to store JSON business objects. We cover how you can implement a data access layer using the Spring Data Java Persistence API (JPA) to access JSON data in the Amazon RDS for PostgreSQL tables. We also explain how you can create microservices to save and query JSON data. Finally, we show how you can create indexes to improve the performance of searching business data using JSON keys. You can apply the same approach to the case of using Aurora PostgreSQL to store your JSON business objects. The post Is Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL a better choice for me? analyzes the architectural differences between Amazon RDS for PostgreSQL and Aurora PostgreSQL and helps you select the best option.

We use the following JSON sample data to explain our solution:

{
    "id": 1,
    "name": "testModel",
    "version": 1,
    "definition": {
        "metaData": {
            "catalogs": [
                "machine learning",
                "classification",
                "duplicate detection"
            ],
            "years": [
                2021,
                2022
            ],
            "author": "Test Users",
            "name": "Duplicate Detection",
            "container": "duplicate-detection:1.0"
        },
        "inputParameters": {
            "startDate": {
                "display": "Start Date",
                "type": "date-time"
            },
            "endDate": {
                "display": "End Date",
                "type": "date-time"
            }
        },
        "outputParameters": {
            "result": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        }
    }
}

The sample JSON describes a data processing pipeline that ingests data records based on the start date and end date in the inputParameters structure, and then detects duplicate records using a machine learning model specified in the container data element. You can create many different data processing pipeline jobs based on the same model but with different start and end dates.

The process includes the following steps:

  1. Create a table to store the sample pipeline definition.
  2. Configure the Spring Boot application to support PostgreSQL and JSON data types.
  3. Create Spring Profiles to connect to a PostgreSQL database on AWS.
  4. Create database access classes and a Spring Boot service bean to save and query sample JSON business data.
  5. Create REST APIs to save and retrieve JSON data.
  6. Test the REST APIs.
  7. Create an index to improve query performance.

Prerequisites

You should have some experience with Spring Boot applications, the Spring RESTful web service, Spring Data Java Persistence, and relational databases. You should also have a basic understanding of the JSON data format. To follow the examples in this post, you need to create your own Spring Boot based project by visiting Spring Initializr.

Create a table to store the pipeline definition

PostgreSQL offers two data types for storing JSON data: The json and jsonb data types accept almost the same sets of values as input. The json data type stores the input JSON string as text, and no preprocessing is needed to save the string into a database column. On the other hand, when the jsonb data type is used to save an input text, the input text is converted into a decomposed binary format. The jsonb data type supports indexing, which offers a significant performance advantage over the json data type on read operations. The jsonb data type has more functions and operators to use than the json data type.

In the following SQL, we use the jsonb data type to create a table to store the sample pipeline definition. The table contains the following columns:

  • id – A primary key
  • name – A text string for the pipeline name
  • version – An integer for the pipeline version
  • definition_json – A jsonb data field to save the pipeline definition

See the following code:

CREATE TABLE IF NOT EXISTS demo.model
(id SERIAL PRIMARY KEY,
definition_json jsonb,
name character varying(255),
version integer);

Configure the Spring Boot application to support PostgreSQL and JSON data types

If you’re familiar with setting up a database connection in a Spring Boot application, you can skip this section.

To connect to a PostgreSQL database, you can add two dependencies to the project maven build file pom.xml: the PostgreSQL JDBC (Java Database Connectivity) driver and an open-source Hibernate type library, as shown in the following code. The Hibernate type library provides JSON type support, which isn’t available in the core package of the Hibernate object relational mapping framework (ORM). The Spring Data Java Persistence API (JPA) uses the Hibernate ORM as the default object relational mapping provider.

<dependency>
     <groupId>org.postgresql</groupId>
     <artifactId>postgresql</artifactId>
     <scope>runtime</scope>
</dependency>
<dependency>
     <groupId>com.vladmihalcea</groupId>
     <artifactId>hibernate-types-52</artifactId>
     <version>${hibernate-types.version}</version>
</dependency>

Create Spring Profiles to connect to a PostgreSQL database on AWS

If you’re familiar with Spring Profiles and their usage, you can skip this section.

You can create different Spring Profiles for an application to deploy and run the application on different environments, such as one profile to run and test the application locally and one profile to deploy and run the application as a container service in AWS Fargate. Each profile configures the application with the properties suitable for the environment represented by the profile. You can also use the @Profile annotation to map different beans to different profiles.

To connect to your PostgreSQL database, you can modify the profile property files as follows. You may get the database connection information from your database administrator.

spring.datasource.url = jdbc:postgresql://<rds_hostname>:<rds_port>/<rds_database_name>
spring.datasource.username = <rds_username>
spring.datasource.password = <rds_password>

Create database access classes and a Spring Boot service bean

To access the model table created in the PostgreSQL database, you create a JPA entity class. The entity class uses the following annotations:

  • @Id – Indicates that the property is a primary key
  • @Type(type=”jsonb”) – Specifies that the property uses the jsonb database type in the PostgreSQL database

Review the following code:

@TypeDefs({
    @TypeDef(name="jsonb", typeClass = JsonBinaryType.class)
})
@Entity
@Table(name= "model", schema="demo")
public class Model implements Serializable {
    @Id
    @Column(name = "id")
    @JsonProperty("id")
    private long id;

    @Column(name = "name")
    @JsonProperty("name")
    private String name;

    @Column(name = "version")
    @JsonProperty("version")
    private Integer version;

    @Type(type="jsonb")
    @Column(name = "definition_json", columnDefinition = "jsonb")
    @JsonProperty("definitionJson")
    private Map<String, Object> definitionJson;
}

To store and retrieve information from the table model, you create an extension of the JpaRepository interface. The interface contains the API definitions for creating, reading, updating, deleting, pagination, and sorting operations.

@Repository
public interface ModelRepository extends JpaRepository<Model, Long> {
}

You can then use the spring @Service annotation to create a service provider ModelService.java to implement the business functions to save and retrieve data processing pipeline models.

The saveModel method creates a model object based on an input data transfer object (DTO) modelDto. The DTO represents the sample JSON data in an object format. The method uses the ModelRepository to save the model to the PostgreSQL database table. See the following code:

@AllArgsConstructor
@NoArgsConstructor
@Builder
@Setter
@Getter
public class ModelDto implements Serializable {

    @AllArgsConstructor
    @NoArgsConstructor
    @Builder
    @Setter
    @Getter
    public static class Item implements Serializable {

        @JsonProperty("type")
        private String type;
    }
    @AllArgsConstructor
    @NoArgsConstructor
    @Builder
    @Setter
    @Getter
    public static class Parameter implements Serializable {
        @JsonProperty("display")
        private String display;

        @JsonProperty("type")
        private String type;

        @JsonProperty("items")
        private Item items;
    }
    @AllArgsConstructor
    @NoArgsConstructor
    @Builder
    @Setter
    @Getter
    public static class Definition implements Serializable {
        @JsonProperty(value = "metaData", required = true)
        private ModelMetaInfo metaData;

        @JsonProperty(value = "inputParameters", required = true)
        private Map<String, Parameter> inputParameters;

        @JsonProperty(value = "outputParameters", required = true)
        private Map<String, Parameter> outputParameters;
    }

    @JsonProperty(value = "id", required = true)
    private long id;

    @JsonProperty(value = "name", required = true)
    private String name;

    @JsonProperty(value = "version", required = true)
    private int version;

    @JsonProperty(value = "definition", required = true)
    private Definition definition;
}

The getModelListByFilter method queries the PostgreSQL database table and returns a list of models matching with the input filter. The PostgreSQL database provides functions and operators to extract JSON data items and perform value comparisons. The query uses the operator @> to check if the definition_json column field in the PostgreSQL database table contains the right JSON entries defined by the filter. Review the following code:

@Service
public class ModelService {

    private static final String FILTER_MODEL_SQL =
            "select id, name, version, definition_json from demo.model where definition_json @> '%s'";

    @PersistenceContext
    private EntityManager entityManager;

    private final ModelRepository modelRepository;

    public ModelService(ModelRepository modelRepository) {
        this.modelRepository = modelRepository;
    }

    public Model saveModel(ModelDto modelDto) {
        try {
            Model model = Model.builder().name(modelDto.getName()).id(modelDto.getId())
                    .version(modelDto.getVersion())
                    .definitionJson(JsonUtil.objectToMap(modelDto.getDefinition())).build();
            return modelRepository.save(model);
        } catch (IOException e) {
            throw new InternalProcessingException(e.getMessage());
        }
    }

    public List<Model> getModelListByFilter(ModelFilter filter) {
        try {
            String sql = String.format(FILTER_MODEL_SQL, JsonUtil.toString(filter));
            Query query = entityManager.createNativeQuery(sql, Model.class);
            return query.getResultList();
        } catch (IOException e) {
            throw new InternalProcessingException(e.toString());
        }
    }
}

As an example, you can use the following filter to find the models belonging to the catalog machine learning and suitable for use in the year 2021:

{
    "metaData": {
        "catalogs": ["machine learning"],
        "years": [2021]
    }
}

Based on the filter value, the getModelListByFilter method generates the following SQL statement:

select id, name, version, definition_json from demo.model 
where definition_json @> '{"metaData":{"catalogs":["machine learning"],"years":[2021]}}'

Create REST APIs to load and retrieve JSON business data

You can add a REST controller called ModelController.java to implement two REST APIs: one for loading models and one for filtering and finding matching models. Review the following code:

@RestController
@RequestMapping("/models")
public class ModelController {
    private final ModelService modelService;

    public ModelController(ModelService modelService) {
        this.modelService = modelService;
    }

    @PostMapping(value = "", consumes = MediaType.APPLICATION_JSON_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
    public @ResponseBody ResponseEntity<ModelResponse> registerModel(@RequestBody ModelDto modelDto) {
        Model model = modelService.saveModel(modelDto);
        return new ResponseEntity<>(ModelResponse.builder().data(model).status(HttpStatus.CREATED).build(),
                HttpStatus.CREATED);
    }

    @PostMapping(value = "/filter", consumes = MediaType.APPLICATION_JSON_VALUE, produces = MediaType.APPLICATION_JSON_VALUE)
    public @ResponseBody ResponseEntity<ModelListResponse> filterModels(@RequestBody ModelFilter modelFilter) {
        List<Model> modelInfoList = modelService.getModelListByFilter(modelFilter);
        return new ResponseEntity<>(ModelListResponse.builder().data(modelInfoList).status(HttpStatus.OK).build(),
                HttpStatus.OK);
    }
}

The REST controller uses the DTOs ModelResponse and ModelListResponse:

@AllArgsConstructor
@NoArgsConstructor
@SuperBuilder
@Getter
@Setter
public class ModelResponse {
    protected HttpStatus status;
    protected String message;
    private Model data;
}

@AllArgsConstructor
@NoArgsConstructor
@SuperBuilder
@Getter
@Setter
public class ModelListResponse {
    protected HttpStatus status;
    protected String message;
    private List<Model> data;
}

While our post focuses on using and supporting JSON data in Amazon RDS for PostgreSQL and Spring Boot applications, you must take additional steps to properly secure applications for production deployment. We recommend securing your application using Spring Security.

Test the REST APIs

Postman is a popular API platform used by developers to design, build, and test APIs. You can use Postman to create two requests to test two APIs created in the REST controller ModelController.java, as shown in the following code. You need to replace the variable your_service_url in the HTTP requests with the root path for the REST controller ModelController.java.

HTTP Request for loading the sample model
POST {{your_service_url}}/models Body Typ JSON Body Content {    "id": 1,    "name": "testModel",    "version": 1,    "definition": {        "metaData": {            "catalogs": [                "machine learning",                "classification",                "duplicate detection"            ],            "years": [                2021,                2022            ],            "author": "Test Users",            "name": "Duplicate Detection",            "container": "duplicate-detection:1.0"        },        "inputParameters": {            "startDate": {                "display": "Start Date",                "type": "date-time"            },            "endDate": {                "display": "End Date",                "type": "date-time"            }        },        "outputParameters": {            "result": {                "type": "array",                "items": {                    "type": "string"                }            }        }    } }
HTTP Request for filtering and finding models
POST {{your_service_url}}/models/filter Body Type JSON Body Content { "metaData": { "catalogs": ["machine learning"], "years": [2021] } }

Create an index to improve query performance

We use the following schema and table to show you how to improve the query performance of the jsonb data type:

CREATE SCHEMA IF NOT EXISTS demo;

CREATE TABLE IF NOT EXISTS demo.model
(id SERIAL PRIMARY KEY,
definition_json jsonb,
name character varying(255),
version integer);

We populate 2 million rows of data into the table demo.model.

We run the following query, which uses the @> operator to search the JSON’s keys and their values:

select id, name, version, definition_json
from demo.model
where definition_json @> '{"metaData": {"years":[1999]}}';

The following screenshot shows the query result matching one record.

Then, we use the explain analyze command to find the query’s run plan:

EXPLAIN ANALYZE select id, name, version, definition_json
from demo.model
where definition_json @> '{"metaData": {"years":[1999]}}';

Because there is no index created for the column definition_json, the query plan shows Parallel Seq Scan on model. On our testing platform, the runtime is 626.949 milliseconds.

We follow up by modifying the PostgreSQL database by adding a generalized inverted index ( GIN) to the column definition_json using the following SQL statement:

create index idx_definition_json on 
demo.model using gin (definition_json);

Then we rerun the same explain analyze command to find the query’s run plan.

After we create the index idx_definition_json, the new query plan shows Bitmap Index Scan on idx_definition_json. On our testing platform, the runtime is only 0.124 milliseconds. The query’s performance has improved dramatically—it’s 5,056 times faster.

GIN indexes cause additional costs for write operations. You may want to avoid using GIN indexes if your table operations are write-heavy. In our test, loading 2 million records with the index idx_definition_json takes 106 seconds. Without the index, the same loading operation takes 31 seconds. Before adding GIN indexes to database tables, you may want to analyze the database operations used by your applications and test the performance impact of the indexes on your application.

We run another query with two combined keys: "catalogs":["machine learning"] and "years":[1999] under metadata:

EXPLAIN ANALYZE select id, name, version, definition_json 
from demo.model
where definition_json @> '{"metaData":{"catalogs":["machine learning"],"years":[1999]}}'

The following screenshot shows the query and its run plan.

Our tests show that GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents.

Conclusion

In this post, we showed how to create a jsonb data column in Amazon RDS for PostgreSQL to store your business data and how to create an index to improve query performance. We also illustrated how to connect to your PostgreSQL database to load and query JSON data using the popular Spring Boot Java framework. You can apply the same approach to Aurora PostgreSQL.

If you have questions or suggestions, leave them in the comments section below.


About the authors

Jeff Li is a Senior Cloud Application Architect with the Professional Services team at AWS. He is passionate about diving deep with customers to create solutions and modernize applications that support business innovations. In his spare time, he enjoys playing tennis, listening to music, and reading.

Shunan Xiang is a Senior Database Consultant with the Professional Services team at AWS. He works as a database migration specialist to help Amazon customers in design and implementing scalable, secure, performant, and robust database solutions on the cloud. He also develops custom Data Lake House for the customer to support end-to-end supply chain integration.