AWS Database Blog
Using JSON with MySQL 5.7–compatible Amazon Aurora
What’s the big deal about JSON support in MySQL 5.7?
MySQL 5.6 supports numeric, date and time, string (character and byte) types, and spatial data types. Although this is a broad set of supported types, these primitive data types can limit your flexibility to evolve an application.
If you use MySQL 5.6, you have two options when planning ahead for evolution of the application. The first option is to specify the full schema containing all the fields that you currently need in the application. If the application subsequently needs a new field, you must update the schema to add that column. There are some benefits to this approach. You can index the new field. Also, features such as fast data definition language (DDL) in Amazon Aurora can minimize the operational burden of adding a column. However, the fact remains that you must execute a database schema change and also update the SQL statements to accommodate the change.
The second option is to use a string to encode a flexible set of fields and parse the string in the application layer. Although flexible, this approach places an unreasonable burden on you to parse the data.
This is where JSON comes in—it offers an excellent middle way by providing you the flexibility you need. JSON also provides the benefit of your not having to write any code to parse the data; the ORM or the language runtime should be able to care of it. JSON support was introduced in MySQL 5.7.8.
In addition to the benefits just listed, having JSON as a native type in MySQL means that the database can automatically validate JSON documents stored in JSON columns. Invalid documents produce an error. JSON as a native type also allows the database to optimize the JSON format. JSON documents stored in JSON columns are converted to an internal format that permits quick read access to document elements. When the server later must read a JSON value stored in this binary format, the value need not be parsed from a text representation. The binary format is structured to enable the server to look up subobjects or nested values directly by key or array index. It does so without reading all the values before or after them in the document.
Amazon Aurora now supports MySQL 5.7 compatibility, meaning you can now develop applications using JSON data types on top of MySQL 5.7–compatible Aurora.
The rest of this post walks through an example e-commerce application for an electronics store that uses JSON data types and MySQL-compatible Aurora.
Creating a schema
Electronics products can be diverse—laptops, cell phones, printers, TVs, DVDs, and so on. The product attributes can correspondingly vary widely. This makes it tricky to store product attributes in a normalized form so that you can search on different features and attributes. For example, we might want to do this for a product comparison.
We start by creating a database for our store.
For simplicity, our database only has three tables—brands, categories, and products. The brands
and categories
tables don’t have JSON fields, so let’s get them out of the way so we can get to the fun part.
The following is what the tables look like.
Now for the part that starts using JSON. The products
table holds different product attributes in a JSON column.
Inserting JSON data
Let’s store information about our first product. We can do this by directly inserting the attributes as a JSON object.
If you are using a GUI tool like MySQL Workbench, replace \G
at the end of the query with ;
Formatting a JSON object in an insert query can turn out to be quite a cumbersome exercise. We can use the function JSON_OBJECT
to make this easier.
The JSON_OBJECT
function accepts a list of key-value pairs in the format
JSON_OBJECT(key1,value1,key2,value2,key3,value3,……,key(n),value(n)
. It returns a JSON object.
Let’s insert information about our next product using the JSON_OBJECT
function.
Here we can also see another helpful function at work—JSON_ARRAY
. This function returns a JSON array when passed a set of values.
Another function that helps us store a JSON object is JSON_MERGE
. The JSON_MERGE
function takes multiple JSON objects and produces a single JSON object. This is useful when you get data as key-value pair objects. Let’s insert data about our next product using JSON_MERGE
.
We can use JSON_OBJECT
and JSON_MERGE
in combination to make things easier.
Let’s add a few more rows.
Now that we’ve added a few rows, let’s move on to run queries on JSON objects.
Selecting and filtering JSON data
It’s simple to write a select query like the following one.
However, this isn’t the most efficient or useful way to query a database. The function we want to use is JSON_EXTRACT
. For example, let’s say that we want to find all the Android devices we have.
There are three things to notice here. First, JSON_EXTRACT
can extract attributes from JSON that we can use for comparison and filtering.
Second, you can see another shorthand for reading attributes in the select clause: attributes->'$.operating_system'
.
Third, the results of the JSON_EXTRACT
still include quotes. That leads us to our next function, JSON_UNQUOTE
. Let’s rewrite the query with JSON_UNQUOT
E and see the difference.
MySQL doesn’t support indexes on JSON columns, but that restriction is easy to get around using virtual columns.
Let’s add a generated column that stores the memory of each device.
Now let’s create an index on the column.
Let’s look at the explain plan for the query to fetch all devices that have 16 GB memory. As you can see from the explain plan, this uses the index we created on the virtual column derived from the JSON column.
Another useful function is JSON_KEYS
, which returns the keys from the JSON object. Suppose that we want to see what attributes we store for an iPad.
Updating JSON data
Let’s take a look at how we can modify JSON values.
Let’s use the JSON_INSERT
function to add information about supported networks.
As we can see, the network attribute is added to the JSON.
JSON_INSERT
adds an attribute to the JSON. Let’s say we want to update an existing attribute like operating_system
. In this case, we use the JSON_REPLACE
function.
This is what the product now looks like.
JSON_INSERT
adds an attribute to the JSON. JSON_REPLACE
updates an attribute that is already present. If we don’t know if an attribute is present or not, we should use the JSON_SET
function. JSON_SET
updates the attribute if it’s already present and otherwise adds it.
Another useful function while running updates is the JSON_ARRAY_APPEND
. This function appends to an array attribute in a JSON. If the attribute is a scalar attribute, JSON_ARRAY_APPEND
converts it to an array.
Let’s continue from our previous example. Suppose that we want to add GSM to the networks supported by the phone.
The network attribute gets converted to an array and has the updated value.
Performing delete operations on a JSON object
There are two aspects to delete operations on a JSON object. We might want to delete an attribute from the JSON. Or we might want to delete the row itself, based on attributes from the JSON object.
Deleting an attribute from JSON is done using the JSON_REMOVE
function. For example, let’s remove the information associated with networks from our phone specifications.
Deleting a row based on the information contained in the JSON is done using the JSON_EXTRACT
function, which we used for filtering in WHERE
clauses before. For example, suppose that we want to delete all devices that have screen size less than 5.
Let’s try the DELETE using the JSON_EXTRACT
function.
To recap, the functions that help insert data to JSON are JSON_OBJECT
, JSON_ARRAY
, and JSON_MERGE
. The function that helps filter data is JSON_EXTRACT
. To update JSON, we can use the functions JSON_INSERT
, JSON_REPLACE
, JSON_SET
, and JSON_ARRAY_APPEND
. To delete JSON attributes and objects, we can use JSON_REMOVE
or delete using a WHERE
clause.
We’re excited to hear from you as you start using the feature. Feel free to leave a comment below or ping us at aurora-pm@amazon.com with any questions or feedback.
About the Author
Sirish Chandrasekaran is a product manager for Amazon Aurora at Amazon Web Services.
Yashada Jadhav is a database engineer for Amazon Aurora at Amazon Web Services.