AWS Database Blog

Migrate document workloads from Oracle SODA to Amazon DocumentDB

In this post, we show you a modernization path for migrating your JSON workloads from on-premises databases to the AWS Cloud. You can move your document workloads to Amazon DocumentDB (with MongoDB compatibility) and use the full capabilities of this purpose-built JSON database.

Amazon DocumentDB is a fully managed, native JSON document database that makes it simple and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. Amazon DocumentDB simplifies your architecture by providing built-in security best practices, continuous backups, and native integrations with other AWS services.

When you migrate from commercial relational databases such as Oracle or SQL Server, you can choose another relational database as the migration target. This is a common path for migrations to the cloud. Many relational databases support the JSON data type, which makes it possible to store relational data and JSON documents in the same database. However, in the long term, this approach might have downsides such as performance degradation, increased costs, and additional complexity. To address these downsides, you can use a purpose-built JSON database such as Amazon DocumentDB.

Oracle SODA and Amazon DocumentDB

For this example, we use a modernization migration from Simple Oracle Document Access (SODA) to Amazon DocumentDB. Oracle SODA is a set of APIs that developers use to manage collections of documents in Oracle Database. We discuss similarities and show query examples for Oracle SODA and Amazon DocumentDB. We also show how you can migrate data from Oracle SODA to Amazon DocumentDB using AWS Database Migration Service (AWS DMS). Finally, we show how you can implement extract, transform, and load (ETL) processes using AWS Glue.

Let’s start by comparing Oracle SODA and Amazon DocumentDB query syntax. Many Oracle SODA queries can run in Amazon DocumentDB without modification. Engineers who are familiar with writing Oracle SODA queries may be surprised to find that they already know how to write Amazon DocumentDB queries.

To demonstrate the capabilities of Oracle SODA and Amazon DocumentDB, we create a collection of car documents. We insert a sample JSON document to show its structure and fields, which we later use to build sample queries.

Let’s start by creating an Oracle SODA collection and inserting a sample JSON document.

  1. Connect to the Oracle database (enable DBMS_OUTPUT, for example with SET SERVEROUTPUT ON in SQL*Plus, to see the status message)
  2. Run the following PL/SQL block to create a SODA collection and insert a sample car JSON document:
DECLARE
  cars_collection SODA_COLLECTION_T;
  car SODA_DOCUMENT_T;
  insert_status NUMBER;
  column_def constant VARCHAR2(1000) := 
    '{
    "keyColumn":{"assignmentMethod": "CLIENT"},
    "contentColumn":{"sqlType": "CLOB"}
    }';
BEGIN
  cars_collection := DBMS_SODA.create_collection('cars', column_def);

  car := SODA_DOCUMENT_T(
    key => '1a563d33-01bb-41x9-a023-ae312359qw41',
    c_content => '{
      "make": "SportForYou",
      "model": "Bolid 1",
      "owner": "John Smith",
      "features": ["ABS", "ASR", "ESP", "GPS"],
      "parameters": [{
          "name": "millage",
          "value": "40000" 
        }, {
          "name": "color",
          "value": "Blue"
        }, {
          "name": "EngineType",
          "value": "Electric"
        }, {
          "name": "DriveType",
          "value": "4WD"
      }]
    }',
    media_type => 'application/json'
  );

  insert_status := cars_collection.insert_one(car);

  DBMS_OUTPUT.put_line('Status: ' || insert_status);
END;

Amazon DocumentDB is a NoSQL database, and you don’t need to create a collection before inserting a document into it. If the collection doesn’t exist, Amazon DocumentDB creates it on the first insert. You can write the Oracle PL/SQL block from the previous example as a single Amazon DocumentDB command:

db.cars.insertOne({
  "make": "SportForYou",
  "model": "Bolid 1",
  "owner": "John Smith",
  "features": ["ABS", "ASR", "ESP", "GPS"],
  "parameters": [{
      "name": "millage",
      "value": "40000" 
    }, {
      "name": "color",
      "value": "Blue"
    }, {
      "name": "EngineType",
      "value": "Electric"
    }, {
      "name": "DriveType",
      "value": "4WD"
    }]
});
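
To confirm that the document was stored, you can retrieve it with a quick lookup (an optional check, not part of the migration flow):

db.cars.findOne({ "owner": "John Smith" });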

Oracle SODA and Amazon DocumentDB queries

The sample JSON document that we inserted contains the following fields: make, model, owner, a features array of strings, and a parameters array of objects (each with name and value fields).

Oracle SODA and Amazon DocumentDB query syntax have a lot in common. You can run many Oracle SODA queries in Amazon DocumentDB without modifications. Let’s see a few examples.

Let’s start with a basic Oracle SODA query-by-example (QBE). To keep the samples short, instead of iterating over the documents we only count them. In the first example, we want to count all cars that meet the following criteria:

  • The owner is John Smith
  • The make is SportForYou
  • It has the parameter DriveType with a value of 4WD
  • It has the feature GPS

In Oracle we use the following code:

DECLARE
  cars_collection SODA_COLLECTION_T;
  documents_count NUMBER;
BEGIN
  cars_collection := DBMS_SODA.open_collection('cars');
  documents_count := cars_collection.find().filter('{
    "owner": "John Smith", 
    "make": "SportForYou", 
    "parameters": { "name": "DriveType", "value": "4WD" }, 
    "features": { $in: ["GPS"]} 
  }').count();
  DBMS_OUTPUT.put_line('matching documents: ' || documents_count);
END;

When migrating to Amazon DocumentDB, you can copy the filter directly from the preceding Oracle SODA query and pass it as a JSON object to the find function in Amazon DocumentDB:

count = db.cars.find({
  "owner": "John Smith", 
  "make": "SportForYou", 
  "parameters": { "name": "DriveType", "value": "4WD" }, 
  "features": { $in: ["GPS"]} 
}).count();
print('matching documents: ', count);

The query syntax is exactly the same: query structure, JSON paths, and the $in operator. There are many more such examples: logical operators like $and, $or, $nor, and $not; comparison operators like $lt, $gt, $lte, and $gte; and more.
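
As a quick illustration of this overlap, the following count combines a logical operator with a comparison operator. The same filter works as an Oracle SODA QBE filter string and as an Amazon DocumentDB query; note that the numeric year field here is hypothetical, because our sample document doesn’t include one:

count = db.cars.find({
  $and: [
    { "features": { $in: ["GPS"] } },
    { "year": { $gte: 2020 } }
  ]
}).count();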

If we want to count how many TruckForYou Heavy H1 or SportForYou Bolid 1 cars are in the Oracle SODA collection, we use the following code:

documents_count := cars_collection.find().filter('{
  $or: [ 
  { "make": "TruckForYou", "model": "Heavy H1" },
  { "make": "SportForYou", "model": "Bolid 1" } ] 
}').count();

The preceding query is exactly the same in Amazon DocumentDB:

count = db.cars.find({
  $or: [
  { "make": "TruckForYou", "model": "Heavy H1" },
  { "make": "SportForYou", "model": "Bolid 1" } ]
}).count();

Although most operations are similar, there are some differences. For example, to count all of the cars whose make contains Sport, in Oracle SODA we use $like, as in the following snippet:

documents_count := cars_collection.find().filter('{ "make": { $like: "%Sport%" } }').count();

In Amazon DocumentDB, $like doesn’t exist; instead, the equivalent query uses $regex:

count = db.cars.find({ "make": { $regex: /Sport/ } }).count();
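
One practical note on $regex: a pattern anchored to the start of the string, such as /^Sport/, can typically be served by an index on the field, whereas an unanchored pattern like /Sport/ must scan all values. A minimal sketch, assuming we first create an index on make:

db.cars.createIndex({ "make": 1 });
count = db.cars.find({ "make": { $regex: /^Sport/ } }).count();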

To learn more about Oracle SODA filter specification, see Overview of SODA Filter Specifications (QBEs). To learn more about Amazon DocumentDB operators, see Query and Projection Operators.

Oracle SODA and Amazon DocumentDB pagination support

To complement the query examples, let’s wrap up with pagination. Pagination is a common feature on many websites, and both Oracle SODA and Amazon DocumentDB support the filter, sort, skip, and limit operations it requires.

In Oracle SODA, the filter syntax changes: it becomes a nested JSON document with $query and $orderby fields, as shown in the following code:

skip := 0;
limit := 100;
documents_op := cars_collection.find().filter('{
  "$query": 
  { 
    "make": { $in: ["TruckForYou", "SportForYou"] } 
  },
  "$orderby":
  { 
    "owner": 1, "make": -1, "model": 1 
  } 
  }').skip(skip).limit(limit);

In Amazon DocumentDB, the find function stays the same and the sort parameters are passed to the sort function:

skip = 0;
limit = 100;
cursor = db.cars.find(
 {
    "make": { $in: ["TruckForYou", "SportForYou"] } 
 }).sort(
 { 
    "owner": 1, "make": -1, "model": 1 
 }).skip(skip).limit(limit);
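
Skip-and-limit pagination performs best when the sort can be satisfied by an index. The following sketch creates a compound index that matches our sort keys (an illustrative assumption, not part of the original example):

db.cars.createIndex({ "owner": 1, "make": -1, "model": 1 });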

Migrate data from Oracle SODA to Amazon DocumentDB

To move document workloads from Oracle to Amazon DocumentDB, you can use AWS DMS. AWS DMS is a managed migration and replication service that helps you move your database and analytics workloads to AWS quickly and securely, with minimal downtime and zero data loss. AWS DMS supports migration between many database and analytics engines.

In the following example, we use AWS DMS to copy data from our Oracle SODA cars collection to Amazon DocumentDB. When you create a replication task in AWS DMS, you need to define a set of mapping rules. These rules tell AWS DMS how to transform and replicate the data.

Under the hood, an Oracle SODA collection is implemented as a table, and we need to transform two of its columns before they can be replicated to Amazon DocumentDB. The columns and their mapping rules are the following:

  1. id (VARCHAR2) – Rename to _id in Amazon DocumentDB using the AWS DMS rename transformation
  2. json_document (CLOB) – Change to the string data type using the AWS DMS change-data-type transformation

The following code example shows the complete AWS DMS mapping rules for the cars collection:

{ 
    "rules": [ 
        { 
            "rule-type": "transformation", 
            "rule-id": "299791671", 
            "rule-name": "299791671", 
            "rule-target": "column", 
            "object-locator": { 
                "schema-name": "test", 
                "table-name": "cars", 
                "column-name": "id" 
            }, 
            "rule-action": "rename", 
            "value": "_id", 
            "old-value": null 
        }, 
        { 
            "rule-type": "transformation", 
            "rule-id": "299729313", 
            "rule-name": "299729313", 
            "rule-target": "column", 
            "object-locator": { 
                "schema-name": "test", 
                "table-name": "cars", 
                "column-name": "json_document" 
            }, 
            "rule-action": "change-data-type", 
            "data-type": { 
                "type": "string", 
                "length": "64000", 
                "scale": "" 
            } 
        }, 
        { 
            "rule-type": "selection", 
            "rule-id": "299692467", 
            "rule-name": "299692467", 
            "object-locator": { 
                "schema-name": "test", 
                "table-name": "cars" 
            }, 
            "rule-action": "include", 
            "filters": [] 
        } 
    ] 
}

You don’t need any additional configuration; AWS DMS automatically converts the columns to JSON fields. Furthermore, columns whose names start with json_ are converted into nested JSON documents. AWS DMS places each converted document in a field named after the original column, without the json_ prefix.

To illustrate this behavior, see the following document, which AWS DMS migrated to Amazon DocumentDB. The source Oracle SODA collection had id and json_document columns. The id column was migrated to _id, and the json_document column was converted into a nested document field:

{ 
  _id: '1a563d33-01bb-41x9-a023-ae312359qw41', 
  document: { 
    make: 'SportForYou',
    model: 'Bolid 1',
    owner: 'John Smith',
    features: ['ABS', 'ASR', 'ESP', 'GPS'],
    parameters: [{
        name: 'mileage',
        value: '40000'
    }, {
        name: 'color',
        value: 'Blue'
    }, {
        name: 'EngineType',
        value: 'Electric'
    }, {
        name: 'DriveType',
        value: '4WD'
    }]
  } 
}
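
Note that, because of the json_ prefix coercion, the migrated fields now live under the document field. Queries against the migrated collection therefore reference nested paths, as in the following sketch:

count = db.cars.find({ "document.make": "SportForYou" }).count();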

To learn more about how you can use AWS DMS to replicate data to Amazon DocumentDB, see Using Amazon DocumentDB as a target for AWS Database Migration Service.

Implement ETL in Amazon DocumentDB

You can use AWS Glue, a fully managed Apache Spark service, to implement complex ETL processes for Amazon DocumentDB.

You can use the write_dynamic_frame operation to write a DynamicFrame to Amazon DocumentDB, just like with other supported connection types:

glueContext.write_dynamic_frame.from_options(
    frame=df,
    connection_type="documentdb",
    connection_options=documentdb_options,
)

For more information, see Examples: Setting connection types and options.

In your ETL process, you might need to convert a flat comma-separated values (CSV) file into a nested JSON document. Using the cars example, but with more fields, let’s say we have the following flat CSV file:

make,model,owner,city,features,p1_name,p1_value,p2_name,p2_value,p3_name,p3_value 
SportForYou,Bolid B1,John Smith,New York,"ABS,ASR,ESP",EngineType,Electric,DriveType,4WD,Color,Red

We want to convert it into the following JSON document:

{
  "carInfo": {
    "make": "SportForYou",
    "model": "Bolid B1",
    "features": [
      "ABS",
      "ASR",
      "ESP"
    ],
    "parameters": [
      {
        "name": "EngineType",
        "value": "Electric"
      },
      {
        "name": "DriveType",
        "value": "4WD"
      },
      {
        "name": "Color",
        "value": "Red"
      }
    ]
  },
  "ownerInfo": {
    "name": "John Smith",
    "city": "New York"
  }
}

You can use the following snippet in AWS Glue Studio as a custom transform step and as a starting point for creating your own nested JSON documents in AWS Glue. Note that AWS Glue DynamicFrames don’t require developers to define schemas; AWS Glue infers the schema of the newly created DynamicFrame automatically.

def NestedCarAndOwnerInfoTransform(glueContext, dfc) -> DynamicFrameCollection:
    import pyspark.sql.functions as f
    from awsglue.dynamicframe import DynamicFrame

    # Convert the incoming DynamicFrame to a Spark DataFrame so we can use
    # the DataFrame API (withColumn, drop) for restructuring.
    df = dfc.select(list(dfc.keys())[0]).toDF()

    # Build the nested carInfo struct: plain fields, a features array split
    # from the comma-separated string, and a parameters array of structs.
    df = df.withColumn(
        "carInfo",
        f.struct(
            f.col("make"),
            f.col("model"),
            f.split(f.col("features"), ",").alias("features"),
            f.array(
                f.struct(
                    f.col("p1_name").alias("name"), f.col("p1_value").alias("value")
                ),
                f.struct(
                    f.col("p2_name").alias("name"), f.col("p2_value").alias("value")
                ),
                f.struct(
                    f.col("p3_name").alias("name"), f.col("p3_value").alias("value")
                ),
            ).alias("parameters"),
        ),
    )

    # Build the nested ownerInfo struct from the owner and city columns.
    df = df.withColumn(
        "ownerInfo", f.struct(f.col("owner").alias("name"), f.col("city"))
    )

    # Drop the original flat columns, keeping only carInfo and ownerInfo.
    df = df.drop(
        "make", "model", "features", "owner", "city",
        "p1_name", "p1_value", "p2_name", "p2_value", "p3_name", "p3_value",
    )

    # Wrap the DataFrame back into a DynamicFrame; AWS Glue infers its schema.
    result = DynamicFrame.fromDF(df, glueContext, "NestedCarAndOwnerInfoTransform")
    return DynamicFrameCollection({"NestedCarAndOwnerInfoTransform": result}, glueContext)

Benefits of migrating to Amazon DocumentDB

In this section, we discuss the three key benefits of modernizing JSON workloads and migrating them to Amazon DocumentDB.

The first benefit is performance and feature completeness. Amazon DocumentDB can outperform relational databases that use the JSON data type by orders of magnitude. In a recent migration project, our team compared the PostgreSQL JSON data type and Amazon DocumentDB. Amazon DocumentDB was able to meet tight SLA requirements even with 14 times more data than PostgreSQL. In fact, database performance is the main reason why purpose-built databases (including graph, time series, and key-value databases) were built and are so popular and successful. Additionally, as a JSON-native database, Amazon DocumentDB offers many more JSON features than relational databases do. Simple examples include indexing JSON array fields and running powerful aggregation pipelines over JSON documents.
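
For example, the following aggregation pipeline (a sketch built on our sample cars collection) unwinds the parameters array and counts cars per EngineType value:

db.cars.aggregate([
  { $unwind: "$parameters" },
  { $match: { "parameters.name": "EngineType" } },
  { $group: { _id: "$parameters.value", cars: { $sum: 1 } } }
]);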

The second benefit of moving to Amazon DocumentDB is that you may already be familiar with its query syntax. In our migration project, we migrated from Oracle SODA and found that JSON paths, query structure, array expressions, logical expressions, text operations, and comparison expressions were similar. Many queries run in Amazon DocumentDB without needing changes. This means that you may already know how to use Amazon DocumentDB without realizing it, and that database engineers with experience in Oracle SODA can get started with Amazon DocumentDB quickly.

The third benefit of moving to Amazon DocumentDB is that you get access to the existing MongoDB ecosystem. There are numerous frameworks, ETL engines, and drivers for virtually every programming language that you can use with Amazon DocumentDB. Amazon DocumentDB supports the MongoDB 3.6, 4.0, and 5.0 wire protocols. However, there are some differences between Amazon DocumentDB and MongoDB; for more information, see Functional Differences: Amazon DocumentDB and MongoDB and Supported MongoDB APIs, Operations, and Data Types. As we showed in this post, you can use the Apache Spark framework to implement your ETL processes and use the MongoDB sink to write data to Amazon DocumentDB. If you want to use Apache Spark, you can use the AWS Glue managed Spark service. These are just a few examples.
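
As one illustration of the ecosystem, the following minimal Node.js sketch connects to an Amazon DocumentDB cluster with the official MongoDB driver. The endpoint and credentials are placeholders; TLS with the Amazon-provided CA bundle and retryWrites=false are the usual Amazon DocumentDB connection settings:

const { MongoClient } = require('mongodb');

// Placeholder endpoint and credentials; replace with your cluster's values.
const uri = 'mongodb://user:password@mycluster.cluster-xxxxxxxxxxxx.us-east-1.docdb.amazonaws.com:27017/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false';

async function main() {
  // global-bundle.pem is the Amazon-provided certificate bundle for TLS.
  const client = new MongoClient(uri, { tlsCAFile: 'global-bundle.pem' });
  await client.connect();
  const count = await client.db('test').collection('cars').countDocuments();
  console.log('cars in collection:', count);
  await client.close();
}

main().catch(console.error);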

Conclusion

In this post, we discussed how you can modernize JSON workloads and migrate them to Amazon DocumentDB. Modernization to Amazon DocumentDB can be simpler than you think, and there are many features that you, your team, and your project can benefit from.

You can follow the Get Started with Amazon DocumentDB guide to get started with Amazon DocumentDB in less than 15 minutes. Once you have Amazon DocumentDB provisioned, you can run the queries that we used in this post and see them in practice.

Finally, for more information on Amazon DocumentDB, visit the product page.


About the Author

Lukasz Budnik is a Principal Software Dev Engineer working in the Amazon DMA team. Lukasz works with the biggest Amazon customers on the most challenging cloud migration and modernization programs. In his free time, Lukasz enjoys learning new human languages. ¡Hola amigos!