AWS Database Blog

Automate your Neo4j to Amazon Neptune migration using the neo4j-to-neptune utility

Neo4j and Amazon Neptune are both graph databases designed for online, transactional graph workloads that support the labeled property graph data model. Migrating from Neo4j to Neptune introduces several challenges, including differences in query language behavior, architectural design, and the complexity of data migration itself. Ensuring that existing Cypher queries continue to function as expected is a critical part of this process, because variations in supported syntax and semantics can create compatibility gaps.

To help teams address these issues early and streamline their overall migration journey, dedicated tools and guidance are available for validating Cypher compatibility before and during the transition. For more information about validating Cypher query compatibility when migrating from Neo4j to Neptune, see Validate Neo4j Cypher queries for Amazon Neptune migration.

In this post, we walk you through two methods to automate the migration of your Neo4j database to Neptune using the neo4j-to-neptune utility. This tool offers a fully automated end-to-end process in addition to a step-by-step manual process.

Solution overview

By demonstrating the effective use of both methods, we aim to provide you with a comprehensive understanding of how to use automated tools to suit your specific migration needs and successfully transition your graph-based workloads to the Neptune service.

The migration process generally includes the following main components:

  1. General information: Understanding the fundamental differences between Neo4j and Neptune
  2. Preparation phase: Planning and preparing for the migration
  3. Infrastructure provisioning: Setting up the necessary Neptune infrastructure
  4. Data migration: Moving data from Neo4j to Neptune
  5. Compatibility considerations: Understanding feature compatibility between Neo4j and Neptune
  6. Cypher query rewrites: Adapting Cypher queries to work with the Neptune openCypher implementation

The following architecture shows the building blocks that you need to complete the migration process:

Figure 1: Architecture diagram showing the migration process components

  1. An Amazon Elastic Compute Cloud (Amazon EC2) instance to download and install the neo4j-to-neptune utility. This instance acts both as the temporary server for staging CSV files and as a client to run AWS Command Line Interface (AWS CLI) commands, such as copying exported files to an Amazon Simple Storage Service (Amazon S3) bucket and loading data into Neptune.
  2. An S3 bucket from which to load data into Neptune.
  3. A Neptune DB cluster with one graph database instance.
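If you haven't provisioned the Neptune cluster yet, the following is a minimal sketch using the AWS CLI. The identifiers and instance class are illustrative; adjust them, along with networking and subnet group settings, for your environment:

# Create the Neptune cluster (identifiers are illustrative)
aws neptune create-db-cluster \
    --db-cluster-identifier db-neptune \
    --engine neptune

# Add a graph database instance to the cluster
aws neptune create-db-instance \
    --db-instance-identifier db-neptune-instance-1 \
    --db-instance-class db.r5.large \
    --engine neptune \
    --db-cluster-identifier db-neptune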

Prerequisites

Before starting the migration, make sure you have the following resources:

  • A Neptune DB cluster with at least one instance
  • An S3 bucket in the same AWS Region as the Neptune cluster for staging the load files
  • An IAM role (for example, NeptuneLoadFromS3) that grants Neptune read access to the S3 bucket, attached to the Neptune cluster
  • An EC2 instance (or another Linux environment) with git, JDK 17, Apache Maven, and the AWS CLI installed
  • A source Neo4j database with the APOC library installed

Build the neo4j-to-neptune utility

You will build the neo4j-to-neptune utility from the source code on GitHub. The process involves cloning the amazon-neptune-tools repository, which contains various utilities for working with Neptune, including the neo4j-to-neptune conversion tool, to create a local copy on the EC2 instance. After cloning the repository, you build the utility with Maven (mvn clean install), which cleans previous build output, compiles the code, runs tests, creates an executable JAR file, and installs the package in your local Maven repository. Before building, set JAVA_HOME to the JDK 17 version you installed on the EC2 instance.

  1. Connect to an EC2 instance in your AWS account to build the neo4j-to-neptune utility. Note that using an EC2 instance isn’t mandatory—any Linux environment with the necessary dependencies is sufficient for building this utility.
  2. Clone the GitHub repository.
    git clone https://github.com/awslabs/amazon-neptune-tools.git
  3. Set JAVA_HOME to the JDK 17 installation (see the sketch after this list).
  4. Build the utility using Maven.
    cd amazon-neptune-tools/neo4j-to-neptune
    mvn clean install
    
    [WARNING] jackson-core-2.20.0-rc1.jar, jackson-databind-2.20.0-rc1.jar define 1 overlappping classes: 
    [WARNING]   - META-INF.versions.9.module-info
    [WARNING] maven-shade-plugin has detected that some .class files
    [WARNING] are present in two or more JARs. When this happens, only
    [WARNING] one single version of the class is copied in the uberjar.
    [WARNING] Usually this is not harmful and you can skeep these
    [WARNING] warnings, otherwise try to manually exclude artifacts
    [WARNING] based on mvn dependency:tree -Ddetail=true and the above
    [WARNING] output
    [WARNING] See https://docs.codehaus.org/display/MAVENUSER/Shade+Plugin
    [INFO] Replacing /home/ec2-user/neptune/neo4j-to-neptune/amazon-neptune-tools/neo4j-to-neptune/target/neo4j-to-neptune.jar with /home/ec2-user/neptune/neo4j-to-neptune/amazon-neptune-tools/neo4j-to-neptune/target/neo4j-to-neptune-1.0-SNAPSHOT-shaded.jar
    [INFO] 
    [INFO] --- maven-install-plugin:2.5.1:install (default-install) @ neo4j-to-neptune ---
    Downloading from central: https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.6/commons-codec-1.6.pom
    Downloaded from central: https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.6/commons-codec-1.6.pom (11 kB at 1.6 MB/s)
    Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.4/maven-shared-utils-0.4.pom
    Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.4/maven-shared-utils-0.4.pom (4.0 kB at 337 kB/s)
    Downloading from central: https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.6/commons-codec-1.6.jar
    Downloading from central: https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.4/maven-shared-utils-0.4.jar
    Downloaded from central: https://repo.maven.apache.org/maven2/org/apache/maven/shared/maven-shared-utils/0.4/maven-shared-utils-0.4.jar (155 kB at 17 MB/s)
    Downloaded from central: https://repo.maven.apache.org/maven2/commons-codec/commons-codec/1.6/commons-codec-1.6.jar (233 kB at 11 MB/s)
    [INFO] Installing /home/ec2-user/neptune/neo4j-to-neptune/amazon-neptune-tools/neo4j-to-neptune/target/neo4j-to-neptune-1.0-SNAPSHOT.jar to /home/ec2-user/.m2/repository/com/amazonaws/neo4j-to-neptune/1.0-SNAPSHOT/neo4j-to-neptune-1.0-SNAPSHOT.jar
    [INFO] Installing /home/ec2-user/neptune/neo4j-to-neptune/amazon-neptune-tools/neo4j-to-neptune/pom.xml to /home/ec2-user/.m2/repository/com/amazonaws/neo4j-to-neptune/1.0-SNAPSHOT/neo4j-to-neptune-1.0-SNAPSHOT.pom
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 15.259 s
    [INFO] Finished at: 2025-08-08T21:25:55Z
    [INFO] Final Memory: 35M/265M
    [INFO] ------------------------------------------------------------------------
  5. Maven will build the JAR file under the target folder. You can copy the JAR file to any directory you prefer.
    cd target
    ls -ls neo4j-to-neptune.jar
  6. Copy the neo4j-to-neptune.jar file to the directory containing the CSV files exported from Neo4j. The CSV export process will be detailed in a subsequent section of this post.
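The following is a minimal sketch of the full build sequence on Amazon Linux 2023 with Amazon Corretto 17. The package names and the JAVA_HOME path are illustrative and may differ in your environment:

# Install build dependencies (package names assume Amazon Linux 2023)
sudo yum install -y git java-17-amazon-corretto-devel maven

# Point JAVA_HOME at the JDK 17 installation (path is illustrative)
export JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto
export PATH=$JAVA_HOME/bin:$PATH

# Clone the repository and build the utility
git clone https://github.com/awslabs/amazon-neptune-tools.git
cd amazon-neptune-tools/neo4j-to-neptune
mvn clean install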

For this post, we use the air-routes graph dataset, which models the global airline network as a graph structure. This dataset represents airports, countries, continents, and their interconnecting routes. The graph employs distinct vertex types to denote different entity categories, while edges—annotated with labels and properties—capture the relationships between these entities:

  • It contains 3,504 airports, 273 countries, and 7 continents (3,784 vertices in total)
  • It includes 57,645 routes between these airports
  • The data is available in GraphML format as air-routes.graphml
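If you need to stage this dataset in a test Neo4j instance first, a minimal sketch using the APOC GraphML import looks like the following. It assumes air-routes.graphml has been placed in Neo4j's import directory and that apoc.import.file.enabled=true is set in neo4j.conf:

CALL apoc.import.graphml(
  "air-routes.graphml",    // resolved relative to Neo4j's import directory
  {readLabels: true}       // map GraphML labels onto Neo4j node labels
)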

Manual migration process

The manual migration process from Neo4j to Neptune consists of the following steps:

  1. Export Neo4j graph data into a CSV file using the APOC export procedures.
  2. Convert the CSV file to Neptune format.
  3. Upload the converted vertices and edges file to an S3 bucket.
  4. Bulk load to Neptune.

Export Neo4j graph data into a CSV file using the APOC export procedures

To export data from Neo4j using APOC procedures, you first need to install the APOC library in your Neo4j environment. See Installation for instructions on installing the APOC library. Then, you must enable exports by adding apoc.export.file.enabled=true to your neo4j.conf configuration file. Note that this requires a complete shutdown and restart of the Neo4j instance. The actual export is performed using the apoc.export.csv.all procedure, which creates a single CSV file containing all node and relationship data.

It’s important to note that when executing the export command, you should avoid using the {stream:true} parameter, because streaming the results to the browser and downloading them as a CSV file results in a file that won’t be correctly processed by the conversion utility. The export file path is resolved relative to the Neo4j import directory.

Enable exports in neo4j.conf

Add the following line to the neo4j.conf file to enable exports:

apoc.export.file.enabled=true
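After adding this line, restart Neo4j so the setting takes effect. On a systemd-based installation, a minimal sketch (the service name may differ in your environment):

sudo systemctl restart neo4j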

Methods for running a Neo4j APOC CSV export query

There are several methods that you can use to run a Neo4j APOC CSV export query.

  • Neo4j browser:
    CALL apoc.export.csv.all(
      "neo4j-export.csv",
      {d:','}
    )
  • Paste and execute the preceding code in the Neo4j browser interface
  • Most straightforward for quick testing
  • Provides visual feedback
  • Command Line (cypher-shell):
    cypher-shell -u neo4j -p password "CALL apoc.export.csv.all('neo4j-export.csv', {d:','})"
  • Useful for automation and scripts
  • Requires cypher-shell to be installed
  • Can be integrated into shell scripts
  • Neo4j admin tool:
    neo4j-admin database execute --file=export-query.cypher
  • Create a file containing the query
  • Good for batch operations
  • Provides administrative level access
  • REST API:
    POST https://localhost:7474/db/neo4j/tx/commit
    Content-Type: application/json
    {
      "statements": [{
        "statement": "CALL apoc.export.csv.all('neo4j-export.csv', {d:','})"
      }]
    }
  • HTTP-based approach
  • Useful for remote execution
  • Platform-independent

Managing multi-valued property migration from Neo4j to Neptune

Neo4j allows homogeneous lists of basic types to be stored as properties on both nodes and edges. These lists can contain duplicate values.

When migrating data from Neo4j to Neptune, handling multi-valued properties requires special consideration because of differences in how these databases manage property values. Neptune has different constraints—it supports set and single cardinality for vertex properties, but only single cardinality for edge properties. This creates challenges when migrating Neo4j properties containing duplicate values to Neptune.

To manage this migration, two key parameters are available:

  • Node property policy (--node-property-policy):
    • PutInSetIgnoringDuplicates (default): Converts to Neptune set properties, removing duplicates
    • PutInSetButHaltIfDuplicates: Converts to set properties but stops if duplicates are found
    • LeaveAsString: Preserves multi-valued properties as JSON-formatted list strings
    • Halt: Stops migration if multi-valued properties are found
  • Relationship property policy (--relationship-property-policy):
    • LeaveAsString (default): Stores multi-valued properties as JSON-formatted list strings
    • Halt: Stops migration if multi-valued properties are encountered

These parameters provide flexibility in handling property migrations while maintaining data integrity according to your specific requirements.
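For example, to halt the conversion when a node property contains duplicate list values while leaving multi-valued relationship properties as JSON strings, you might invoke the utility as follows (the file names are from this walkthrough; the chosen policies are illustrative):

java -jar neo4j-to-neptune.jar convert-csv \
   -i neo4j-airroutes-export.csv \
   -d output \
   --node-property-policy PutInSetButHaltIfDuplicates \
   --relationship-property-policy LeaveAsString \
   --infer-types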

In this example, you use the Neo4j browser tool to run the apoc.export.csv.all command to export all the vertices and edges to a CSV file. This Neo4j database has 3,784 nodes (3,504 airports, 273 countries, 7 continents) and 57,645 edges.

CALL apoc.export.csv.all(
 "neo4j-export.csv",
 {d:','}
)

The export command generates a comma-separated export file that contains all nodes and edges of the air-routes database. The following snippet shows an example of the export file.

"_id","_labels","city","code","country","desc","elev","icao","id","label","lat","lon","longest","region","runways","type","_start","_end","_type","dist"
"0",":airport","Atlanta","ATL","US","KATL","1026","KATL","1","airport","33.6366996765137","-84.4281005859375","12390","US-GA","5","airport",,,,
"1",":airport","Anchorage","ANC","US","PANC","151","PANC","2","airport","61.1744003295898","-149.996002197266","12400","US-AK","3","airport",,,,
"2",":airport","Austin","AUS","US","KAUS","542","KAUS","3","airport","30.1944999694824","-97.6698989868164","12250","US-TX","2","airport",,,,
"3",":airport","Nashville","BNA","US","KBNA","599","KBNA","4","airport","36.1245002746582","-86.6781997680664","11030","US-TN","4","airport",,,,
"4",":airport","Boston","BOS","US","KBOS","19","KBOS","5","airport","42.36429977","-71.00520325","10083","US-MA","6","airport",,,,
"5",":airport","Baltimore","BWI","US","KBWI","143","KBWI","6","airport","39.17539978","-76.66829681","10502","US-MD","3","airport",,,,
"6",":airport","Washington D.C.","DCA","US","KDCA","14","KDCA","7","airport","38.8521003723145","-77.0376968383789","7169","US-DC","3","airport",,,,
"7",":airport","Dallas","DFW","US","KDFW","607","KDFW","8","airport","32.896800994873","-97.0380020141602","13401","US-TX","7","airport",,,,
:::
,,,,,,,,,,,,,,,,"0","2","route","809"
,,,,,,,,,,,,,,,,"0","3","route","214"
,,,,,,,,,,,,,,,,"0","4","route","945"
,,,,,,,,,,,,,,,,"0","5","route","576"
,,,,,,,,,,,,,,,,"0","6","route","546"
,,,,,,,,,,,,,,,,"0","7","route","729"

This format isn’t directly supported by the Neptune bulk loader, so you will use the neo4j-to-neptune utility to convert the exported CSV file into a format that can be directly imported into the Neptune database.

Convert the generated CSV file into a Neptune-supported format

Now that you have the data in an exported CSV file, you need to convert the file to a format that’s supported by Neptune.

  1. Upload the CSV file exported from Neo4j (named neo4j-airroutes-export.csv in this walkthrough) to the EC2 instance you configured.
  2. Use the neo4j-to-neptune tool to convert the CSV files into a format compatible with the Neptune bulk loader.
    java -jar neo4j-to-neptune.jar convert-csv -i ./neo4j-airroutes-export.csv -d output --infer-types
    Vertices: 3784
    Edges   : 57645
    Output  : output/1754689143727
    output/1754689143727
     
    Completed in 1 second(s)

The neo4j-to-neptune tool generates the output as two files under the output/1754689143727 directory:

ls -ltr output/1754689143727
total 3512
-rw-rw-r-- 1 ec2-user ec2-user  394459 Aug  8 21:39 vertices.csv
-rw-rw-r-- 1 ec2-user ec2-user 3195859 Aug  8 21:39 edges.csv

The vertices.csv file:

~id,~label,city:string,code:string,country:string,desc:string,elev:short,icao:string,id:short,label:string,lat:double,lon:double,longest:short,region:string,runways:byte,type:string
 
0,airport,Atlanta,ATL,US,KATL,1026,KATL,1,airport,33.6366996765137,-84.4281005859375,12390,US-GA,5,airport
1,airport,Anchorage,ANC,US,PANC,151,PANC,2,airport,61.1744003295898,-149.996002197266,12400,US-AK,3,airport
2,airport,Austin,AUS,US,KAUS,542,KAUS,3,airport,30.1944999694824,-97.6698989868164,12250,US-TX,2,airport
3,airport,Nashville,BNA,US,KBNA,599,KBNA,4,airport,36.1245002746582,-86.6781997680664,11030,US-TN,4,airport
4,airport,Boston,BOS,US,KBOS,19,KBOS,5,airport,42.36429977,-71.00520325,10083,US-MA,6,airport
5,airport,Baltimore,BWI,US,KBWI,143,KBWI,6,airport,39.17539978,-76.66829681,10502,US-MD,3,airport

The edges.csv file:

~id,~from,~to,~label,dist:short
e8192f62-d3e1-4e60-b8c8-00f674aa916d,0,2,route,809
d8c5ef77-ee84-4ebc-9b37-cdfe46e369fe,0,3,route,214
74ecb38f-4060-4bd6-94d5-c76717ff0d64,0,4,route,945

The conversion utility has transformed the Neo4j export into a format supported by the Neptune bulk loader, splitting the data into comma-separated node and edge files with the appropriate headers. You are now ready to import the data using the Neptune bulk loader. See Optimizing an Amazon Neptune bulk load to get the best performance during large bulk load operations.

Import the converted nodes and edges into the Neptune database

Here are the high-level steps of the loading process:

  1. Copy the data files to an S3 bucket.
  2. Initiate a Neptune bulk load request by sending an HTTP request to the bulk loader API, passing the source file path in Amazon S3, and the bulk loader IAM role created in the prerequisites section.
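For step 1, you can copy the files with the AWS CLI. The following is a minimal sketch, assuming the output directory and bucket name used in this walkthrough:

# Copy the converted vertices and edges files to the S3 bucket
aws s3 cp output/1754689143727/vertices.csv s3://neo4j-to-neptune-bucket/vertices.csv
aws s3 cp output/1754689143727/edges.csv s3://neo4j-to-neptune-bucket/edges.csv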

Load vertices

From a command line window, enter the following to run the Neptune loader, using the correct values for your endpoint, S3 URI or path, format, and IAM role Amazon Resource Name (ARN). The format parameter can be any of the following values: csv for Gremlin, opencypher for openCypher, or ntriples, nquads, turtle, or rdfxml for RDF. For information about the other parameters, see Neptune Loader Command. The following command initiates a bulk load operation to load the vertices (nodes) data into the Neptune database using the Neptune loader API.

aws neptunedata start-loader-job \
  --source s3://neo4j-to-neptune-bucket/vertices.csv \
  --format csv \
  --s3-bucket-region us-east-1 \
  --iam-role-arn arn:aws:iam::XXXXXXXXXXX:role/NeptuneLoadFromS3 \
  --fail-on-error \
  --parallelism OVERSUBSCRIBE \
  --queue-request \
  --endpoint-url https://db-neptune.cluster-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "58eecccd-dd68-45b0-87ec-1a63f98a0aa1"
    }
}

The response returns a unique loadId that identifies this bulk load operation. Use this identifier to monitor the load status with the following command:

aws neptunedata get-loader-job-status \
   --load-id 58eecccd-dd68-45b0-87ec-1a63f98a0aa1 \
   --endpoint-url https://db-neptune.cluster-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://neo4j-to-neptune-bucket/vertices.csv",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 6,
            "startTime" : 1754924929,
            "totalRecords" : 54024,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

Let’s break down the components:

  • Command structure:
    • Uses the AWS CLI neptunedata get-loader-job-status command.
    • Queries the Neptune loader endpoint with a specific load ID (58eecccd-dd68-45b0-87ec-1a63f98a0aa1).
  • Response details:
    • fullUri: Source S3 location of the vertices.csv file
    • runNumber: Current run number (1)
    • retryNumber: Number of retries (0)
    • status: Final status “LOAD_COMPLETED”
    • totalTimeSpent: 6 seconds
    • startTime: Unix timestamp of when the load started
    • totalRecords: 54,024 records processed
    • totalDuplicates: 0 duplicate records found
    • parsingErrors: 0 parsing errors
    • datatypeMismatchErrors: 0 data type mismatches
    • insertErrors: 0 insertion errors

This response indicates a successful load operation with no errors, where all 54,024 records from the vertices.csv file were successfully loaded into the Neptune database.

Load edges

The following command initiates a bulk load operation to load the relationships (edges) data into the Neptune database using the Neptune loader API.

aws neptunedata start-loader-job \
  --source s3://neo4j-to-neptune-bucket/edges.csv \
  --format csv \
  --s3-bucket-region us-east-1 \
  --iam-role-arn arn:aws:iam::XXXXXXXXXXX:role/NeptuneLoadFromS3 \
  --fail-on-error \
  --parallelism OVERSUBSCRIBE \
  --queue-request \
  --endpoint-url https://db-neptune.cluster-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "3d58fe29-34f8-4c3b-b067-7c648447560a"
    }
}

The response returns a unique loadId that identifies this bulk load operation. Use this identifier to monitor the load status with the following command:

aws neptunedata get-loader-job-status \
   --load-id 3d58fe29-34f8-4c3b-b067-7c648447560a \
   --endpoint-url https://db-neptune.cluster-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 1
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://neo4j-to-neptune-bucket/edges.csv",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 12,
            "startTime" : 1754925120,
            "totalRecords" : 108282,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

Verify the loaded data by querying the Neptune database. You can use Neptune Jupyter notebooks or the AWS CLI and Data API to run openCypher queries that count the nodes and edges.

aws neptunedata execute-open-cypher-query \
   --open-cypher-query "MATCH (n) WITH count(n) as nodeCount MATCH ()-[r]->() WITH nodeCount, count(r) as edgeCount RETURN nodeCount, edgeCount " \
   --endpoint-url https://db-neptune-neo-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
 
{
  "results": [
    {
      "nodeCount": 3784,
      "edgeCount": 57645
    }
  ]
}

Automated migration process

The neo4j-to-neptune utility supports an automated end-to-end process for migrating data from Neo4j to Neptune, which consists of two main steps:

  1. Export the CSV files from Neo4j using the APOC export procedures.
  2. Convert and load data to Neptune by transforming the CSV files into a Neptune-supported format and performing a bulk load operation.

The convert-csv command supports an integrated bulk loading option that automatically uploads converted CSV files to Amazon S3 and initiates the Neptune bulk load process. You will use the CSV file that you exported in the previous section to demonstrate the automated conversion and bulk load process. There are two options to pass parameters to the neo4j-to-neptune.jar file:

  • Pass parameters as CLI parameters
  • Pass parameters as a YAML file

Option 1: Automated conversion and bulk load using CLI parameters

You can perform automated conversion and bulk loading using the following neo4j-to-neptune utility command with CLI parameters:

java -jar neo4j-to-neptune.jar convert-csv \
   -i neo4j-airroutes-export.csv \
   -d output \
   --s3-prefix neptune-data \
   --parallelism OVERSUBSCRIBE \
   --neptune-endpoint db-neptune-neo-migrate.cluster-clustername.us-east-1.neptune.amazonaws.com \
   --bucket-name neo4j-to-neptune-bucket \
   --iam-role-arn arn:aws:iam::XXXXXXXXXXXX:role/NeptuneLoadFromS3 \
   --infer-types

Let’s break down each parameter:

  1. java -jar neo4j-to-neptune.jar convert-csv: Executes the conversion utility
  2. Command parameters:
    • -i neo4j-airroutes-export.csv: Input file containing the Neo4j exported data
    • -d output: Directory where converted files will be stored
    • --s3-prefix neptune-data: Prefix for S3 bucket organization
    • --parallelism OVERSUBSCRIBE: Sets maximum parallelism for bulk loading
    • --neptune-endpoint: Specifies the Neptune cluster endpoint URL
    • --bucket-name neo4j-to-neptune-bucket: S3 bucket where converted files will be uploaded
    • --iam-role-arn: IAM role ARN that has permissions for S3 and Neptune access
    • --infer-types: Automatically infers data types for properties

The utility will:

  1. Convert the Neo4j CSV format to Neptune format.
  2. Automatically upload the converted files to the specified S3 bucket.
  3. Initiate a Neptune bulk load operation.
  4. Monitor the load progress if monitoring is enabled.

The following output shows an example run:
/home/ec2-user/java17/jdk-17.0.12/bin/java -jar neo4j-to-neptune.jar convert-csv \
>   -i neo4j-airroutes-export.csv \
>   -d output \
>   --s3-prefix neptune-data \
>   --parallelism OVERSUBSCRIBE \
>   --neptune-endpoint db-neptune-neo-migrate.cluster-c25vmpy64auu.us-east-1.neptune.amazonaws.com \
>   --bucket-name neo4j-to-neptune-bucket \
>   --iam-role-arn arn:aws:iam::XXXXXXXXXXXX:role/NeptuneLoadFromS3 \
>   --infer-types
Vertices: 3784
Edges   : 57645
Output  : output/1755284203609
output/1755284203609
 
Completed in 1 second(s)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
S3 Bucket: neo4j-to-neptune-migrate
S3 Prefix: neptune-data
AWS Region: us-east-1
IAM Role ARN: arn:aws:iam::XXXXXXXXXXXX:role/NeptuneLoadFromS3
Neptune Endpoint: db-neptune-neo-migrate.cluster-c25vmpy64auu.us-east-1.neptune.amazonaws.com
Bulk Load Parallelism: OVERSUBSCRIBE
Bulk Load Monitor: false
Uploading Gremlin load data to S3...
Starting sequential upload of files from /home/ec2-user/neptune/neo4j-to-neptune_old/amazon-neptune-tools/neo4j-to-neptune/neo4jtocsv/output/1755284203609 to s3://neo4j-to-neptune-migrate/neptune-data/1755284203609
Uploading file 1 of 2: vertices.csv
Starting async upload of /home/ec2-user/neptune/neo4j-to-neptune_old/amazon-neptune-tools/neo4j-to-neptune/neo4jtocsv/output/1755284203609/vertices.csv to s3://neo4j-to-neptune-migrate/neptune-data/1755284203609/vertices.csv
File size: 394459 Bytes
Using S3 Transfer Manager for upload...
Initiating Transfer Manager upload...
Successfully uploaded vertices.csv using Transfer Manager - ETag: "b6112e54a4f8041c42371073cdddd80c"
Successfully uploaded vertices.csv (1/2)
Uploading file 2 of 2: edges.csv
Starting async upload of /home/ec2-user/neptune/neo4j-to-neptune_old/amazon-neptune-tools/neo4j-to-neptune/neo4jtocsv/output/1755284203609/edges.csv to s3://neo4j-to-neptune-migrate/neptune-data/1755284203609/edges.csv
File size: 3195859 Bytes
Using S3 Transfer Manager for upload...
Initiating Transfer Manager upload...
Successfully uploaded edges.csv using Transfer Manager - ETag: "161d448d860ff177741aff39dd56c773"
Successfully uploaded edges.csv (2/2)
Successfully uploaded all 2 files sequentially
Files uploaded successfully to S3. Files available at: s3://neo4j-to-neptune-migrate/neptune-data/1755284203609/
Starting Neptune bulk load...
Testing connectivity to Neptune endpoint...
Successful connected to Neptune. Status: 200 healthy
Neptune bulk load started successfully! Load ID: 31e1c3dc-cb21-4143-a0da-437094a6aa09

The response returns a unique load ID that identifies this bulk load operation. Use this identifier to monitor the load status with the following command:

aws neptunedata get-loader-job-status \
   --load-id 31e1c3dc-cb21-4143-a0da-437094a6aa09 \
   --endpoint-url https://db-neptune.cluster-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [
            {
                "LOAD_COMPLETED" : 2
            }
        ],
        "overallStatus" : {
            "fullUri" : "s3://neo4j-to-neptune-bucket/neptune-data/1755284203609/",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 14,
            "startTime" : 1755284207,
            "totalRecords" : 162306,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

Let’s analyze the response:

  • Command structure:
    • Uses the AWS CLI neptunedata get-loader-job-status command.
    • Queries the Neptune loader endpoint with a specific load ID (31e1c3dc-cb21-4143-a0da-437094a6aa09).
  • Response analysis:
    • fullUri: Shows S3 location of the loaded files
    • runNumber: 1 (first run)
    • retryNumber: 0 (no retries needed)
    • status: "LOAD_COMPLETED" (successful completion)
    • totalTimeSpent: 14 seconds
    • startTime: Unix timestamp 1755284207
    • totalRecords: 162,306 total records processed (combined vertices and edges)
    • totalDuplicates: 0 (no duplicate records found)
    • parsingErrors: 0 (no parsing errors)
    • datatypeMismatchErrors: 0 (no data type mismatches)
    • insertErrors: 0 (no insertion errors)

This response indicates a completely successful load operation where both the vertices and edges files were loaded without any errors, processing a total of 162,306 records. If you observe any errors during the load operation, see the Neptune loader Get-Status examples to get more details.
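For example, the following sketch retrieves per-error details for a load job with the AWS CLI; it assumes the --details, --errors, and --errors-per-page options are available in your CLI version:

aws neptunedata get-loader-job-status \
   --load-id 31e1c3dc-cb21-4143-a0da-437094a6aa09 \
   --details \
   --errors \
   --errors-per-page 5 \
   --endpoint-url https://db-neptune.cluster-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182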

You can verify the Neptune graph by running this script in a Neptune Jupyter Notebook environment using the openCypher (%%oc) magic command or the AWS CLI and Data API. The query performs two match operations—first counting all nodes (n) in the graph, then counting all relationships (r) between nodes. The results show that the Neptune database contains 3,784 nodes and 57,645 edges, which verifies that all the data from the Neo4j database was successfully migrated to Neptune.

aws neptunedata execute-open-cypher-query \
   --open-cypher-query "MATCH (n) WITH count(n) as nodeCount MATCH ()-[r]->() WITH nodeCount, count(r) as edgeCount RETURN nodeCount, edgeCount " \
   --endpoint-url https://db-neptune-neo-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
 
{
  "results": [
    {
      "nodeCount": 3784,
      "edgeCount": 57645
    }
  ]
}

Option 2: Automated conversion and bulk load using a YAML configuration file

Create a YAML configuration file named bulk-load-config.yaml with your bulk load settings:

# S3 Configuration
bucket-name: "neo4j-to-neptune-bucket"
s3-prefix: "neptune-data"
 
# Neptune Configuration
neptune-endpoint: "db-neptune-neo-migrate-instance-1.clustername.us-east-1.neptune.amazonaws.com"
 
# IAM Configuration
iam-role-arn: "arn:aws:iam::XXXXXXXXXXXX:role/NeptuneLoadFromS3Role"
 
# Performance Settings
parallelism: "OVERSUBSCRIBE"  # Options: LOW, MEDIUM, HIGH, OVERSUBSCRIBE
 
# Monitoring
monitor: true
 

The following command uses the neo4j-to-neptune utility with a YAML configuration file approach instead of command-line parameters.

java -jar neo4j-to-neptune.jar convert-csv \
-i neo4j-airroutes-export.csv \
-d output --bulk-load-config bulk-load-config.yaml

Let’s look at the whole process.

Command components:

  • java -jar neo4j-to-neptune.jar convert-csv: Executes the conversion utility
  • -i neo4j-airroutes-export.csv: Specifies the input Neo4j export file
  • -d output: Defines the output directory for converted files
  • --bulk-load-config bulk-load-config.yaml: Points to a YAML configuration file

Run the following command from the EC2 instance where the CSV file and neo4j-to-neptune.jar were copied:

/home/ec2-user/java17/jdk-17.0.12/bin/java -jar neo4j-to-neptune.jar convert-csv \
>   -i neo4j-airroutes-export.csv \
>   -d output \
>   --bulk-load-config bulk-load-config.yaml
Vertices: 3784
Edges   : 57645
Output  : output/1755283074036
output/1755283074036
 
Completed in 1 second(s)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
S3 Bucket: neo4j-to-neptune-migrate
S3 Prefix: neptune-data
AWS Region: us-east-1
IAM Role ARN: arn:aws:iam::XXXXXXXXXXXX:role/NeptuneLoadFromS3
Neptune Endpoint: db-neptune-neo-migrate.cluster-c25vmpy64auu.us-east-1.neptune.amazonaws.com
Bulk Load Parallelism: OVERSUBSCRIBE
Bulk Load Monitor: true
Uploading Gremlin load data to S3...
Starting sequential upload of files from /home/ec2-user/neptune/neo4j-to-neptune_old/amazon-neptune-tools/neo4j-to-neptune/neo4jtocsv/output/1755283074036 to s3://neo4j-to-neptune-migrate/neptune-data/1755283074036
Uploading file 1 of 2: vertices.csv
Starting async upload of /home/ec2-user/neptune/neo4j-to-neptune_old/amazon-neptune-tools/neo4j-to-neptune/neo4jtocsv/output/1755283074036/vertices.csv to s3://neo4j-to-neptune-migrate/neptune-data/1755283074036/vertices.csv
File size: 394366 Bytes
Using S3 Transfer Manager for upload...
Initiating Transfer Manager upload...
Successfully uploaded vertices.csv using Transfer Manager - ETag: "829888b668436f8dca506ae843d7f64a"
Successfully uploaded vertices.csv (1/2)
Uploading file 2 of 2: edges.csv
Starting async upload of /home/ec2-user/neptune/neo4j-to-neptune_old/amazon-neptune-tools/neo4j-to-neptune/neo4jtocsv/output/1755283074036/edges.csv to s3://neo4j-to-neptune-migrate/neptune-data/1755283074036/edges.csv
File size: 3195853 Bytes
Using S3 Transfer Manager for upload...
Initiating Transfer Manager upload...
Successfully uploaded edges.csv using Transfer Manager - ETag: "445be20620b7361164b43217a6edf4e7"
Successfully uploaded edges.csv (2/2)
Successfully uploaded all 2 files sequentially
Files uploaded successfully to S3. Files available at: s3://neo4j-to-neptune-migrate/neptune-data/1755283074036/
Starting Neptune bulk load...
Testing connectivity to Neptune endpoint...
Successful connected to Neptune. Status: 200 healthy
Neptune bulk load started successfully! Load ID: 49a81941-8ed9-415c-b22c-d791e56338e4
Monitoring load progress for job: 49a81941-8ed9-415c-b22c-d791e56338e4
Neptune bulk load status: LOAD_IN_PROGRESS
Neptune bulk load status: LOAD_IN_PROGRESS
Neptune bulk load completed with status: LOAD_COMPLETED

What the process includes:

  1. The utility validates all bulk load parameters. If any required parameters are missing or invalid, the conversion will be aborted with a clear error message indicating which parameters are missing.
  2. Converts the Neo4j CSV file to the Neptune bulk loader format.
  3. Automatically uploads the converted files to Amazon S3.
  4. Initiates a Neptune bulk load job.
  5. Monitors the progress until completion (if monitoring is enabled).

You can verify the Neptune graph by running this script in a Neptune Jupyter Notebook environment using the openCypher (%%oc) magic command or the AWS CLI and Data API. The query performs two match operations—first counting all nodes (n) in the graph, then counting all relationships (r) between nodes. The results show that the Neptune graph database contains 3,784 nodes and 57,645 edges, which verifies that all the data from Neo4j was successfully migrated to Neptune.

aws neptunedata execute-open-cypher-query \
   --open-cypher-query "MATCH (n) WITH count(n) as nodeCount MATCH ()-[r]->() WITH nodeCount, count(r) as edgeCount RETURN nodeCount, edgeCount " \
   --endpoint-url https://db-neptune-neo-XXXXXXXXX.us-east-1.neptune.amazonaws.com:8182
 
{
  "results": [
    {
      "nodeCount": 3784,
      "edgeCount": 57645
    }
  ]
}

Clean up

To avoid incurring unnecessary charges, make sure to delete the resources you created if you no longer need them. All the following cleanup commands should be run from your local machine or EC2 instance where you have the AWS CLI installed and configured with appropriate credentials. Make sure you have the necessary permissions to delete these resources before proceeding.

  1. Delete your Neptune database instance:
    aws neptune delete-db-instance \
        --db-instance-identifier <your-db-instance-identifier> \
        --skip-final-snapshot 
  2. Delete your Neptune database cluster:
    aws neptune delete-db-cluster \
        --db-cluster-identifier <your-cluster-identifier> \
        --skip-final-snapshot
  3. Delete S3 objects (data files) and bucket:
    # Delete Neptune data files
    aws s3 rm s3://<your-bucket-name>/neptune-data/ --recursive
    # Delete the bucket if no longer needed
    aws s3 rb s3://<your-bucket-name> --force
  4. Clean up IAM resources by detaching the role policies and then removing the IAM role created for the Neptune bulk load:
    # First detach the policies
    aws iam detach-role-policy \
        --role-name NeptuneLoadFromS3 \
        --policy-arn arn:aws:iam::aws:policy/service-role/AmazonNeptuneFullAccess
     
    aws iam detach-role-policy \
        --role-name NeptuneLoadFromS3 \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
     
    # Then delete the role
    aws iam delete-role --role-name NeptuneLoadFromS3
  5. If you used an EC2 instance, you can terminate the instance using the following command:
    aws ec2 terminate-instances --instance-ids <your-instance-id>
  6. Remove local files created during the migration process:
    # Remove converted CSV files
    rm -rf output/
     
    # Remove the exported Neo4j CSV file if no longer needed
    rm neo4j-airroutes-export.csv

Conclusion

The Neo4j to Neptune migration tool provides a robust and flexible solution for transferring data between these graph databases, offering multiple migration paths so that you can choose between a manual multi-step process or an automated end-to-end solution depending on your specific needs. The tool features extensive configuration options through both command-line parameters and YAML files, enabling fine-tuned control over the migration process, while sophisticated handling of multi-valued properties helps ensure data integrity with configurable policies for both nodes and relationships. Integration with the Neptune bulk loader simplifies the upload process through automatic Amazon S3 file management and load monitoring, and the tool performs validation checks before starting the conversion process to help prevent failed migrations because of misconfiguration. This utility streamlines the complex process of migrating from Neo4j to Neptune while providing the flexibility and control needed for enterprise-grade data migrations.

Get started today by downloading the neo4j-to-neptune utility from the AWS GitHub repository and following either the manual or automated migration approach based on your needs. Whether you’re planning a small-scale migration or an enterprise-level transformation, the neo4j-to-neptune utility provides the flexibility and control you need for a successful migration. Visit our GitHub repository to access the migration tools, begin your transition to Neptune, and experience the benefits of the fully managed graph database service it provides.


About the authors

Justin John

Justin is a Database Specialist Solutions Architect with AWS. Prior to AWS, Justin worked for large enterprises in technology, financials, airlines, and energy verticals, helping them with database and solution architectures. In his spare time, you’ll find him exploring the great outdoors through hiking and traveling to new destinations.


Dave Bechberger

Dave is a Sr. Graph Architect with the Amazon Neptune team. He used his years of experience working with customers to build graph database-backed applications as inspiration to co-author Graph Databases in Action by Manning.