How do I resolve processing errors in Amazon Neptune Bulk Loader?

I'm trying to use Amazon Neptune Bulk Loader to load data from an Amazon Simple Storage Service (Amazon S3) bucket. However, some of the requests fail. How do I troubleshoot this?

Short description

To troubleshoot load requests that keep failing, check the status of each load job to identify the jobs that failed. Use one of the following methods:

Use the default Bulk Loader API to check the status of each load job individually.
Use an admin script that reports the status of all load jobs in one run. You can create and run this automated script on a Linux or UNIX system.

Note these limitations:

The Neptune Bulk Loader API doesn't provide a snapshot view of all load operations.
If AWS Identity and Access Management (IAM) authorization is enabled on the Neptune cluster, then requests to the Bulk Loader API must be signed.
The Bulk Loader API caches information for only the last 1,024 load jobs, and it stores error details for only the last 10,000 errors per job.
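
When IAM authorization is enabled, one possible way to sign loader requests from the command line is curl's --aws-sigv4 option (available in curl 7.75.0 and later). The following is a minimal sketch under stated assumptions, not part of the original procedure: the helper names sigv4_scope and signed_get are hypothetical, credentials are assumed to be in the standard AWS environment variables, and the signing service name is assumed to be neptune-db.

```shell
# Sketch: send a SigV4-signed GET request to the Neptune loader endpoint.
# Assumptions: curl 7.75.0 or later, credentials in AWS_ACCESS_KEY_ID and
# AWS_SECRET_ACCESS_KEY, and "neptune-db" as the signing service name.

# sigv4_scope REGION: build the value for curl's --aws-sigv4 option.
sigv4_scope() {
    printf 'aws:amz:%s:neptune-db' "$1"
}

# signed_get URL REGION: issue the signed request (hypothetical helper).
signed_get() {
    url="$1"
    region="$2"
    set -- --silent --get "$url" \
        --aws-sigv4 "$(sigv4_scope "$region")" \
        --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}"
    # Temporary credentials also need the session token header.
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        set -- "$@" -H "x-amz-security-token: ${AWS_SESSION_TOKEN}"
    fi
    curl "$@"
}
```

For example, signed_get "https://cluster-endpoint:Port/loader" us-east-1 | jq would list the load IDs, provided that the caller's IAM identity is allowed to access the cluster.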

Resolution

Use the default Bulk Loader API

1. Retrieve the load IDs:

$ curl -G  'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader'|jq
{
  "status": "200 OK",
  "payload": {
    "loadIds": [
      "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
      "6f6342fb-4ea3-452c-ac69-b4d117e37d5a",
      "647114a6-6ed4-4018-896c-e84a08fcf864",
      "521d33fa-7050-44d7-a961-b64ef4e2d1db",
      "d0d4714e-7cf8-415e-89f5-d07ed2732bf2"
    ]
  }
}

2. Check each job's status, one by one, to verify that the job was successful:

curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader/c32bbd24-99a7-45ee-972c-21b7b9cab3e2?details=true&errors=true&page=1&errorsPerPage=3'|jq
{
  "status": "200 OK",
  "payload": {
    "feedCount": [
      {
        "LOAD_COMPLETED": 2
      }
    ],
    "overallStatus": {
      "fullUri": "s3://demodata/neptune/",
      "runNumber": 5,
      "retryNumber": 0,
      "status": "LOAD_COMPLETED",
      "totalTimeSpent": 3,
      "startTime": 1555574461,
      "totalRecords": 8,
      "totalDuplicates": 8,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 0
    },
    "errors": {
      "startIndex": 0,
      "endIndex": 0,
      "loadId": "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
      "errorLogs": []
    }
  }
}
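
Rather than reading each status response by eye, you can let jq decide whether a job needs attention. The following is a minimal sketch, assuming the response has the shape shown above; the function name is_load_clean is hypothetical, and the rule it applies (status LOAD_COMPLETED and all three error counters at zero) is one reasonable definition of a clean load, not an official one.

```shell
# Sketch: exit 0 when a loader status response (read from stdin) reports
# LOAD_COMPLETED with no parsing, datatype mismatch, or insert errors.
# is_load_clean is a hypothetical helper name.
is_load_clean() {
    # jq -e sets the exit status from the final boolean result.
    jq -e '.payload.overallStatus
           | .status == "LOAD_COMPLETED"
             and (.parsingErrors + .datatypeMismatchErrors + .insertErrors) == 0' \
        > /dev/null
}
```

A possible use is to pipe the output of the curl command from step 2 into the function, for example curl --silent -G "https://cluster-endpoint:Port/loader/load-id?details=true" | is_load_clean || echo "needs attention", where cluster-endpoint:Port and load-id are placeholders.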

Use an admin script

You can use an admin script to identify a failed Neptune Bulk Loader job in your production process. The admin script generates an output in the following format for all load jobs:

StartTime-loadId:status,S3Location,Errors

Note: The admin script can be used from any Linux system that has access to the Neptune cluster.
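
To illustrate this format, the following sketch builds one report line from a single status response with one jq expression. It is an illustration only: the function name format_report_line is hypothetical, the response is assumed to come from a request that includes details=true and errors=true, and the start time is printed in ISO 8601 form and values are printed without quotation marks, which differs slightly from the script in the next section.

```shell
# Sketch: turn one loader status response (read from stdin) into a line
# shaped like "StartTime-loadId:status,S3Location,Errors".
# format_report_line LOAD_ID is a hypothetical helper name.
format_report_line() {
    jq -r --arg id "$1" '
        .payload as $p
        | ($p.overallStatus.startTime | todate) + " -" + $id
          + ": " + $p.overallStatus.status
          + ",S3 LOCATION: " + $p.overallStatus.fullUri
          + ",ERRORS: " + ($p.errors.errorLogs | tojson)'
}
```

Because the line is built from one response, this approach needs only one request per load job.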

Create and run the automated script on a Linux or UNIX system

1. Create the script using a text editor:

$ vi script

2. Add the following content to the script. Be sure that you replace cluster-endpoint:Port with the appropriate values:

#!/bin/bash
# Replace cluster-endpoint:Port with your Neptune cluster endpoint and port.
cluster_ep="https://cluster-endpoint:Port/loader"

# List every load ID that the loader still tracks, then report on each one.
for loadId in $(curl --silent -G "${cluster_ep}?details=true" | jq '.payload.loadIds[]');
do
        # Strip the double quotes that jq prints around the ID.
        clean_loadId=$(echo -n ${loadId} | tr -d '"')
        # Convert the job's epoch start time to a readable date (requires GNU date).
        time=$(date -d@$(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.startTime'))
        echo -n $time '-'
        echo -n ${clean_loadId}: $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.status')
        echo -n ',S3 LOCATION': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.fullUri')
        # Request up to three error log entries for the job.
        echo -n ',ERRORS': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true&errors=true&page=1&errorsPerPage=3" | jq '.payload.errors.errorLogs')

        echo
done

3. Save the script, and then make it executable:

chmod +x script

4. Install jq, which the script uses to parse the JSON responses (this example uses yum):

sudo yum install jq

5. Run the script:

$ ./script

This is example output:

Thu Apr 18 08:01:01 UTC 2019 -c32bbd24-99a7-45ee-972c-21b7b9cab3e2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:04:00 UTC 2019 -6f6342fb-4ea3-452c-ac69-b4d117e37d5a: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:01:30 UTC 2019 -647114a6-6ed4-4018-896c-e84a08fcf864: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:36:02 UTC 2019 -521d33fa-7050-44d7-a961-b64ef4e2d1db: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:35:45 UTC 2019 -d0d4714e-7cf8-415e-89f5-d07ed2732bf2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
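
When the report covers many load jobs, you can filter it down to the jobs that need attention. The following is a minimal sketch, assuming the line format shown in the example output; the function name failed_only is hypothetical. It keeps every line whose status is anything other than "LOAD_COMPLETED", so jobs that are still in progress also appear.

```shell
# Sketch: read report lines from stdin and print only the jobs whose
# status is not "LOAD_COMPLETED". failed_only is a hypothetical helper name.
failed_only() {
    # grep -v exits with status 1 when nothing is printed; treat that as success.
    grep -v ': "LOAD_COMPLETED",' || true
}
```

For example, ./script | failed_only prints nothing when every tracked job completed.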

Related information

Example: Loading data into a Neptune DB instance

Neptune Loader Get-Status API
