How do I resolve processing errors in Amazon Neptune Bulk Loader?

Last updated: 2020-05-06

I'm trying to use Amazon Neptune Bulk Loader to load data from an Amazon Simple Storage Service (Amazon S3) bucket. However, some of the requests fail. How do I troubleshoot this?

Short Description

To troubleshoot data requests that keep failing, you can check the status of each job. Then, identify the failed jobs by doing the following:

  • Use the default Bulk Loader API for each individual load and check each job's status.
  • Use an admin script and an automated script in one job. You can create and execute the automated script on a Linux or UNIX system.

Note these limitations:

  • The Neptune Bulk Loader API doesn't provide a snapshot view of all the load operations.
  • If AWS Identity and Access Management (IAM) authorization is enabled on the Neptune cluster, then the requests to the Bulk Load API must be signed.
  • The Bulk Loader API only caches information on the last 1024 load jobs. It only stores error details for the last 10,000 errors per job.

Resolution

Use the default Bulk Loader API

1.    Retrieve the loader IDs:

$ curl -G  'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader'|jq
{
  "status": "200 OK",
  "payload": {
    "loadIds": [
      "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
      "6f6342fb-4ea3-452c-ac69-b4d117e37d5a",
      "647114a6-6ed4-4018-896c-e84a08fcf864",
      "521d33fa-7050-44d7-a961-b64ef4e2d1db",
      "d0d4714e-7cf8-415e-89f5-d07ed2732bf2"
    ]
  }
}

2.    Check each job's status, one by one, to verify that it was successful:

curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader/c32bbd24-99a7-45ee-972c-21b7b9cab3e2?details=true&errors=true&page=1&errorsPerPage=3'|jq
{
  "status": "200 OK",
  "payload": {
    "feedCount": [
      {
        "LOAD_COMPLETED": 2
      }
    ],
    "overallStatus": {
      "fullUri": "s3://demodata/neptune/",
      "runNumber": 5,
      "retryNumber": 0,
      "status": "LOAD_COMPLETED",
      "totalTimeSpent": 3,
      "startTime": 1555574461,
      "totalRecords": 8,
      "totalDuplicates": 8,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 0
    },
    "errors": {
      "startIndex": 0,
      "endIndex": 0,
      "loadId": "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
      "errorLogs": []
    }
  }
}

Use an admin script

You can use an admin script to identify a failed Neptune Bulk Loader job in your production process. The admin script generates an output in the following format for all load jobs:

Startime-loadid:status,S3location,Errors

Note: The admin script can be used from any Linux system that has access to the Neptune cluster.

Create and execute the automated script on a Linux or UNIX system

1.    Create the script using a text editor:

$ vi script

2.    Be sure that you replace cluster-endpoint:Port with the appropriate values:

cluster_ep="https://cluster-endpoint:Port/loader"

for loadId in $(curl --silent -G "${cluster_ep}?details=true" | jq '.payload.loadIds[]');
do
        clean_loadId=$(echo -n ${loadId} | tr -d '"')
        time=$(date -d@$(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.startTime'))
        echo -n $time '-'
        echo -n ${clean_loadId}: $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.status')
        echo -n ',S3 LOCATION': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=true" | jq '.payload.overallStatus.fullUri')
        echo -n ',ERRORS': $(curl --silent -G "${cluster_ep}/${clean_loadId}?details=truei&errors=true&page=1&errorsPerPage=3" | jq '.payload.errors.errorLogs')

        echo
done

3.    Save the script and provide execute permissions:

chmod +x script

4.    Install the dependent library:

sudo yum install jq 

5.    Execute the script:

$ ./script

Here is an example output:

Thu Apr 18 08:01:01 UTC 2019 -c32bbd24-99a7-45ee-972c-21b7b9cab3e2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:04:00 UTC 2019 -6f6342fb-4ea3-452c-ac69-b4d117e37d5a: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:01:30 UTC 2019 -647114a6-6ed4-4018-896c-e84a08fcf864: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:36:02 UTC 2019 -521d33fa-7050-44d7-a961-b64ef4e2d1db: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:35:45 UTC 2019 -d0d4714e-7cf8-415e-89f5-d07ed2732bf2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null

Did this article help you?

Anything we could improve?


Need more help?