How do I resolve processing errors in Amazon Neptune Bulk Loader?
Last updated: 2020-10-06
I'm trying to use Amazon Neptune Bulk Loader to load data from an Amazon Simple Storage Service (Amazon S3) bucket. However, some of the requests fail. How do I troubleshoot this?
Short description
To troubleshoot load requests that keep failing, check the status of each load job to identify the jobs that failed. You can do this in either of the following ways:
- Call the Bulk Loader API for each individual load job and check that job's status.
- Use an admin script that checks the status of all load jobs in a single run. You can create and run the script on a Linux or UNIX system.
Note these limitations:
- The Neptune Bulk Loader API doesn't provide a snapshot view of all load operations.
- If AWS Identity and Access Management (IAM) authorization is enabled on the Neptune cluster, then requests to the Bulk Loader API must be signed with AWS Signature Version 4 (SigV4).
- The Bulk Loader API caches information only about the last 1,024 load jobs, and it stores error details only for the last 10,000 errors per job.
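The curl commands in this article send unsigned requests, so they work only when IAM authorization is disabled. One way to handle both cases, which is an assumption and not something this article prescribes, is the third-party open-source `awscurl` tool, which signs requests with SigV4 using your AWS credentials. The following minimal sketch wraps both paths; the `loader_get` function, the `NEPTUNE_IAM` variable, and the us-east-1 default Region are hypothetical names and values for illustration.

```shell
#!/bin/bash
# Hypothetical helper: send a GET request to a Neptune loader URL.
# Set NEPTUNE_IAM=true if IAM authorization is enabled on the cluster; the
# request is then signed with SigV4 by awscurl (a separate open-source tool,
# installable with "pip install awscurl"). The us-east-1 default below is an
# assumption; set AWS_REGION to your cluster's Region.
loader_get() {
  local url="$1"
  if [ "${NEPTUNE_IAM:-false}" = "true" ]; then
    # neptune-db is the service name that Neptune expects in SigV4 signatures.
    awscurl --service neptune-db --region "${AWS_REGION:-us-east-1}" "$url"
  else
    # Unsigned request, the same kind of request that plain curl sends.
    curl --silent "$url"
  fi
}

# Usage (replace cluster-endpoint:Port with your values):
# NEPTUNE_IAM=true AWS_REGION=us-east-1 loader_get 'https://cluster-endpoint:Port/loader' | jq
```

Passing a complete URL, including any query string, keeps the helper usable with both tools, because `awscurl` doesn't accept curl's `-G` option.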
Resolution
Use the default Bulk Loader API
1. Retrieve the load IDs:
$ curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader'|jq
{
"status": "200 OK",
"payload": {
"loadIds": [
"c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
"6f6342fb-4ea3-452c-ac69-b4d117e37d5a",
"647114a6-6ed4-4018-896c-e84a08fcf864",
"521d33fa-7050-44d7-a961-b64ef4e2d1db",
"d0d4714e-7cf8-415e-89f5-d07ed2732bf2"
]
}
}
2. Check each job's status, one by one, to verify that the job was successful:
$ curl -G 'https://neptunedemo-cluster.cluster-cw7ehemc1eeo.us-east-1.neptune.amazonaws.com:8182/loader/c32bbd24-99a7-45ee-972c-21b7b9cab3e2?details=true&errors=true&page=1&errorsPerPage=3' | jq
{
"status": "200 OK",
"payload": {
"feedCount": [
{
"LOAD_COMPLETED": 2
}
],
"overallStatus": {
"fullUri": "s3://demodata/neptune/",
"runNumber": 5,
"retryNumber": 0,
"status": "LOAD_COMPLETED",
"totalTimeSpent": 3,
"startTime": 1555574461,
"totalRecords": 8,
"totalDuplicates": 8,
"parsingErrors": 0,
"datatypeMismatchErrors": 0,
"insertErrors": 0
},
"errors": {
"startIndex": 0,
"endIndex": 0,
"loadId": "c32bbd24-99a7-45ee-972c-21b7b9cab3e2",
"errorLogs": []
}
}
}
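Checking each response by eye doesn't scale well. The following minimal sketch defines three hypothetical helpers (`summarize_job`, `needs_review`, and `fetch_all_errors`) that work on the step 2 response. `needs_review` flags any job that didn't end in LOAD_COMPLETED, which includes jobs that are still in progress or queued, and it also flags completed jobs whose overallStatus error counters aren't zero. `fetch_all_errors` requests error pages 1, 2, 3, and so on. It assumes that a page past the last stored error comes back with an empty `errorLogs` array.

```shell
#!/bin/bash
# Hypothetical helpers for a GET /loader/<loadId> response.

# Print "<status> <errorCount>" for the JSON response read on stdin.
# errorCount is the sum of the three error counters in overallStatus;
# a missing counter is treated as 0.
summarize_job() {
  jq -r '.payload.overallStatus
    | "\(.status) \((.parsingErrors // 0) + (.datatypeMismatchErrors // 0) + (.insertErrors // 0))"'
}

# Return 0 (needs review) if the job did not end in LOAD_COMPLETED or
# recorded at least one error. Reads the JSON response on stdin.
needs_review() {
  local status errors
  read -r status errors <<< "$(summarize_job)"
  [ "$status" != "LOAD_COMPLETED" ] || [ "${errors:-0}" -gt 0 ]
}

# Print every stored error log entry for one job, one compact JSON object per
# line, by requesting page 1, 2, ... until a page has no entries.
# max_pages (default 1000) is a safety limit; 1000 pages of 10 errors covers
# the 10,000 errors that the loader keeps per job.
fetch_all_errors() {
  local loader_url="$1" load_id="$2" per_page="${3:-10}" max_pages="${4:-1000}"
  local page=1 body count
  while [ "$page" -le "$max_pages" ]; do
    body=$(curl --silent -G "${loader_url}/${load_id}?details=true&errors=true&page=${page}&errorsPerPage=${per_page}")
    # Treat a missing errorLogs array, or a response jq can't parse, as empty.
    count=$(echo "$body" | jq '.payload.errors.errorLogs // [] | length' 2>/dev/null)
    [ "${count:-0}" -gt 0 ] || break
    echo "$body" | jq -c '.payload.errors.errorLogs[]'
    page=$((page + 1))
  done
}
```

For example, `curl --silent -G 'https://cluster-endpoint:Port/loader/<loadId>?details=true' | needs_review && echo "check this job"` prints a message only for jobs that need attention. To collect all stored errors for the same job, run `fetch_all_errors 'https://cluster-endpoint:Port/loader' <loadId>`.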
Use an admin script
You can use an admin script to identify failed Neptune Bulk Loader jobs in your production process. For each load job, the admin script prints one line in the following format:
StartTime -LoadId: Status,S3 LOCATION: S3Location,ERRORS: Errors
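Note: You can run the admin script from any Linux system that has network access to the Neptune cluster. The script sends unsigned curl requests, so it works only if IAM authorization isn't enabled on the cluster.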
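The admin script converts each job's startTime (a Unix epoch timestamp) with `date -d@`, which is GNU coreutils syntax. This is one reason the script targets Linux. BSD and macOS `date` take `-r EPOCH` instead. The following minimal sketch uses a hypothetical `epoch_to_date` function that tries the GNU form first and falls back to the BSD form. It prints UTC in the C locale so that the output format is predictable.

```shell
#!/bin/bash
# Hypothetical helper: print a Unix epoch timestamp (such as
# overallStatus.startTime) as a UTC date. GNU date accepts -d @EPOCH; if that
# fails (for example with BSD or macOS date), fall back to -r EPOCH.
epoch_to_date() {
  LC_ALL=C date -u -d "@$1" 2>/dev/null || LC_ALL=C date -u -r "$1"
}

# Usage: the startTime from the step 2 output above
# epoch_to_date 1555574461   # prints Thu Apr 18 08:01:01 UTC 2019
```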
Create and run the automated script on a Linux or UNIX system
1. Create the script using a text editor:
$ vi script
2. Add the following content to the script, replacing cluster-endpoint:Port with your cluster's endpoint and port:
#!/bin/bash
# Neptune loader endpoint: replace cluster-endpoint:Port with your values.
cluster_ep="https://cluster-endpoint:Port/loader"
# List the load IDs; -r prints them as raw strings, without JSON quotes.
for loadId in $(curl --silent -G "${cluster_ep}" | jq -r '.payload.loadIds[]');
do
  # Fetch the job status and its first three errors in a single request.
  status_json=$(curl --silent -G "${cluster_ep}/${loadId}?details=true&errors=true&page=1&errorsPerPage=3")
  # startTime is a Unix epoch timestamp; date -d @N is GNU (Linux) syntax.
  time=$(date -d @"$(echo "${status_json}" | jq '.payload.overallStatus.startTime')")
  echo -n "${time} -"
  echo -n "${loadId}: $(echo "${status_json}" | jq '.payload.overallStatus.status')"
  echo -n ",S3 LOCATION: $(echo "${status_json}" | jq '.payload.overallStatus.fullUri')"
  # -c keeps the error list on one line.
  echo -n ",ERRORS: $(echo "${status_json}" | jq -c '.payload.errors.errorLogs')"
  echo
done
3. Save the script, and then make it executable:
$ chmod +x script
4. Install jq, the command-line JSON processor that the script uses. On Amazon Linux and other yum-based distributions, run:
$ sudo yum install jq
5. Run the script:
$ ./script
This is example output:
Thu Apr 18 08:01:01 UTC 2019 -c32bbd24-99a7-45ee-972c-21b7b9cab3e2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:04:00 UTC 2019 -6f6342fb-4ea3-452c-ac69-b4d117e37d5a: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Fri Apr 5 07:01:30 UTC 2019 -647114a6-6ed4-4018-896c-e84a08fcf864: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:36:02 UTC 2019 -521d33fa-7050-44d7-a961-b64ef4e2d1db: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
Tue Mar 19 17:35:45 UTC 2019 -d0d4714e-7cf8-415e-89f5-d07ed2732bf2: "LOAD_COMPLETED",S3 LOCATION: "s3://demodata/neptune/",ERRORS: null
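In a production process you usually only need the lines that indicate a problem. The following minimal sketch uses a hypothetical `filter_problem_jobs` function that reads the admin script's output and keeps every line except those that show "LOAD_COMPLETED" with an ERRORS value of null or []. It assumes the exact line format shown in the example output.

```shell
#!/bin/bash
# Hypothetical helper: read the admin script's output on stdin and drop the
# lines for jobs that completed with no errors (ERRORS: null or ERRORS: []).
filter_problem_jobs() {
  # grep exits with status 1 when it selects nothing; || true keeps an
  # all-clear result from being treated as a failure.
  grep -Ev ': "LOAD_COMPLETED",S3 LOCATION: .*,ERRORS: (null|\[])$' || true
}

# Usage:
# ./script | filter_problem_jobs
```

For the example output, the filter prints nothing, because every job shows "LOAD_COMPLETED" with ERRORS: null.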