Importing data to DynamoDB from S3 (using AWS Data Pipeline)

You first need an S3 location; let's call it directory 'X'.

The directory 'X' that the import reads from should contain the following files:
a.       manifest
b.       your-file-here.txt (the one containing the actual data)

your-file-here.txt contains the data in JSON format, one JSON object per line.
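As a rough illustration of the "one JSON object per line" layout, here is a minimal Python sketch. The function name to_json_lines and the sample fields (id, path) are made up for this example, and the records are plain dictionaries; the exact attribute layout your table's import expects is not covered here, so adapt the records to your own data.

```python
import json

def to_json_lines(records):
    """Serialize each record as one compact JSON object per line."""
    # json.dumps escapes backslashes and quotes for us, and it never
    # leaves a trailing comma at the end of a line.
    return "".join(json.dumps(record) + "\n" for record in records)

# Hypothetical sample records, only for illustration.
records = [
    {"id": "1", "path": "C:\\temp\\a.txt"},
    {"id": "2", "path": "C:\\temp\\b.txt"},
]

if __name__ == "__main__":
    # Write the data file that would later be uploaded to directory 'X'.
    with open("your-file-here.txt", "w", encoding="utf-8") as out:
        out.write(to_json_lines(records))
```

Generating the file with a JSON library, rather than by hand, tends to avoid the escaping and trailing-comma problems up front.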

Go to the DynamoDB console and select your table by clicking on it. Under 'Actions', hit 'import data'. Create a pipeline, but before activating it, consider the learnings below about your data.

Learnings when importing data to DynamoDB (from an S3 file, using Data Pipeline):
1.       Replace \ with \\
2.       No field value should be empty
3.       Each line should independently be a valid JSON object. No line should end in a comma.
4.       The file should be verified as valid JSON using the bash command: cat <file-name> | python -m json.tool
Note that the file may first need to be converted into a single JSON object by appending a comma to the end of every line except the last, adding {"object": [ at the beginning of the file, and adding ]} at the end of the file. Remove these additions before uploading the file to S3.
5.       A sample manifest file is shown below:
{"name":"DynamoDB-export","version":3,
"entries": [
{"url":"s3://your-preferred-location/your-file-here.txt","mandatory":true}
]}
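The checks in the list above can also be run line by line, without temporarily wrapping the file into one big JSON object. The following is a minimal Python sketch, not an official tool: the helper names (find_empty, check_line, check_file, build_manifest) are made up for this example, and "empty" is assumed to mean a null or an empty string at any nesting depth. build_manifest simply mirrors the sample manifest shown above.

```python
import json

def find_empty(value, path=""):
    """Yield the paths of fields whose value is null or an empty string."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_empty(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_empty(child, f"{path}[{index}]")
    elif value is None or value == "":
        yield path

def check_line(line):
    """Return a list of problems found in one line of the data file."""
    problems = []
    text = line.strip()
    if text.endswith(","):
        problems.append("ends in a comma")
    try:
        # Unescaped backslashes such as \d are rejected here.
        obj = json.loads(text)
    except ValueError as err:
        problems.append(f"not valid JSON: {err}")
        return problems
    if not isinstance(obj, dict):
        problems.append("not a JSON object")
        return problems
    problems.extend(f"empty value for field {key!r}" for key in find_empty(obj))
    return problems

def check_file(path):
    """Print every problem found in the file, with its line number."""
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_line(line):
                print(f"line {number}: {problem}")

def build_manifest(urls):
    """Build manifest text shaped like the sample above, one entry per URL."""
    entries = [{"url": url, "mandatory": True} for url in urls]
    return json.dumps({"name": "DynamoDB-export", "version": 3, "entries": entries})
```

For example, check_file("your-file-here.txt") prints nothing when every line passes, and build_manifest(["s3://your-preferred-location/your-file-here.txt"]) returns text that can be saved as the manifest file. Note that a backslash followed by a valid escape letter (such as \n or \t) still parses, so this check will not catch every unescaped backslash.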



Feel free to add more through comments.
