Importing data to DynamoDB from S3 (using AWS Data Pipeline)

You will first need an S3 location; let's say a directory 'X'.

The directory 'X' from which the import will happen should contain the following files:
a. manifest
b. your-file-here.txt (the one containing the actual data)

your-file-here.txt will contain the data in JSON format, one object per line.
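For illustration, a couple of lines might look like the following. This is a made-up example: the attribute names 'id', 'name', and 'score' are hypothetical, and the lowercase type keys "s" (string) and "n" (number) assume the typed-attribute encoding that DynamoDB exports use.

{"id":{"s":"user-001"},"name":{"s":"Alice"},"score":{"n":"42"}}
{"id":{"s":"user-002"},"name":{"s":"Bob"},"score":{"n":"17"}}

Each line stands on its own; there is no comma between lines.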

Go to DynamoDB and select your table by clicking on it. Under 'Actions', hit 'Import data'. Create a pipeline and activate it, but before activating, go through the learnings below about your data.
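If you prefer the AWS CLI over the console, the same pipeline can be created and activated from the command line. A minimal sketch, assuming the AWS CLI is configured; the pipeline name, unique id, pipeline id, and definition.json (e.g. a definition based on the console's import template) are all placeholders:

# Create an empty pipeline and note the pipeline id it returns
aws datapipeline create-pipeline --name ddb-s3-import --unique-id ddb-s3-import-1
# Attach a pipeline definition (definition.json is a placeholder)
aws datapipeline put-pipeline-definition --pipeline-id df-EXAMPLE --pipeline-definition file://definition.json
# Activate the pipeline to start the import
aws datapipeline activate-pipeline --pipeline-id df-EXAMPLE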

Learnings when importing data to DynamoDB (from an S3 file, using Data Pipeline):
1. Replace \ with \\ (backslashes must be escaped).
2. No field value should be empty.
3. Each line should independently be a valid JSON object. No line should end in a comma.
4. The file should be verified as valid JSON using the bash command: cat <file-name> | python -m json.tool
Note that the file may need to be converted to a single JSON object first, by appending a comma at the end of each line (except the last), prepending {"object": [ at the beginning of the file, and appending ]} at the end. These additions must be removed again before uploading the file to S3. Alternatively, each line can be checked on its own; see the sketch after this list.
5. A sample manifest file's content is below:
{
  "name": "DynamoDB-export",
  "version": 3,
  "entries": [
    {"url": "s3://your-preferred-location/your-file-here.txt", "mandatory": true}
  ]
}
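As an alternative to wrapping the whole file, the checks from learnings 2-4 can be run line by line. Below is a minimal bash sketch that reuses python -m json.tool on every line; 'your-file-here.txt' is the data file from above:

n=0
while IFS= read -r line; do
  n=$((n+1))
  # Each line must independently be a valid JSON object (no trailing comma)
  if ! printf '%s' "$line" | python -m json.tool > /dev/null 2>&1; then
    echo "Line $n is not valid JSON"
  fi
  # Crude check for empty field values; adjust the pattern to your data
  if printf '%s' "$line" | grep -q '""'; then
    echo "Line $n contains an empty value"
  fi
done < your-file-here.txt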



Feel free to add more through the comments.

