Copy data from MySQL to Amazon DynamoDB

I have a table in MySQL containing 500 million records. I want to import this table into Amazon DynamoDB. I understand that there are two ways to do this:

  • Java API: the problem with this approach is that it is slow, and the database connection is sometimes dropped.

  • AWS Data Pipeline: seems promising, but how can I export data from MySQL to a format recognized by DynamoDB?

Which of the two is the better approach?

3 answers

AWS has two services that can help you complete this operation.

  • Data Pipeline
  • EMR cluster with Hive

Data pipeline

A very simple way, if your "schemas" are similar (it always feels a little odd to speak of a schema for DynamoDB), is to export from MySQL to S3 and then import from S3 into DynamoDB.

There are two tutorials in Data Pipeline to help you set up your tasks.

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-copydata-mysql.html
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html

You can further improve this process by developing a single pipeline that performs both the export and the import. If you need to convert data between the export and the import, you will need to develop the conversion code and execute it from the pipeline.
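Such conversion code is often little more than a type-mapping step: plain MySQL column values have to be wrapped in DynamoDB's attribute-value descriptors ({"S": ...}, {"N": ...}, and so on). A minimal Python sketch of that mapping (the function names are illustrative, not part of any AWS SDK):

```python
def to_dynamodb_attr(value):
    """Wrap a plain Python value in a DynamoDB attribute-value descriptor."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):       # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}      # DynamoDB numbers are transmitted as strings
    if isinstance(value, (list, tuple)):
        return {"L": [to_dynamodb_attr(v) for v in value]}
    return {"S": str(value)}

def row_to_item(row):
    """Convert one MySQL row (as a column -> value dict) into a DynamoDB item."""
    return {column: to_dynamodb_attr(value) for column, value in row.items()}
```

For example, `row_to_item({"Id": 101, "Title": "Book 101 Title"})` produces `{"Id": {"N": "101"}, "Title": {"S": "Book 101 Title"}}`.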

In Data Pipeline terms, this is called an Activity. An Activity can be as simple as a shell script or as complex as a Hive/Hadoop/Pig application running on an EMR cluster. http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-activities.html

Data Pipeline also lets you schedule executions at regular intervals.

Hive and EMR

Hive is a Hadoop tool that lets you write SQL-like commands to manage data sources. Hive translates the SQL into a Hadoop application that runs on the cluster. You can run Hive on an AWS Elastic MapReduce (EMR) cluster, a managed Hadoop cluster service.

Hive on EMR can connect to external data sources, such as files on S3 or DynamoDB tables. It lets you write SQL statements on top of DynamoDB!

In your use case, you would write a Hive script that reads from MySQL and writes to DynamoDB, transforming the data along the way with standard Hive SQL expressions.

More on Hive on EMR: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html

More about DynamoDB and Hive:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Walkthrough.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/EMRforDynamoDB.html


In addition to the other answer, I would like to mention that DynamoDB recognizes CSV or TSV files for import. We can also use Hive on Elastic MapReduce to bulk-load data from a CSV file. The one thing to keep in mind is that if you dump the table to CSV on Windows, the Windows line endings \r\n must be replaced with \n to make the file compatible.
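The line-ending fix described above can be done with a few lines of Python (the file paths are placeholders; the file is streamed line by line since a 500-million-row dump will not fit in memory):

```python
def normalize_line_endings(src_path, dst_path):
    """Rewrite a dump file, replacing Windows \r\n line endings with Unix \n."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # binary iteration splits on \n, so \r\n stays on one line
            dst.write(line.replace(b"\r\n", b"\n"))
```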


I found the easiest way for me was to write a script that transfers all the data into a JSON file in the format shown here: Download AWS data

 {
   "ProductCatalog": [
     { "PutRequest": { "Item": {
         "Id": { "N": "101" },
         "Title": { "S": "Book 101 Title" },
         "ISBN": { "S": "111-1111111111" },
         "Authors": { "L": [ { "S": "Author1" } ] },
         "Price": { "N": "2" },
         "Dimensions": { "S": "8.5 x 11.0 x 0.5" },
         "PageCount": { "N": "500" },
         "InPublication": { "BOOL": true },
         "ProductCategory": { "S": "Book" }
     } } },
     { "PutRequest": { "Item": {
         "Id": { "N": "103" },
         "Title": { "S": "Book 103 Title" },
         "ISBN": { "S": "333-3333333333" },
         "Authors": { "L": [ { "S": "Author1" }, { "S": "Author2" } ] },
         "Price": { "N": "2000" },
         "Dimensions": { "S": "8.5 x 11.0 x 1.5" },
         "PageCount": { "N": "600" },
         "InPublication": { "BOOL": false },
         "ProductCategory": { "S": "Book" }
     } } },
     { "PutRequest": { "Item": {
         "Id": { "N": "205" },
         "Title": { "S": "18-Bike-204" },
         "Description": { "S": "205 Description" },
         "BicycleType": { "S": "Hybrid" },
         "Brand": { "S": "Brand-Company C" },
         "Price": { "N": "500" },
         "Color": { "L": [ { "S": "Red" }, { "S": "Black" } ] },
         "ProductCategory": { "S": "Bicycle" }
     } } }
   ]
 }

and then create the tables and run this command from my console:

 aws dynamodb batch-write-item --request-items file://ProductCatalog.json 

To download and configure the AWS CLI: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.CLI.html
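A sketch of such a generator script in Python (the table name `ProductCatalog` comes from the example above; fetching the rows from MySQL is left out, and the items are assumed to already be in DynamoDB attribute-value form). Note that `batch-write-item` accepts at most 25 put requests per call, so a large table has to be split into many request files, or loaded with an SDK that handles the batching:

```python
import json

BATCH_LIMIT = 25  # batch-write-item accepts at most 25 requests per call

def make_batch_files(table_name, items, prefix="batch"):
    """Write items (already in DynamoDB attribute-value form) into
    batch-write-item request files of at most BATCH_LIMIT puts each.
    Returns the list of file paths written."""
    paths = []
    for i in range(0, len(items), BATCH_LIMIT):
        chunk = items[i:i + BATCH_LIMIT]
        body = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        path = f"{prefix}-{i // BATCH_LIMIT}.json"
        with open(path, "w") as f:
            json.dump(body, f)
        paths.append(path)
    return paths
```

Each resulting file can then be loaded with `aws dynamodb batch-write-item --request-items file://batch-0.json`, and so on.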

