DynamoDB InputFormat for Hadoop

I need to process some data that is stored in Amazon DynamoDB using a Hadoop job.

I searched the Internet for a Hadoop InputFormat for DynamoDB and could not find one. I am not familiar with DynamoDB, so I assume there are some tricks to using DynamoDB with Hadoop? If there is an implementation of this input format, can you share it?

After a lot of searching, I found DynamoDBInputFormat and DynamoDBOutputFormat in one of the Amazon libraries.

Amazon Elastic MapReduce ships a library called hive-bigbird-handler that contains an input format and an output format for DynamoDB. Full class names: org.apache.hadoop.hive.dynamodb.write.DynamoDBOutputFormat and org.apache.hadoop.hive.dynamodb.read.DynamoDBInputFormat

I hope these classes will be useful to the community.
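As for the "tricks": the main job of a DynamoDB InputFormat is deciding how to divide the table among mappers. The DynamoDB Scan API supports parallel scans through its Segment and TotalSegments request parameters, so one natural design is to give each input split a subset of scan segments. I have not checked whether hive-bigbird-handler does it exactly this way, so treat this as an assumption. The sketch below is stdlib-only Java that shows just the segment-to-split assignment; SegmentSplitPlanner and plan are hypothetical names, not part of any AWS or Hadoop library.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (NOT part of hive-bigbird-handler or any AWS library):
 * distributes DynamoDB parallel-scan segments 0..totalSegments-1 across
 * map tasks, the way a DynamoDB InputFormat might build its splits.
 */
public class SegmentSplitPlanner {

    /** Returns one list of segment numbers per split, assigned round-robin. */
    public static List<List<Integer>> plan(int totalSegments, int numSplits) {
        if (totalSegments <= 0 || numSplits <= 0) {
            throw new IllegalArgumentException("totalSegments and numSplits must be positive");
        }
        // Never create more splits than segments; an empty split would do no work.
        int splits = Math.min(totalSegments, numSplits);
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            result.add(new ArrayList<>());
        }
        // Segment s goes to split s % splits, so split sizes differ by at most one.
        for (int segment = 0; segment < totalSegments; segment++) {
            result.get(segment % splits).add(segment);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example: a table scanned in 8 segments, read by 3 mappers.
        List<List<Integer>> splits = plan(8, 3);
        for (int i = 0; i < splits.size(); i++) {
            // Each mapper would issue Scan requests with Segment = s and TotalSegments = 8
            // for every segment s in its list.
            System.out.println("split " + i + " -> segments " + splits.get(i));
        }
    }
}
```

Running main prints split 0 -> segments [0, 3, 6], split 1 -> segments [1, 4, 7] and split 2 -> segments [2, 5]. Round-robin keeps the splits within one segment of each other in size. A real InputFormat would also have to think about read throughput, since every segment scan consumes the table's provisioned read capacity.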

I could not find an InputFormat that can be used directly in MapReduce. However, there is an AWS how-to article, "Using Amazon Elastic MapReduce with DynamoDB" (guest post), that shows how to run MapReduce jobs over DynamoDB data using Hive.
