DynamoDB InputFormat for Hadoop

Question

I need to process some data that is stored in Amazon DynamoDB using a Hadoop job.

I searched the Internet for a Hadoop InputFormat for DynamoDB and could not find one. I am not familiar with DynamoDB, so I suspect there are some tricks involved in combining DynamoDB and Hadoop. If there is an existing implementation of such an InputFormat, could you share it?

amazon-web-services elastic-map-reduce amazon-dynamodb mapreduce hadoop

dino.keco Oct 22 '12 at 21:22

2 answers

I could not find an InputFormat that can be used directly in MapReduce. However, there is an AWS how-to article, "Using Amazon Elastic MapReduce with DynamoDB (guest post)", that shows how to run MapReduce jobs over DynamoDB data using Hive.

Praveen sripati Oct 23 '12 at 5:02

dino.keco · Accepted Answer · 2012-10-29T18:36:22+0000

After a lot of searching, I found DynamoDBInputFormat and DynamoDBOutputFormat in one of the Amazon libraries.

Amazon Elastic MapReduce ships a library called hive-bigbird-handler that contains the input and output formats for DynamoDB. Full class names: org.apache.hadoop.hive.dynamodb.write.DynamoDBOutputFormat and org.apache.hadoop.hive.dynamodb.read.DynamoDBInputFormat
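One of the "tricks" the question asks about is that a DynamoDB table has no file blocks for an InputFormat to split on. A common way to parallelize reads, and the one assumed here, is DynamoDB's parallel scan, where the Scan request's Segment and TotalSegments parameters divide a table into independent segments that separate workers can read. The sketch below is a hypothetical illustration of how such segments could be grouped into splits, one per map task. It is not the code of the hive-bigbird-handler classes, it uses no AWS or Hadoop APIs, and the names SegmentSplitPlanner, SegmentSplit and planSplits are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: plan Hadoop-style input splits for a DynamoDB table by
 * assigning parallel-scan segments (Scan's Segment/TotalSegments) to splits.
 * No AWS or Hadoop classes are used; SegmentSplit is an illustrative stand-in.
 */
public class SegmentSplitPlanner {

    /** Hypothetical split: the scan segments that one map task would read. */
    public static final class SegmentSplit {
        private final List<Integer> segments;
        private final int totalSegments;

        SegmentSplit(List<Integer> segments, int totalSegments) {
            this.segments = segments;
            this.totalSegments = totalSegments;
        }

        public List<Integer> getSegments() { return segments; }

        public int getTotalSegments() { return totalSegments; }

        @Override
        public String toString() {
            return "segments " + segments + " of " + totalSegments;
        }
    }

    /** Deals totalSegments scan segments round-robin over at most numSplits splits. */
    public static List<SegmentSplit> planSplits(int totalSegments, int numSplits) {
        if (totalSegments < 1 || numSplits < 1) {
            throw new IllegalArgumentException("totalSegments and numSplits must be >= 1");
        }
        // Never create more splits than segments, so no split is empty.
        int splits = Math.min(totalSegments, numSplits);
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            buckets.add(new ArrayList<>());
        }
        // Segment i goes to split i mod splits.
        for (int segment = 0; segment < totalSegments; segment++) {
            buckets.get(segment % splits).add(segment);
        }
        List<SegmentSplit> result = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            result.add(new SegmentSplit(bucket, totalSegments));
        }
        return result;
    }

    public static void main(String[] args) {
        // Example: 8 scan segments spread over 3 map tasks.
        for (SegmentSplit split : planSplits(8, 3)) {
            System.out.println(split);
        }
    }
}
```

Running main prints one line per split: with 8 segments and 3 splits, the segments are dealt round-robin, so the first two splits get three segments each and the last gets two. A real InputFormat would also need to limit how much of the table's provisioned read capacity the concurrent scans consume, which this sketch ignores.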

I hope these classes will be useful to the community.