Yes, Hive uses a SQL-like syntax (HiveQL). Hive itself is written in Java, so under the hood it is still Java. The Hive wiki is a good place to start. Here is a good article on using DynamoDB with EMR: http://aws.amazon.com/articles/28549
If my data is less than 50 GB, is EMR overkill for this task?
I don't think so. Once you have EMR set up, you can export the DynamoDB table to S3 or to an internal Hadoop table. You can then query S3 or the internal Hadoop table without consuming the DynamoDB table's provisioned throughput. Since S3 is very fast, you can write all kinds of complex Hive queries to get the reports you need.
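As a rough sketch of the export-then-query approach, assuming a DynamoDB table called `Orders` with three attributes (the table names, column names, and bucket path here are all made up for illustration), the Hive side could look something like this:

```sql
-- Hypothetical names throughout: Orders, ddb_orders, s3_orders, my-bucket, and the columns.

-- Optionally cap the share of the table's provisioned read capacity the export may use.
SET dynamodb.throughput.read.percent=0.5;

-- Hive table backed by the live DynamoDB table.
CREATE EXTERNAL TABLE ddb_orders (order_id string, customer string, total double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "Orders",
  "dynamodb.column.mapping" = "order_id:OrderId,customer:Customer,total:Total"
);

-- Hive table backed by delimited files in S3.
CREATE EXTERNAL TABLE s3_orders (order_id string, customer string, total double)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/exports/orders/';

-- One-time copy; this is the only step that reads from DynamoDB.
INSERT OVERWRITE TABLE s3_orders SELECT * FROM ddb_orders;

-- Reports now run against the S3 copy only.
SELECT customer, SUM(total) AS revenue
FROM s3_orders
GROUP BY customer
ORDER BY revenue DESC;
```

Only the `INSERT OVERWRITE` step touches DynamoDB, so you can rerun the reporting queries as often as you like against the S3 copy.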
The EMR command-line tool is very easy to set up, and if you want to save money, you can always bid on spot instances.
Also, if a job is running slowly, you can increase the number of core and task nodes to finish it faster.