First of all, I'm relatively new to Big Data and Hadoop, and I have just started experimenting with the Hortonworks sandbox (Pig and Hive). I was wondering in which cases I should use each of these tools: Hadoop, Hive, Pig, HBase, and Cassandra.
In my sandbox environment, Hive and Pig took anywhere from seconds to minutes to respond, even on a file of only 9 MB. That is obviously not acceptable in some situations, for example web applications (unless the slowness comes from something else, such as the way I set up my virtual machine).
My assumptions about the correct use of each:
- Hadoop: just the technological base for the rest; there are only very few use cases where it would be used directly.
- Hive or Pig: for analytical processes that run once an hour or once a day.
- HBase or Cassandra: for real-time applications (e.g. web applications) where response times of 100 ms or less are required.
Also, when should I use HBase rather than Cassandra?
Thanks!
cassandra hadoop hive apache-pig
Daniel