I ingest millions of small journal documents every week, and over them I run:
- ad-hoc data-mining requests
- combining, comparing, filtering, and computing values
- a lot of multi-pattern text search using Python (see the sketch after this list)
- all of these operations over the full set of millions of documents, sometimes every day
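Today the search-and-compute step is plain Python, roughly like this (a minimal sketch only; the documents are shown as dicts with hypothetical `text` and `value` fields):

```python
import re

def scan(documents, patterns, value_field="value"):
    """Multi-pattern text search plus a running aggregate over the matches."""
    compiled = [re.compile(p) for p in patterns]
    matched, total = 0, 0.0
    for doc in documents:
        # a document counts if any of the patterns hits its text
        if any(rx.search(doc["text"]) for rx in compiled):
            matched += 1
            total += float(doc.get(value_field, 0.0))
    return matched, total

if __name__ == "__main__":
    docs = [
        {"text": "connection timeout on node-3", "value": 120.0},
        {"text": "request served ok", "value": 8.0},
    ]
    print(scan(docs, [r"timeout", r"connection refused"]))  # -> (1, 120.0)
```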
My first thought was to put all the documents into HBase/HDFS and run Hadoop jobs that produce the statistics.
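As a Hadoop Streaming job, that batch route would look roughly like this (a sketch, assuming one JSON document per input line; the `level` and `duration_ms` fields are made-up examples):

```python
#!/usr/bin/env python
# --- mapper.py ---
import json
import sys

for line in sys.stdin:
    try:
        doc = json.loads(line)
    except ValueError:
        continue  # skip malformed lines
    # emit "level <TAB> duration" so the reducer can aggregate per level
    print("%s\t%s" % (doc.get("level", "UNKNOWN"), doc.get("duration_ms", 0)))
```

```python
#!/usr/bin/env python
# --- reducer.py ---
# Hadoop Streaming delivers the mapper output sorted by key, so a
# simple key-change loop is enough to compute count and average.
import sys

current, count, total = None, 0, 0.0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current:
        if current is not None:
            print("%s\tcount=%d\tavg=%.2f" % (current, count, total / count))
        current, count, total = key, 0, 0.0
    count += 1
    total += float(value)
if current is not None:
    print("%s\tcount=%d\tavg=%.2f" % (current, count, total / count))
```

Run with something like `hadoop jar hadoop-streaming.jar -input /data/journals -output /data/stats -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (paths here are placeholders).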
The problem with that batch route: some of the results need to be close to real time.
So, after some research, I discovered Elasticsearch, and now I'm thinking about moving all those millions of documents into it and using its query DSL to generate the statistics.
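Something like this is what I imagine (a sketch against the official Python client; the `journals` index and the `message`, `source`, `level`, and `duration_ms` fields are assumptions for illustration):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="journals",
    body={
        "size": 0,  # we only want the aggregations, not the matching hits
        "query": {
            "multi_match": {  # full-text search across several fields
                "query": "timeout error",
                "fields": ["message", "source"],
            }
        },
        "aggs": {
            "by_level": {
                "terms": {"field": "level"},  # one bucket per log level
                "aggs": {
                    # min/max/avg/sum of durations inside each bucket
                    "duration_stats": {"stats": {"field": "duration_ms"}}
                },
            }
        },
    },
)

for b in resp["aggregations"]["by_level"]["buckets"]:
    print(b["key"], b["doc_count"], b["duration_stats"]["avg"])
```

The `"size": 0` keeps the response down to the aggregation buckets, which is all a statistics job needs.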
Is that a good idea? Elasticsearch seems to handle millions or even billions of documents with ease.
Tags: hbase, hadoop, hdfs, elasticsearch, bigdata