Elasticsearch vs HBase / Hadoop for real-time statistics

I log millions of small log documents every week, and I need to support:

  • ad hoc queries for data mining
  • combining, comparing, filtering and calculating values
  • a lot of full-text searches on multiple fields, using Python
  • running these operations over all of the documents, sometimes every day

My first thought was to put all the documents in HBase / HDFS and run Hadoop jobs to generate the statistics.
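
To make the batch approach concrete, here is a minimal sketch of the kind of job I mean, written as a Hadoop Streaming style mapper and reducer in Python. The tab-separated line layout, the idea that column 2 holds a "level" value, and the `run_local` helper are all assumptions made up for illustration; `sorted()` only stands in for the shuffle/sort step that Hadoop performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (level, 1) per log line. The layout 'timestamp<TAB>level<TAB>message' is hypothetical."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # skip malformed lines
            yield fields[1], 1


def reducer(pairs):
    """Sum the counts for each key. Input must be grouped by key, as Hadoop guarantees."""
    ordered = sorted(pairs, key=itemgetter(0))  # stand-in for Hadoop's shuffle/sort
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Hypothetical helper: run both phases in-process to check the logic."""
    return dict(reducer(mapper(lines)))


if __name__ == "__main__":
    sample = ["t1\tERROR\tdisk full", "t2\tINFO\tstarted", "t3\tERROR\ttimeout"]
    print(run_local(sample))
```

A job like this scans every document on each run, which is why the results only become available after the whole batch finishes.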

The problem is that some results need to be available in near real time.

So, after some research, I discovered Elasticsearch, and now I'm thinking about moving all of the documents into it and using Query DSL queries to generate the statistics.
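
As a rough sketch of the kind of DSL query I have in mind, the following builds a search body that filters recent documents and computes an average per time bucket. The field names `timestamp` and `value` and the function `build_stats_query` are assumptions for illustration, not part of my real mapping; `fixed_interval` is the parameter name used by recent Elasticsearch versions, while older ones use `interval`.

```python
import json


def build_stats_query(since="now-1d", interval="1h"):
    """Build a search body: filter by time, then average 'value' per time bucket.

    'timestamp' and 'value' are hypothetical field names.
    """
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": since}}}
                ]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": interval,
                },
                "aggs": {
                    "avg_value": {"avg": {"field": "value"}}
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent to an index's _search endpoint.
    print(json.dumps(build_stats_query(), indent=2))
```

The body would be sent to the `_search` endpoint of the index holding the documents, either over HTTP or through the Python client.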

Is that a good idea? Elasticsearch seems to handle millions or even billions of documents easily.

hbase hadoop hdfs elasticsearch bigdata