ES-Hadoop is more of a bridge between the Hadoop ecosystem and ES. It is not a separate release of ES.
It basically improves integration between Hadoop ecosystem applications and ES. In my organization, we use it for two purposes:
Before indexing data into ES, we use Spark to analyze it and pre-aggregate it, which reduces the amount of indexing ES has to do. ES-Hadoop lets us index directly from Spark's data structures into ES. We start the indexing process with a single line of code and don't need to write the indexing program ourselves. (The behavior is configurable, so you can still index the data however you like.)
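To make the "index straight from Spark" step concrete, here is a minimal PySpark sketch. It is not necessarily the API we use in production; it assumes the elasticsearch-spark connector jar is on the Spark classpath and uses the connector's DataFrame data source. The helper names (`es_options`, `index_to_es`) and the index name are made up for illustration.

```python
# Sketch only: assumes the elasticsearch-spark connector jar is on the Spark
# classpath and that `df` is a pyspark.sql.DataFrame you have already built
# (for example, the pre-aggregated result of a Spark job).

ES_FORMAT = "org.elasticsearch.spark.sql"  # data source name provided by ES-Hadoop


def es_options(nodes="localhost", port=9200):
    """Connection settings for the connector (hypothetical helper)."""
    return {"es.nodes": nodes, "es.port": str(port)}


def index_to_es(df, index, **conn):
    """Write every row of `df` as a document into the ES index `index`."""
    (df.write
       .format(ES_FORMAT)
       .options(**es_options(**conn))
       .mode("append")   # add documents to the index instead of replacing it
       .save(index))
```

With that in place the indexing call really is one line, e.g. `index_to_es(daily_totals, "sales_daily", nodes="es1")`, where `daily_totals` and `sales_daily` are placeholder names.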
We also use ES as our near-real-time analytics cluster. Data is laid out in ES in a way that gives the best performance for our customers. Sometimes (usually when we have ideas for new features) we need to pull data out of ES and run complex processing on it. In those cases, we can create a Spark data structure from ES data in one line of code.
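The read direction can be sketched the same way. Again this is an assumption-laden PySpark example, not necessarily what we run: it needs the elasticsearch-spark connector on the classpath, `read_from_es` is a made-up helper, and the index name and query are placeholders. The `es.query` setting lets the connector push a filter down to ES instead of loading the whole index.

```python
# Sketch only: `spark` is an existing pyspark.sql.SparkSession with the
# elasticsearch-spark connector jar available.

ES_FORMAT = "org.elasticsearch.spark.sql"  # data source name provided by ES-Hadoop


def read_from_es(spark, index, nodes="localhost", port=9200, query=None):
    """Return a DataFrame backed by the ES index `index` (hypothetical helper)."""
    reader = (spark.read
              .format(ES_FORMAT)
              .option("es.nodes", nodes)
              .option("es.port", str(port)))
    if query is not None:
        # Optional filter evaluated by ES, e.g. "?q=status:error"
        reader = reader.option("es.query", query)
    return reader.load(index)
```

Usage is then a single call such as `events = read_from_es(spark, "events", query="?q=status:error")`, after which `events` can be processed like any other Spark DataFrame.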
So ES-Hadoop is best thought of as a well-written connector. You still have to move the data from your ES cluster to Hadoop.
I'm not sure about the comparison with HBase. You can't really compare the features of HBase, which is a key-value store, with those of ES, which is a general-purpose search engine that has gained very good analytics capabilities in recent versions. As I see it, these are different tools that address different problems.