What to use for aggregation and requests in real time?

Question

I am looking for a tool, database, or other solution that can aggregate logs in real time and also query them in real time.
The main requirement is to deliver results as quickly as possible. A query may have to cover many events (possibly billions), and the logs have many "columns". Each query sets conditions on some of these columns, so the final result is either some kind of aggregation or only a small subset of rows.

I am currently looking at HDFS + HBase, which seems like a good solution. Are there any alternatives? Can you recommend something?

+4

logging hbase real-time

wlk Apr 16 '11 at 21:03

5 answers

If you want to parse or collect the logs in real time and act on them, my recommendation is this:

# tail --follow=name --retry /var/log/logfile.log | sendxmpp -i -u username -p password -j somejabberserver.com sendloglineto@somejabberserver.com

This sends each line of the log, as it appears, as an XMPP message to the Jabber user sendloglineto@somejabberserver.com. That Jabber user is connected through a client that you write yourself (I prefer Perl and Net::Jabber). You can program the client to do whatever you want with each XMPP message (for example, store it in a database). If you store it in CouchDB, you can use the _changes API to track updates to the specific database served by CouchDB.
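
As a rough illustration of the "store it in a database" step, here is a minimal Python sketch. It is not the XMPP/CouchDB setup described above: it assumes the log lines arrive as any iterable of strings (for example `sys.stdin` fed by `tail`), and it uses SQLite from the standard library as a stand-in store. The table name `log_lines` and the functions `open_store` and `store_lines` are hypothetical names chosen for this example.

```python
import sqlite3
import time
from typing import Iterable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the SQLite database that holds the log lines."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_lines ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "received_at REAL NOT NULL, "
        "line TEXT NOT NULL)"
    )
    return conn


def store_lines(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert each non-empty line as it arrives; return how many were stored."""
    stored = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        conn.execute(
            "INSERT INTO log_lines (received_at, line) VALUES (?, ?)",
            (time.time(), line),
        )
        conn.commit()  # commit per line so other readers see it right away
        stored += 1
    return stored
```

With `store_lines(open_store("logs.db"), sys.stdin)` as the entry point of a script, the same `tail --follow=name --retry /var/log/logfile.log` command could be piped into it instead of into `sendxmpp`. Committing after every line favors visibility over throughput; batching the commits would be the obvious change for high-volume logs.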

+2

Gjorgji tashkovski Jun 26 '11 at 14:35

You can look at Calamaris. In the commercial world, there is Splunk.

+1

mindas Apr 16 '11 at 22:26

Although this is an old question, I am posting an answer with a technology stack that is available now:

Data ingestion: Apache Flume, Spark, Spring XD, or Kafka Streams
Data storage and processing: HBase (raw data in a staging table and aggregated data in the final tables, as the requirements dictate; row keys can be designed around the search ranges) + SparkOnHBase
Real-time search: HBase with Solr indexes
Reporting (optional): Tableau or Banana (open source)
Overall: Lambda architecture
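
The raw-plus-aggregated layout above can be sketched in a few lines of Python. This is only an in-memory illustration, not HBase code: plain dictionaries stand in for the staging and final tables, and the row key format (`service|hour|status`), the event fields (`service`, `ts`, `status`), and all function names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Stand-ins for the two kinds of tables (hypothetical, in memory only).
staging = {}                      # row key -> raw event
hourly_counts = defaultdict(int)  # row key -> pre-aggregated count


def hour_bucket(ts: float) -> str:
    """Format a Unix timestamp as a sortable UTC hour string (YYYYMMDDHH)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d%H")


def ingest(event: dict) -> None:
    """Keep the raw event and update the aggregate at write time."""
    # The trailing counter keeps raw keys unique for identical timestamps.
    raw_key = f"{event['service']}|{event['ts']:.3f}|{len(staging)}"
    staging[raw_key] = event
    agg_key = f"{event['service']}|{hour_bucket(event['ts'])}|{event['status']}"
    hourly_counts[agg_key] += 1


def count(service: str, start_hour: str, end_hour: str, status: str) -> int:
    """Sum pre-aggregated counts for one service and status over an hour range.

    This loops over every aggregate key; a store with sorted row keys
    could serve the same query with a range scan instead.
    Assumes field values never contain the '|' separator.
    """
    total = 0
    for key, n in hourly_counts.items():
        svc, hour, st = key.split("|")
        if svc == service and st == status and start_hour <= hour <= end_hour:
            total += n
    return total
```

The point of the design is that a query such as "how many 500 responses did `api` return between two hours" reads a handful of aggregate rows rather than billions of raw events, while the raw events stay available in the staging table for recomputation.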

+1

Ram ghadiyaram Aug 22 '15 at 7:48

Try Apache Kafka. This should be useful for your case.

0

Anuj mehta Nov 19 '13 at 10:26

Olaf · Accepted Answer · 2011-05-16T19:30:16+0000

You can check out Flume: https://github.com/cloudera/flume/wiki
