What to use for real-time aggregation and querying?


I am looking for a tool / database / solution that can aggregate logs in real time and also query them in real time.
The main requirement is the ability to deliver results as quickly as possible, bearing in mind that a query may cover many events (possibly billions). The logs will have many "columns", and each query will set conditions on some of these columns, so the final result will be either some kind of aggregation or only a small subset of rows.

So far I have been looking at HDFS + HBase, which seems like a good solution. Are there any alternatives? Can you recommend something?

+4
5 answers

You can check out Flume: https://github.com/cloudera/flume/wiki

+3

If you are trying to parse / collect logs in real time and act on them, then my recommendation is this:

# tail --follow=name --retry /var/log/logfile.log | sendxmpp -i -u username -p password -j somejabberserver.com sendloglineto@somejabberserver.com 

This sends each line of the log, as it appears, as an XMPP message to the Jabber user sendloglineto@somejabberserver.com. That Jabber user is logged in through a client written by you (I prefer Perl and Net::Jabber). You can program the client to do whatever you want with each XMPP message (for example, store it in a database). If you store it in CouchDB, you can use the _changes API to track updates to that specific database.
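As a rough illustration of the "store it in a database" step, here is a minimal Python sketch of what such a client might do with each message body. It deliberately leaves out the XMPP connection itself: `handle_message` is a hypothetical callback that the Jabber client would call once per received line. The log format (`<timestamp> <level> <message>`), the table layout, and the use of an in-memory SQLite database instead of CouchDB are all assumptions made for the example.

```python
import sqlite3


def parse_line(line):
    """Split one log line into (timestamp, level, message).

    Assumes the hypothetical format "<timestamp> <level> <message>".
    """
    timestamp, level, message = line.rstrip("\n").split(" ", 2)
    return timestamp, level, message


def make_store():
    """Create an in-memory SQLite database with a single log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (ts TEXT, level TEXT, message TEXT)")
    return conn


def handle_message(conn, body):
    """Store one received message body (one log line) as a row."""
    conn.execute("INSERT INTO log VALUES (?, ?, ?)", parse_line(body))
    conn.commit()


def count_by_level(conn):
    """Aggregate over everything stored so far: number of rows per level."""
    rows = conn.execute("SELECT level, COUNT(*) FROM log GROUP BY level")
    return dict(rows)
```

Because every line is stored as soon as it arrives, a query such as `count_by_level(conn)` always reflects the lines received so far. At the volumes mentioned in the question a single SQLite file would not be enough; the sketch only shows the shape of the per-message handler.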

+2

You can look at calamaris. In the commercial world, there is Splunk.

+1

Although this is an old question, I am posting an answer with a technology stack that is available now:

  • Data ingestion: Apache Flume or Spark or Spring XD or Kafka streams

  • Data storage and processing: HBase (raw data in a staging table and aggregated data in final tables, as the requirements dictate; row keys can be designed around the search ranges) + SparkOnHBase

  • Real-time search: HBase with Solr indexes

  • Reporting (optional): Tableau or Banana (open source)

  • In general: Lambda architecture
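To make the Lambda architecture point concrete, here is a minimal in-memory Python sketch of the idea: a batch view is recomputed from the full raw data from time to time, a speed view counts only the events that arrived since the last batch run, and a query merges the two. The class and method names are hypothetical. In the stack above, the raw data and batch view would roughly correspond to the HBase staging and final tables, and the speed view to the streaming component, but that mapping is an assumption rather than something the answer spells out.

```python
from collections import Counter


class LambdaCounts:
    """Toy Lambda architecture: event counts per key."""

    def __init__(self):
        self.master = []             # append-only raw events (batch layer input)
        self.batch_view = Counter()  # counts precomputed by the batch layer
        self.speed_view = Counter()  # counts for events since the last batch run

    def ingest(self, key):
        """Record a new event in the raw data and in the speed view."""
        self.master.append(key)
        self.speed_view[key] += 1

    def recompute_batch(self):
        """Rebuild the batch view from all raw events, then reset the speed view."""
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key):
        """Serving layer: merge the batch view and the speed view."""
        return self.batch_view[key] + self.speed_view[key]
```

For example, after ingesting `"error"` twice, calling `recompute_batch()`, and ingesting `"error"` once more, `query("error")` returns 3: two from the batch view and one from the speed view.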

+1

Try Apache Kafka. This should be useful for your case.

0
