Can I run a time series database (TSDB) on top of Apache Spark?

I'm starting to learn about big data and Apache Spark, and I have some questions.

In the future I will need to collect IoT data, and that data will come to me as time series. I read about time series databases (TSDB) and found some open-source options such as Atlas, KairosDB, and OpenTSDB.

I really need Apache Spark, so I want to know: can I use a time series database together with Apache Spark? Does that make sense? Please keep in mind that I am very new to big data, Apache Spark, and everything else mentioned in this question.

If I can run a TSDB on top of Spark, how can I achieve this?

1 answer

I work on OpenTSDB, and I know this is an old question, but I wanted to answer. My suggestion would be to write your incoming data to OpenTSDB, assuming you just want to store the raw data and process it later. Then use Spark to run OpenTSDB queries via the OpenTSDB classes.

You could also write data using the OpenTSDB classes directly; I believe you'd want the IncomingDataPoint construct, but I don't have the details at hand right now. Feel free to reach out on the OpenTSDB mailing list with further questions.

You can see how OpenTSDB handles an incoming "put" request here; you should be able to do something similar in your write path:

https://github.com/OpenTSDB/opentsdb/blob/master/src/tsd/PutDataPointRpc.java#L42
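If you'd rather not link against the OpenTSDB classes, a TSD also accepts data points over its HTTP `/api/put` endpoint. Here's a minimal Java sketch of building and sending such a request; the host, port, metric name, and tag are placeholders, and it assumes a TSD is running with the HTTP API enabled:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TsdbPut {

    // Build the JSON body expected by OpenTSDB's /api/put endpoint:
    // a metric name, a Unix timestamp, a value, and at least one tag.
    static String putJson(String metric, long epochSeconds, double value,
                          String tagKey, String tagValue) {
        return String.format(
            "{\"metric\":\"%s\",\"timestamp\":%d,\"value\":%s,\"tags\":{\"%s\":\"%s\"}}",
            metric, epochSeconds, value, tagKey, tagValue);
    }

    public static void main(String[] args) throws Exception {
        String body = putJson("sys.cpu.user", 1356998400L, 42.5, "host", "web01");
        System.out.println(body);

        // Only attempt the POST when a TSD URL is passed on the command line,
        // e.g. java TsdbPut http://localhost:4242/api/put
        if (args.length > 0) {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(args[0]).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }
}
```

In a Spark job you would typically do this per partition (e.g. inside `foreachPartition`), reusing one connection per partition rather than one per record.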

You can see how the Splicer project sends queries to OpenTSDB here; I think a similar approach could be used in your Spark project:

https://github.com/turn/splicer/blob/master/src/main/java/com/turn/splicer/tsdbutils/SplicerQueryRunner.java#L87
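For reading data back, OpenTSDB also exposes an HTTP `/api/query` endpoint, so a Spark job can fetch time series without depending on Splicer. A minimal sketch of building such a query body in Java; the start time, aggregator, and metric name are placeholder values:

```java
public class TsdbQuery {

    // Build the JSON body for OpenTSDB's /api/query endpoint:
    // a start time plus one sub-query with an aggregator and a metric.
    static String queryJson(String start, String aggregator, String metric) {
        return String.format(
            "{\"start\":\"%s\",\"queries\":[{\"aggregator\":\"%s\",\"metric\":\"%s\"}]}",
            start, aggregator, metric);
    }

    public static void main(String[] args) {
        // POST this body to http://<tsd-host>:4242/api/query and parse
        // the JSON response into rows for your DataFrame/RDD.
        System.out.println(queryJson("1h-ago", "sum", "sys.cpu.user"));
    }
}
```

One workable pattern is to split your overall time range into chunks, parallelize the chunks as a Spark RDD, and have each task issue one such query for its chunk, so the HTTP calls happen in parallel across executors.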

