I would recommend Spark. If you look at the list of companies using it, you will find names like Amazon, eBay and Yahoo!. In addition, as you noted in the comments, it is becoming a mature tool.
You have already given arguments against Cassandra and Solr, so I will focus on explaining why Hadoop MapReduce will not serve real-time queries as well as Spark does.
Hadoop MapReduce was designed around the hard drive, under the assumption that for big data the cost of disk I/O is acceptable. As a result, data is read from and written to disk at least twice, once at the map stage and once at the reduce stage. This makes it possible to recover from failures, since partial results are persisted, but it is not what you want when answering queries in real time.
Spark was designed not only to address the shortcomings of MapReduce, but specifically for the kind of interactive data analysis you want. It achieves this mainly by keeping data in RAM, and the results are impressive: Spark jobs are often 10-100 times faster than their MapReduce equivalents.
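To make that concrete, here is a minimal sketch of the in-memory, interactive workflow: load the data once, cache it, and then run repeated queries against the cached copy. The file path and column names (`events.parquet`, `status`, `service`, `latency_ms`) are placeholders, not something from your setup.

```scala
import org.apache.spark.sql.SparkSession

object InteractiveQueries {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("interactive-analysis")
      .getOrCreate()

    // Load the data once; path and schema are illustrative only.
    val events = spark.read.parquet("hdfs:///data/events.parquet")

    // Keep the dataset in RAM so repeated queries skip the disk entirely.
    events.cache()
    events.count() // materialize the cache

    // Subsequent queries hit the in-memory copy and come back quickly.
    events.filter("status = 'error'").groupBy("service").count().show()
    events.filter("latency_ms > 500").count()

    spark.stop()
  }
}
```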
The only caveat is the amount of memory you have. Most likely, your data will fit in the RAM you can provide, or you can rely on sampling. Usually, when working with data interactively, there is no real need for MapReduce, and that seems to be your case.
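Continuing the sketch above, if the full dataset will not fit in the RAM you can provide, sampling keeps the interactive loop fast; the 10% fraction here is an arbitrary example value.

```scala
// Work on a random sample when the full data would not fit in memory.
val sampled = events.sample(withReplacement = false, fraction = 0.1, seed = 42L)
sampled.cache()
sampled.count() // materialize the sampled copy in RAM

// Explore the sample interactively instead of the full dataset.
sampled.groupBy("service").avg("latency_ms").show()
```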