NoSQL with analytic functions

Question

I am looking for a NoSQL system (preferably open source) that supports analytic functions (AF for short), as Oracle, SQL Server and Postgres do. I have not found any with built-in AF. I read about Hive, but it does not have actual AF (windows, first/last values, ntiles, lag, lead, etc.), only histograms and n-grams. Some NoSQL systems (such as Redis) also support map / reduce, but I am not sure whether AF can be replaced with it.

I want to do a performance comparison to select Postgres or a NoSQL system.

So, in short:

Which NoSQL systems offer AF?
Can I rely on map / reduce to replace AF? Is it fast, reliable and easy?

ps. I tried to make my question more constructive.

+7

nosql mapreduce analytic-functions

ravnur Oct 31 '12 at 11:03

2 answers

Once you really understand how MapReduce works, you can do amazing things with a few lines of code.

Here is a good video course:

http://code.google.com/intl/fr/edu/submissions/mapreduce-minilecture/listing.html

The real difference in difficulty is between functions that you can implement with a single MapReduce and those that need chained MapReduces. Moreover, some nice MapReduce implementations (for example, CouchDB) do not let you chain MapReduces (easily).
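As a rough illustration of the single versus chained distinction, here is a minimal single-process sketch in Python. It is not tied to any real framework: `run_mapreduce`, the `orders` data and the mapper/reducer names are all hypothetical. Totals per customer fit in one job, while ranking customers by total needs a second job that consumes the output of the first.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Toy MapReduce (assumption: everything fits in memory): map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Hypothetical input rows: (customer, order amount)
orders = [("ann", 10), ("bob", 5), ("ann", 7), ("cy", 30), ("bob", 1)]

# Job 1: total amount per customer (a single MapReduce is enough).
def map_totals(record):
    customer, amount = record
    yield customer, amount

def reduce_totals(customer, amounts):
    yield customer, sum(amounts)

# Job 2: rank customers by total. Every row is sent to the same key,
# so one reducer sees all totals and can order them.
def map_rank(record):
    customer, total = record
    yield "all", (total, customer)

def reduce_rank(_key, pairs):
    for rank, (total, customer) in enumerate(sorted(pairs, reverse=True), start=1):
        yield customer, total, rank

totals = run_mapreduce(orders, map_totals, reduce_totals)
ranking = run_mapreduce(totals, map_rank, reduce_rank)  # chained: consumes job 1 output
```

Note that the second job funnels everything through one reducer, which is where a real cluster would lose parallelism.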

+2

Aurélien Nov 08 '12 at 10:18

lstern · Accepted Answer · 2012-11-08T17:04:25+0000

Some functions need knowledge of all existing data, namely those that involve some kind of aggregation (mean, standard deviation) or some ordering (first, last).
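To make the "knowledge of all data" point concrete, here is a small Python sketch under the assumption that the data is split across two hypothetical nodes (`node_a`, `node_b`). Averaging the per-node means gives the wrong answer; each node has to ship a partial summary (count, sum, sum of squares) to a step that combines information from every node.

```python
import math

def summarize(values):
    """Partial summary computed locally on one node: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(summaries):
    """Merge the partial summaries from all nodes into a global mean and std deviation."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    # Population variance via E[x^2] - E[x]^2; simple, but it can lose
    # precision on large or badly scaled data.
    variance = total_sq / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))

# Hypothetical data held by two nodes
node_a = [2.0, 4.0, 4.0]
node_b = [4.0, 5.0, 5.0, 7.0, 9.0]

mean, std = combine([summarize(node_a), summarize(node_b)])

# Naive approach for comparison: the average of per-node means ignores node sizes.
naive_mean = (sum(node_a) / len(node_a) + sum(node_b) / len(node_b)) / 2
```

Aggregates like these can be merged from small summaries; ordering functions (first, last, lag) generally cannot, because they need the rows themselves in sorted order.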

If you want a distributed NoSQL solution that supports AF, you need some centralized indexing and metadata to store information about the data on all nodes, which means having a master node and, possibly, a single point of failure.

You should ask yourself what you expect from using NoSQL. Do you want a schemaless design? Distributed data? Better raw performance for very simple queries?

Depending on your needs, I see here three main alternatives:

1 - use a distributed NoSQL system without a single point of failure (e.g. Cassandra) to store your data, and use map / reduce to process the data and produce the results for the desired function (almost every major NoSQL system supports Hadoop). The caveat is that map / reduce queries are not real-time (they may take minutes or hours to complete) and require additional setup and learning.

2 - use a traditional DBMS that supports several servers, such as MySQL Cluster

3 - use NoSQL with a master / slave topology that supports ad-hoc queries and aggregations like Mongo

Regarding the second question: yes, you can rely on M / R to replace AF. You can do almost anything with M / R.
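As one hedged example of replacing an AF with M / R, the sketch below emulates `LAG(balance) OVER (PARTITION BY account ORDER BY day)` in plain Python. The `rows` data, the column names and the `lag_by_mapreduce` driver are hypothetical, and the driver stands in for whatever shuffle step a real framework provides. The mapper emits the partition key, and the reducer sorts each partition and carries the previous value forward.

```python
from collections import defaultdict

# Hypothetical rows: (account, day, balance)
rows = [("a", 2, 120), ("b", 1, 50), ("a", 1, 100), ("a", 3, 90), ("b", 2, 70)]

def mapper(row):
    # Key by the PARTITION BY column; keep the ORDER BY column with the value.
    account, day, balance = row
    yield account, (day, balance)

def reducer(account, values):
    # Sort within the partition, then emit each row with the previous balance.
    previous = None  # plays the role of SQL NULL for the first row
    for day, balance in sorted(values):
        yield account, day, balance, previous
        previous = balance

def lag_by_mapreduce(rows):
    """Toy in-memory driver: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reducer(key, groups[key]))
    return result

lagged = lag_by_mapreduce(rows)
```

This works because each partition fits in a single reducer; a window over the whole data set with no partition key would force everything through one reducer.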
