Free Data Warehouse - Infobright, Hadoop/Hive, or What?

I need to store a large number of small data objects (millions of rows per month). Once saved, they will never change. I need to:

  • Keep them safe
  • Use them for analysis (mostly time-oriented)
  • Occasionally extract some of the raw data
  • Ideally, use them with JasperReports or BIRT
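Because the rows never change once written and most analysis is time-oriented, the core access pattern is "append, then scan a time range". A minimal in-memory sketch of that pattern, assuming rows arrive in timestamp order; `AppendOnlyLog` and its methods are hypothetical names, not part of any product mentioned in this thread:

```python
from bisect import bisect_left
from datetime import datetime


class AppendOnlyLog:
    """Hypothetical write-once store: rows are appended in time order and never updated."""

    def __init__(self):
        self._timestamps = []  # stays sorted because appends must be in time order
        self._rows = []

    def append(self, ts, row):
        # Enforce the assumption that data arrives in timestamp order.
        if self._timestamps and ts < self._timestamps[-1]:
            raise ValueError("rows must be appended in timestamp order")
        self._timestamps.append(ts)
        self._rows.append(row)

    def between(self, start, end):
        """Return rows with start <= ts < end, located by binary search on timestamps."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_left(self._timestamps, end)
        return self._rows[lo:hi]


log = AppendOnlyLog()
log.append(datetime(2011, 1, 15), {"id": 1})
log.append(datetime(2011, 2, 3), {"id": 2})
log.append(datetime(2011, 2, 20), {"id": 3})
print(log.between(datetime(2011, 2, 1), datetime(2011, 3, 1)))
# [{'id': 2}, {'id': 3}]
```

Since nothing is ever updated, the sorted timestamp list never needs rebalancing; that immutability is also what read-only or load-only engines exploit.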

My first candidate was Infobright Community Edition, essentially a read-only storage engine for MySQL.

On the other hand, people say a NoSQL approach could be better. Hadoop + Hive looks promising, but the documentation seems poor and the version number is still below 1.0.

I have also heard about Hypertable, Pentaho, MongoDB...

Do you have any recommendations?

(Yes, I found several threads on this here, but they are a year or two old.)

Edit: What about other solutions such as MonetDB, InfiniDB, or LucidDB?

3 answers

I have the same problem and have been researching it. These are the kinds of data stores available for BI:

  • Column-oriented. Free and well known: MonetDB, LucidDB, Infobright, InfiniDB
  • Distributed: hTable, Cassandra (which is also, in theory, column-oriented)
  • Document-oriented: MongoDB, CouchDB
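The column-oriented group suits BI workloads because an aggregate over one attribute only has to read that attribute's values, and a column of similar values compresses well. The toy Python illustration below shows the two layouts and a simple run-length encoding; it is an assumption-laden sketch of the idea, not how MonetDB, LucidDB, Infobright, or InfiniDB are actually implemented:

```python
# Row-oriented layout: each record's fields are stored together.
rows = [
    {"ts": "2011-01-15", "customer": "a", "amount": 10},
    {"ts": "2011-01-16", "customer": "b", "amount": 25},
    {"ts": "2011-02-01", "customer": "a", "amount": 5},
]


def to_columns(records):
    """Pivot a list of records into one list per field (column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


columns = to_columns(rows)

# Row layout: SUM(amount) visits every whole record (on disk, every field is read).
total_from_rows = sum(record["amount"] for record in rows)
# Column layout: SUM(amount) scans one contiguous list of values.
total_from_columns = sum(columns["amount"])
print(total_from_rows, total_from_columns)  # 40 40

# A sorted column has long runs of repeated values, which compress well.
print(run_length_encode(sorted(columns["customer"])))  # [('a', 2), ('b', 1)]
```

The trade-off runs the other way for fetching complete records one at a time, which is why row stores remain the usual choice for transactional work.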

The answer depends on what you really need:

http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/

  • , , , . : ( : noSQL . DB, BI). -, ( ) ( )/ Cassandra.

BI, CRM/CMS,

, . , InfiniDB CODB, . , , .

GridSQL. "" .

GridSQL PostgreSQL, . , , .

, MySQL. , , Infobright . , MySQL Archive. , IIRC - , , Infobright . , , .

(, NoSQL), , , . , CouchDB "", , , .

My only comment about your dataset: since you say the analysis is mostly time-oriented, make sure whatever solution you choose lets you archive data by time period. This is a common data warehouse practice: keep only the most recent N months of data online and archive the rest. Partitioning implemented in the DBMS is very useful for this.
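The rolling-window practice can be sketched in Python, assuming each row is a `(datetime, payload)` pair and one partition per calendar month; `partition_by_month` and `split_online_archive` are hypothetical helpers, not part of any DBMS. In a real database the equivalent step is dropping or detaching a whole partition, which is much cheaper than deleting old rows one by one.

```python
from collections import defaultdict
from datetime import datetime


def partition_by_month(rows):
    """Group (timestamp, payload) rows into partitions keyed by (year, month)."""
    partitions = defaultdict(list)
    for ts, payload in rows:
        partitions[(ts.year, ts.month)].append((ts, payload))
    return dict(partitions)


def split_online_archive(partitions, keep_months):
    """Keep the newest `keep_months` partitions online; return (online, archive).

    Whole partitions move to the archive, so no individual rows are deleted.
    """
    keys = sorted(partitions)  # (year, month) tuples sort chronologically
    cutoff = max(len(keys) - keep_months, 0)
    archive = {k: partitions[k] for k in keys[:cutoff]}
    online = {k: partitions[k] for k in keys[cutoff:]}
    return online, archive


if __name__ == "__main__":
    rows = [
        (datetime(2010, 11, 5), "a"),
        (datetime(2010, 12, 1), "b"),
        (datetime(2011, 1, 9), "c"),
        (datetime(2011, 1, 20), "d"),
    ]
    online, archive = split_online_archive(partition_by_month(rows), keep_months=2)
    print(sorted(online))   # [(2010, 12), (2011, 1)]
    print(sorted(archive))  # [(2010, 11)]
```

This keeps the newest N partitions that exist rather than N calendar months counted back from today; a production job would more likely compute the cutoff from the current date.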
