Open source data warehouse systems - specifically for data storage

I am building a data warehouse for our site (a decent-sized site that receives several million page views per day), and I wonder whether there are any good open source data warehouse systems out there.

In particular, I'm only looking for something to store the data. I plan to build a custom front end so that it displays the information we care about. However, I do not want to build a custom database for this, and although I am fairly sure that an SQL database will not work here, I am not sure what to use instead. Any pointers to useful articles would also be appreciated.

Edit: I should mention that one DB I looked at briefly was MongoDB. It seems like it might work, but their Use Cases page specifically lists data warehousing as Less Well Suited: http://www.mongodb.org/display/DOCS/Use+Cases . It also does not seem to be aimed at data warehousing.

+4
7 answers

http://www.hypertable.org/ may be what you are looking for (going by your description above): something to store large amounts of logged data without normalization, for example a visitor log.

Hypertable is based on Google's BigTable project. See http://code.google.com/p/hypertable/wiki/PerformanceTestAOLQueryLog for benchmarks.

You lose the relational capabilities of SQL-based DBs, but you gain performance. You can easily use Hypertable to store millions of rows per hour (depending on your hardware).

Hope that helps.

+5

Maybe I am not understanding the problem correctly. However, if you find some time to (re)visit Kimball's "Data Warehouse Toolkit", you will find that all a basic DW requires is a plain SQL database. In other words, you could build a decent DW with MySQL using MyISAM as the storage engine. The only question is the desired granularity of the information: what you want to keep and for how long. If your reports are mostly periodic and you use a report repository or cache, you do not need to store pre-calculated aggregates (no need for cubes). In other words, a Kimball star schema with cached reporting can provide decent performance in many cases. You can also look at the open source Pentaho BI Suite community edition to get started quickly with ETL, analytics, and reporting, and experiment a bit to measure performance before diving into custom development. Although this may not be what you expected, it may be worth considering.
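To make the star schema idea a bit more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL/MyISAM purely so that it runs without a server, and the table names, column names, and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables.
# All names and sample rows are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
CREATE TABLE dim_page (page_id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE fact_page_view (
    date_id INTEGER REFERENCES dim_date(date_id),
    page_id INTEGER REFERENCES dim_page(page_id),
    views   INTEGER
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, "2010-01-01"), (2, "2010-01-02")])
conn.executemany("INSERT INTO dim_page VALUES (?, ?)",
                 [(1, "/home"), (2, "/about")])
conn.executemany("INSERT INTO fact_page_view VALUES (?, ?, ?)",
                 [(1, 1, 100), (1, 2, 20), (2, 1, 150), (2, 2, 30)])


def views_per_day(connection):
    """Aggregate the fact table by joining it to the date dimension."""
    return connection.execute("""
        SELECT d.day, SUM(f.views)
        FROM fact_page_view AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY d.day
        ORDER BY d.day
    """).fetchall()


report = views_per_day(conn)
print(report)
```

A periodic report like this can be computed once and cached, which is the situation where pre-calculated aggregates are not needed.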

+3

Pentaho Mondrian

  • Open source
  • Uses a standard relational database
  • MDX (think pivot table)
  • ETL (via Kettle)

I am using this.

+3

In addition to Mike's answer about Hypertable, you can take a look at the Apache Hadoop project:

http://hadoop.apache.org/

They provide a number of tools that may be useful for your application, including HBase, another implementation of the BigTable concept. I would imagine that for reporting you might need a MapReduce implementation.
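For anyone unfamiliar with the MapReduce idea, here is a rough single-process sketch in plain Python. It does not use the Hadoop API at all; the log format ("visitor url") and the function names are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Emit a (url, 1) pair for every page view in the log."""
    for line in log_lines:
        _visitor, url = line.split()
        yield url, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get a view count per URL."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical log lines in an assumed "visitor url" format.
sample_log = ["alice /home", "bob /home", "alice /about"]
page_counts = reduce_phase(shuffle(map_phase(sample_log)))
print(page_counts)
```

In a real cluster the map and reduce steps would run in parallel across machines; the sketch only shows the shape of the computation.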

+2

It all depends on the data and how you plan to access it. MonetDB is a column-oriented database engine from one of the most innovative database research teams. They just won the VLDB 10-year Best Paper Award. The database is open source, and there are many reviews on the Internet praising it.

Perhaps you should take a look at the TPC benchmarks, see which of the benchmark datasets fits your case, and work from there.

Also consider whether you need concurrency: it adds a lot of overhead to any approach and is sometimes not required. For example, you can precompute some summary or index data and protect only that for high concurrency. Profiling your data queries is the next step.
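As a rough illustration of the precomputed summary idea, here is a small sketch in Python. The class name, the event format, and the sample data are hypothetical; the point is only that the raw data is processed without any locking and just the small summary is guarded.

```python
import threading
from collections import Counter


class SummaryCache:
    """Holds a precomputed summary; only the summary is lock-protected."""

    def __init__(self):
        self._lock = threading.Lock()
        self._summary = {}

    def rebuild(self, events):
        # The expensive aggregation runs outside the lock.
        fresh = dict(Counter(day for day, _url in events))
        # Only the quick swap of the finished summary is protected.
        with self._lock:
            self._summary = fresh

    def views_on(self, day):
        with self._lock:
            return self._summary.get(day, 0)


# Hypothetical raw events as (day, url) pairs.
raw_events = [
    ("2010-01-01", "/home"),
    ("2010-01-01", "/about"),
    ("2010-01-02", "/home"),
]
cache = SummaryCache()
cache.rebuild(raw_events)
print(cache.views_on("2010-01-01"))
```

Readers only ever touch the small summary, so the bulk data store itself does not have to handle concurrent access.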

As for SQL, I don't like it either, but I don't think it is reasonable to rule out an engine just because of its front-end interface.

+2

I am facing a similar problem and am thinking of using plain MyISAM with http://www.jitterbit.com/ as the data access layer. Jitterbit (or another free tool) seems very nice for this kind of transformation.

Hope this helps a bit.

0

Many people just use MySQL or Postgres :)

0
