Dynamic database schema

What is the recommended storage architecture for a dynamic logical database schema?

To clarify: when a storage system is required for a model whose schema may be extended or modified by its users after it goes into production, what are some good technologies, database models, or storage engines that will allow this?

A few illustrations:

  • Creating/altering database objects via dynamically generated DDL
  • Creating tables with large numbers of sparse physical columns and using only those needed by the overlaid logical schema
  • Creating a “long, narrow” table that stores dynamic column values as rows, which you then pivot to produce a “short, wide” rowset containing all the values for a specific entity
  • Using a BigTable/SimpleDB-style PropertyBag system

Any answers based on real world experience would be greatly appreciated.

+55
sql architecture database-design dynamic-data
Sep 15 '08 at 20:04
16 answers

What you're proposing is not new. Plenty of people have tried it... and most of them find that they chase "infinite" flexibility and instead end up with much, much less. It's the "roach motel" of database designs: data goes in, but it's almost impossible to get it out. Try to conceptually write the code for ANY sort of constraint and you'll see what I mean.

The end result is typically a system that is much harder to debug and maintain, and full of data-consistency problems. Not always, but more often than not, that is how it turns out. Mostly because the programmer(s) don't see this train wreck coming and fail to code defensively against it. Also, it often turns out that the "infinite" flexibility isn't really needed; it's a very bad "smell" when the dev team gets a spec that says "Gosh, I have no idea what data they're going to put in here, so let them enter ANYTHING"... and the end users are perfectly happy with predefined attribute types they can use (code up a generic phone number and let them create any number of them; that's trivial in a nicely normalized system and preserves both flexibility and integrity!).

If you have a very talented development team and are intimately aware of the problems you'll have to overcome with this design, you can successfully craft a well-designed, not-too-buggy system. Most of the time.

Why start out with the odds stacked so heavily against you?

Don't believe me? Google "One True Lookup Table" or "single table design". Some good results: http://asktom.oracle.com/pls/asktom/f?p=100:11:59::::P11_QUESTION_ID:10678084117056

http://thedailywtf.com/Comments/Tom_Kyte_on_The_Ultimate_Extensibility.aspx?pg=3

http://www.dbazine.com/ofinterest/oi-articles/celko22

http://thedailywtf.com/Comments/The_Inner-Platform_Effect.aspx?pg=2

+30
Sep 16 '08 at 0:11

A strongly typed XML field in MSSQL worked for us.
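
For illustration, here is a minimal sketch of what that can look like in SQL Server; the schema-collection name, table, and XSD layout are assumptions for the example, not a description of the answerer's actual system:

    -- Define an XSD so the XML column is strongly typed (illustrative schema)
    CREATE XML SCHEMA COLLECTION CustomFieldsSchema AS
    N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
       <xs:element name="fields">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="field" minOccurs="0" maxOccurs="unbounded">
               <xs:complexType>
                 <xs:attribute name="name" type="xs:string" use="required"/>
                 <xs:attribute name="value" type="xs:string"/>
               </xs:complexType>
             </xs:element>
           </xs:sequence>
         </xs:complexType>
       </xs:element>
     </xs:schema>';

    -- Any XML stored in this column must now validate against that schema
    CREATE TABLE widget (
        id            INT IDENTITY PRIMARY KEY,
        custom_fields XML(CustomFieldsSchema)
    );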

+19
Sep 15 '08 at 20:05

Like several others have said, don't do this unless you have no other choice. One case where it's required is when you sell an off-the-shelf product that must let users record custom data. My company's product falls into this category.

If you need to let your customers do this, here are a few tips:

  • Build a robust administrative tool for performing the schema changes, and do not allow those changes to be made any other way.
  • Make it an administrative function; do not give ordinary users access to it.
  • Log every detail about every schema change. This will help you debug problems, and it also gives you CYA data if a customer does something stupid.

If you can pull those off (especially the first one), then any of the architectures you mentioned will work. My preference is dynamically modifying the database objects, because that lets you take advantage of your DBMS's query features when you access the data stored in custom fields. The other three options require loading big blocks of data and then doing most of your data processing in code.
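
For what it's worth, a minimal sketch of the "log every schema change" advice combined with dynamically generated DDL; all names here are illustrative assumptions, and a real admin tool would add permission checks around this:

    -- Record every schema change before applying it
    CREATE TABLE schema_change_log (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        changed_by VARCHAR(64) NOT NULL,
        ddl_text   TEXT NOT NULL
    );

    -- The admin tool logs the statement it is about to run...
    INSERT INTO schema_change_log (changed_by, ddl_text)
    VALUES ('admin_jane', 'ALTER TABLE customer ADD COLUMN region VARCHAR(32) NULL');

    -- ...and then executes it as generated DDL
    ALTER TABLE customer ADD COLUMN region VARCHAR(32) NULL;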

+13
Nov 19 '09 at 17:12

I had a similar requirement, and I decided to go with MongoDB's schema-free design.

MongoDB (from "humongous") is an open-source, scalable, high-performance, schema-free, document-oriented database written in C++. (Wikipedia)

Main characteristics:

  • rich query functionality (probably the closest to a SQL database)
  • production-ready (foursquare and sourceforge use it)

Downsides (things you need to understand so you can use Mongo correctly):

+9
Sep 26 '10 at 18:33

I did this in a real project:

The database consisted of one table with a single field that was an array of 50, with a word index set on it. All the data was stored as characters, so the word index worked as expected. Numeric fields were represented as characters, and the actual sorting was done client-side. (It's still possible to have several array fields per data type, if needed.)

The logical data schema for the logical tables was stored in the same database, as records of a different type (distinguished by the first array element). It also supported simple copy-on-write versioning via that same type field.

Benefits:

  • You can change and add/remove columns dynamically; no dump/reload of the database is needed. Any new column's data can be set to an initial value (virtually) in zero time.
  • Fragmentation is minimal, since all records and all tables are the same size; sometimes this gives better performance.
  • All table schemas are virtual. Any logical structure is possible (even recursive, or object-oriented).
  • It is good for write-once, read-mostly, no-delete/mark-as-deleted data (which most web applications actually are).

Disadvantages:

  • Indexing works only on whole words, with no prefix matching.
  • Complex queries are possible, but with a slight performance penalty.
  • It depends on your preferred DBMS supporting array fields and word indexes (it was implemented in Progress).
  • The relational model exists only in program code (i.e., it is enforced only at run time).

And now I think the next step could be to implement such a database at the file-system level. That might be relatively easy.

+7
Sep 15 '08 at 21:30

The whole point of having a relational database is to keep your data safe and consistent. The moment you let users alter the schema, your data integrity goes out the window...

If you need to store heterogeneous data, in a CMS scenario for example, I would suggest storing XML validated by an XSD in a column. Of course you lose performance and easy search capabilities, but it's a fair trade-off, IMHO.

Since it's 2016: forget XML! Use JSON to store the non-relational data bag, with an appropriately typed column as the backend. You normally shouldn't need to query by values inside the bag, which would be slow, even though many modern SQL databases understand JSON natively.
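
As a hedged sketch of that 2016 advice, assuming PostgreSQL and illustrative table and field names:

    -- Fixed relational columns plus a typed JSON bag for user-defined fields
    CREATE TABLE product (
        id         SERIAL PRIMARY KEY,
        name       TEXT NOT NULL,
        attributes JSONB NOT NULL DEFAULT '{}'
    );

    INSERT INTO product (name, attributes)
    VALUES ('Widget', '{"color": "red", "weight_g": 150}');

    -- Occasional lookups inside the bag work (and can be GIN-indexed),
    -- but anything queried routinely belongs in a real column
    SELECT name FROM product WHERE attributes->>'color' = 'red';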

+5
Sep 15 '08 at 20:09

It sounds to me like what you really want is a kind of "meta-schema": a database schema that can describe a flexible schema for storing the actual data. Dynamic schema changes are touchy and not something you want to mess with, especially not when users are allowed to make the change.

You're not going to find a database that is inherently better suited to this task than any other, so your best bet is to pick one based on other criteria. For example, what platform are you using to host the database? What language is the application written in? etc.

To clarify what I mean by "meta-schema":

    CREATE TABLE data (
        id    INTEGER NOT NULL AUTO_INCREMENT,
        `key` VARCHAR(255),  -- which logical attribute this row holds (KEY is reserved in MySQL, hence the backticks)
        data  TEXT,          -- the attribute's value
        PRIMARY KEY (id)
    );

This is a very simple example; you would most likely have something more specific to your needs (and hopefully a little easier to work with), but it illustrates the point: you should consider the database schema itself immutable at the application level, and any structural changes should be reflected in the data instead (that is, as instances of that meta-schema).
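
To make that concrete, here is one hypothetical way an application could work against that meta-schema; the key-naming convention is an assumption for illustration:

    -- Each logical attribute of an "employee" becomes its own row
    INSERT INTO data (`key`, data) VALUES ('employee:318:lastname', 'Li');
    INSERT INTO data (`key`, data) VALUES ('employee:318:salary', '120000');

    -- Reassembling the logical record means selecting its attribute rows back out
    SELECT `key`, data FROM data WHERE `key` LIKE 'employee:318:%';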

+3
Sep 15 '08 at 20:11

Create two databases:

  • DB1 contains the static tables and represents the "real" state of the data.
  • DB2 is a free-for-all for the users; they (or you) will have to write code to populate their odd tables from DB1.
+3
Sep 16 '08 at 15:25

The models mentioned in the question are all used in production systems today. A rather large one is in use at the large university/teaching institution where I work. They specifically use the long, narrow table approach to map data gathered by many disparate data-collection systems.

In addition, Google recently released its internal data-interchange format, Protocol Buffers, as open source through its website. A database system modeled on that approach would be quite interesting.

Check the following:

Entity-attribute-value model

Google Protocol Buffers

+2
Sep 15 '08 at 20:14

The EAV approach is, I think, the best of these, but it comes with a significant cost.

+2
Sep 16 '11 at 1:55

Wikipedia has an excellent overview of the problem space:

http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model

+2
Sep 28 '12 at 2:41

I know this is an old thread, but I think it never loses relevance. I'm developing something like this right now. Here is my approach. I use a server setup with MySQL, Apache, PHP, and Zend Framework 2 as the application framework, but it should work just as well with any other setup.

Here is a simple implementation guide; you can develop it further on your own.

You will need to implement your own query-language interpreter, because the equivalent SQL would be too complicated to write by hand.

Example:

 select id, password from user where email_address = "xyz@xyz.com" 

Layout of the physical database:

Table "specs": (should be cached at your data access level)

  • id: int
  • parent_id: int
  • name: varchar(255)

The table 'items' (a DDL sketch of both tables follows the layout):

  • id: int
  • parent_id: int
  • spec_id: int
  • data: varchar(20000)
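
Assembled as DDL, the layout above might look like this (MySQL syntax, constraints kept minimal for the sketch; depending on your character set, the data column may need to be TEXT instead):

    CREATE TABLE specs (
        id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        parent_id INT NOT NULL DEFAULT 0,   -- 0 = top-level spec (a logical "table")
        name      VARCHAR(255) NOT NULL
    );

    CREATE TABLE items (
        id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        parent_id INT NOT NULL DEFAULT 0,   -- owning item; 0 = top level
        spec_id   INT NOT NULL,             -- which spec this value instantiates
        data      VARCHAR(20000)
    );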

Contents of the 'specs' table:

  • 1, 0, 'user'
  • 2, 1, 'email_address'
  • 3, 1, 'password'

Contents of the 'items' table:

  • 1, 0, 1, ''
  • 2, 1, 2, 'xyz@xyz.com'
  • 3, 1, 3, 'my password'

The example query in our own query language:

 select id, password from user where email_address = "xyz@xyz.com" 

translates into standard SQL like this:

    select
        parent_id, -- user id
        data       -- password
    from items
    where spec_id = 3 -- make sure this is a 'password' item
      and parent_id in
      ( -- get the 'user' item to which this 'password' item belongs
        select id from items
        where spec_id = 1 -- make sure this is a 'user' item
          and id in
          ( -- fetch all item ids with the desired 'email_address' child item
            select parent_id -- id of the parent item of the 'email_address' item
            from items
            where spec_id = 2 -- make sure this is an 'email_address' item
              and data = "xyz@xyz.com" -- with the desired data value
          )
      )

You will need to cache the specs table in an associative array or hash table (or something similar) so you can look up spec_ids by name. Otherwise you'll have to add more SQL overhead to resolve the spec_ids from their names, as in this snippet:

Bad example, don't use this; avoid it and cache the specs table instead!

    select parent_id, data
    from items
    where spec_id = (select id from specs where name = "password")
      and parent_id in
      (
        select id from items
        where spec_id = (select id from specs where name = "user")
          and id in
          (
            select parent_id from items
            where spec_id = (select id from specs where name = "email_address")
              and data = "xyz@xyz.com"
          )
      )

I hope you get the idea and can decide for yourself whether this approach works for you.

Enjoy! :-)

+2
Mar 05 '14 at 19:53

In the past, I have gone with option C: creating a “long, narrow” table that stores dynamic column values as rows, which you then need to pivot to create a “short, wide” rowset containing all the values for a specific object (a sketch follows below). However, I was using an ORM, and that REALLY made things painful. I can't figure out how you would do it in, say, LINQ to SQL. I suppose you would have to create a Hashtable to reference the fields.
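
For anyone weighing that option, here is a rough sketch of the long/narrow layout and the conditional-aggregation pivot that turns it back into a wide rowset; all names are illustrative assumptions:

    -- One row per (object, field) pair
    CREATE TABLE object_values (
        object_id  INT NOT NULL,
        field_name VARCHAR(255) NOT NULL,
        value      VARCHAR(4000),
        PRIMARY KEY (object_id, field_name)
    );

    -- Pivot the rows into a "short, wide" rowset, one column per known field
    SELECT object_id,
           MAX(CASE WHEN field_name = 'first_name' THEN value END) AS first_name,
           MAX(CASE WHEN field_name = 'last_name'  THEN value END) AS last_name,
           MAX(CASE WHEN field_name = 'phone'      THEN value END) AS phone
    FROM object_values
    GROUP BY object_id;

The pain point with ORMs is exactly here: the pivoted column list has to be generated at run time, which static mappings handle poorly.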

@Skliwz: I'm guessing he's more interested in letting users create user-defined fields.

0
Sep 15 '08 at 20:12

ElasticSearch. Worth considering, especially if you're dealing with datasets that you can partition by date, can use JSON for your data, and are not committed to using SQL for retrieval.

ES maps a schema for any new JSON fields you send, either automatically, with hints, or manually, and you can define/modify the mapping with a single HTTP command ("mappings"). Although it does not support SQL, it has excellent lookup capabilities and even aggregations.

0
Jul 01 '17 at 20:45

The c2.com wiki explored the idea of "Dynamic Relational". You DON'T need a DBA: columns and tables are Create-On-Write, unless you start adding constraints to make the system act more like a traditional RDBMS: as a project matures, you can gradually "lock it down".

Conceptually, you can think of each row as an XML statement. For example, an employee record could be represented as:

 <employee lastname="Li" firstname="Joe" salary="120000" id="318"/> 

This does not imply that it should be implemented as XML; it's just a handy conceptualization. If you query a column that doesn't exist, e.g. "SELECT madeUpColumn ...", it is treated as blank or null (unless added constraints forbid such). And it is possible to use SQL, although you have to be careful with comparisons because of the implied typing model. But aside from type handling, users of a Dynamic Relational system would feel right at home, because they can leverage most of their existing RDBMS knowledge. Now, if somebody would just build it...

0
Sep 13 '17 at 4:39

SQL already provides a way to modify your schema: the ALTER command.

Simply keep a table listing the fields that users are not allowed to change, and write a nice interface for ALTER.
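
A rough sketch of that idea, with illustrative names; the interface consults the deny-list before issuing the generated ALTER:

    -- Fields the interface must refuse to touch
    CREATE TABLE protected_fields (
        table_name  VARCHAR(64) NOT NULL,
        column_name VARCHAR(64) NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );

    -- The interface checks the deny-list first...
    SELECT COUNT(*) FROM protected_fields
    WHERE table_name = 'customer' AND column_name = 'loyalty_tier';

    -- ...and only if that returns 0 does it run the user's change
    ALTER TABLE customer ADD COLUMN loyalty_tier VARCHAR(32) NULL;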

-5
Sep 15 '08 at 20:16


