Is this the “right” database design?

I am working with a new version of a third-party application. In this version the database structure has changed, reportedly to "improve performance".

The old version of the database had the following general structure:

TABLE ENTITY ( ENTITY_ID, STANDARD_PROPERTY_1, STANDARD_PROPERTY_2, STANDARD_PROPERTY_3, ... )
TABLE ENTITY_PROPERTIES ( ENTITY_ID, PROPERTY_KEY, PROPERTY_VALUE )

So we had a main table with columns for the standard properties, and a separate table for managing the custom properties added by the user.
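To make the old scheme concrete, here is a minimal sketch (hypothetical table contents, with SQLite standing in for SQL Server) of how custom properties are stored one-row-per-property and read back:

```python
import sqlite3

# Minimal, hypothetical version of the old schema: one row per custom property.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITY (
        ENTITY_ID INTEGER PRIMARY KEY,
        STANDARD_PROPERTY_1 TEXT
    );
    CREATE TABLE ENTITY_PROPERTIES (
        ENTITY_ID INTEGER REFERENCES ENTITY(ENTITY_ID),
        PROPERTY_KEY TEXT,
        PROPERTY_VALUE TEXT,
        PRIMARY KEY (ENTITY_ID, PROPERTY_KEY)
    );
""")
conn.execute("INSERT INTO ENTITY VALUES (1, 'widget')")
conn.executemany(
    "INSERT INTO ENTITY_PROPERTIES VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "size", "XL")],
)

# All custom properties for one entity come back as key/value rows.
rows = conn.execute(
    "SELECT PROPERTY_KEY, PROPERTY_VALUE FROM ENTITY_PROPERTIES WHERE ENTITY_ID = ?",
    (1,),
).fetchall()
props = dict(rows)
```

Adding a new custom property here is just an INSERT, not a schema change.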

The new version of the DB instead has the following structure:

TABLE ENTITY ( ENTITY_ID, STANDARD_PROPERTY_1, STANDARD_PROPERTY_2, STANDARD_PROPERTY_3, ... )
TABLE ENTITY_PROPERTIES_n ( ENTITY_ID_n, CUSTOM_PROPERTY_1, CUSTOM_PROPERTY_2, CUSTOM_PROPERTY_3, ... )

So now, when the user adds a custom property, a new column is added to the current ENTITY_PROPERTIES_n table until the maximum number of columns (managed by the application) is reached; then a new table is created.
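A sketch of the bookkeeping this implies for the application (the table names and the column limit are hypothetical, and SQLite stands in for SQL Server): the app must either ALTER the current table or CREATE the next one.

```python
import sqlite3

MAX_COLUMNS = 2  # hypothetical per-table limit "managed by the application"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENTITY (ENTITY_ID INTEGER PRIMARY KEY)")

tables = []  # (table_name, column_count) bookkeeping the app must maintain


def add_custom_property(name):
    """Sketch of the new scheme: ALTER the current table, or CREATE the next one."""
    if not tables or tables[-1][1] >= MAX_COLUMNS:
        n = len(tables) + 1
        tname = f"ENTITY_PROPERTIES_{n}"
        conn.execute(f"CREATE TABLE {tname} (ENTITY_ID_{n} INTEGER)")
        tables.append([tname, 0])
    tname = tables[-1][0]
    conn.execute(f"ALTER TABLE {tname} ADD COLUMN {name} TEXT")
    tables[-1][1] += 1
    return tname


# Five user-added properties end up spread across three tables.
placed = [add_custom_property(f"CUSTOM_PROPERTY_{i}") for i in range(1, 6)]
```

Note that every call issues DDL at runtime, which is exactly what the answers below object to.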

So my question is this: is this a correct way to design a database structure? Is this really the only way to improve performance? The old structure required a lot of joins or sub-selects, but this new structure does not seem very smart (or even correct) to me...

+8
performance database sql-server database-design
5 answers

I have seen this done before, with the supposed (often unproven) justification of reducing the join count - it basically turns a row-heavy data table into a column-heavy one. They ran into its limitation, as you mention, by creating new tables when they run out of columns.

I totally disagree with it.

Personally, I would stick with the old structure and re-evaluate the performance issues. That is not to say the old way is ideal, but it is slightly better than the “improvement”, in my opinion, and it avoids a large-scale reorganization of the database tables and the DAL code.

These property tables strike me as fairly static... caching would be a far better performance improvement without mangling the database, and I would look at that first. Do the “expensive” select once and hold it in memory somewhere, then forget about your problems (granted, you then have to manage the cache, but static data is among the easiest to cache).
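For example, a minimal caching sketch (hypothetical data, with Python's lru_cache standing in for whatever cache layer the application uses): the expensive select runs once per entity and later lookups come from memory.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITY_PROPERTIES (ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT);
    INSERT INTO ENTITY_PROPERTIES VALUES (1, 'color', 'red'), (1, 'size', 'XL');
""")


@lru_cache(maxsize=None)
def properties_for(entity_id):
    # The "expensive" select runs once per entity; later calls hit the cache.
    rows = conn.execute(
        "SELECT PROPERTY_KEY, PROPERTY_VALUE FROM ENTITY_PROPERTIES WHERE ENTITY_ID = ?",
        (entity_id,),
    ).fetchall()
    return dict(rows)


first = properties_for(1)
second = properties_for(1)  # served from memory, no second query
```

The cache must of course be invalidated when a user adds or edits a property, which is the management cost mentioned above.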

Or just wait until you hit the maximum number of tables your database allows :-)

Others suggest entirely different data stores. That is a viable option, and if I did not have an existing database structure I would consider it too. However, I see no reason why this requirement cannot be met within an RDBMS. I have seen it done in almost every major application I have worked on. Interestingly, they all went a similar route, and all of them were largely “successful” implementations.

+10

No, no. This is terrible.

until the maximum number of columns (managed by the application) is reached, then a new table is created.

This sentence says it all. Under no circumstances should an application dynamically create tables. The "old" approach is not ideal either, but since you have a requirement to let users add custom properties, something like it has to exist.

Consider this:

  • You lose all type safety, because every value must be stored in the generic PROPERTY_VALUE column.
  • Depending on your users, you could instead change the schema up front and let them run some kind of batch job to update the database, so that at least every property is declared with the correct data type. You could then also do away with the generic entity_id / key lookup.
  • Take a look at http://en.wikipedia.org/wiki/Inner-platform_effect . This design certainly smells of it.
  • Perhaps an RDBMS is not the right fit for your application. Consider a key/value store such as MongoDB or another NoSQL database ( http://nosql-database.org/ ).
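The batch-job idea from the second point could look roughly like this (a sketch with hypothetical names, SQLite in place of SQL Server): promote an EAV key to a properly typed column in one pass, then remove the now-redundant rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITY (ENTITY_ID INTEGER PRIMARY KEY);
    CREATE TABLE ENTITY_PROPERTIES (ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT);
    INSERT INTO ENTITY VALUES (1);
    INSERT INTO ENTITY_PROPERTIES VALUES (1, 'weight_kg', '12.5');
""")


def promote_property(key, sql_type):
    """Batch job: turn one EAV key into a typed column on ENTITY, then drop its rows.
    Identifier interpolation here is illustrative only - a real job would validate names."""
    conn.execute(f"ALTER TABLE ENTITY ADD COLUMN {key} {sql_type}")
    conn.execute(
        f"""
        UPDATE ENTITY SET {key} = (
            SELECT CAST(PROPERTY_VALUE AS {sql_type}) FROM ENTITY_PROPERTIES
            WHERE ENTITY_PROPERTIES.ENTITY_ID = ENTITY.ENTITY_ID AND PROPERTY_KEY = ?
        )
        """,
        (key,),
    )
    conn.execute("DELETE FROM ENTITY_PROPERTIES WHERE PROPERTY_KEY = ?", (key,))


promote_property("weight_kg", "REAL")
value = conn.execute("SELECT weight_kg FROM ENTITY WHERE ENTITY_ID = 1").fetchone()[0]
leftover = conn.execute("SELECT COUNT(*) FROM ENTITY_PROPERTIES").fetchone()[0]
```

The crucial difference from the vendor's design is that this DDL runs once, offline and under your control, not dynamically inside the application.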
+5


From what I know about databases (though I am certainly not the most experienced), this seems like a pretty bad idea for your database. If you already know the maximum number of custom properties a user can have, I would say you are better off fixing the column count at that value.

Again, I am not an expert, but creating new columns on the fly is not how operational databases are meant to work. It will bring you more trouble than anything else.

If I were you, I would either fix the number of custom properties or stick with the old system.

+1

I believe that creating new tables to hold entity properties is bad design, since you could end up bloating the database with tables. The only argument for the second method is that queries avoid scanning all the redundant rows that do not apply to the selected entity. However, using indexes on the original ENTITY_PROPERTIES table can greatly help performance.
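To illustrate the indexing point (hypothetical data, and SQLite's EXPLAIN QUERY PLAN standing in for SQL Server's execution plan): a composite index turns the property lookup into an index seek instead of a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ENTITY_PROPERTIES (ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO ENTITY_PROPERTIES VALUES (?, ?, ?)",
    [(i, "color", "red" if i % 2 else "blue") for i in range(1000)],
)

# A composite index lets the engine seek straight to one key/value pair
# instead of scanning every row.
conn.execute("CREATE INDEX ix_props ON ENTITY_PROPERTIES (PROPERTY_KEY, PROPERTY_VALUE)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ENTITY_ID FROM ENTITY_PROPERTIES "
    "WHERE PROPERTY_KEY = 'color' AND PROPERTY_VALUE = 'red'"
).fetchall()
uses_index = any("ix_props" in row[-1] for row in plan)
```

On SQL Server the equivalent would be a nonclustered index on (PROPERTY_KEY, PROPERTY_VALUE), possibly including ENTITY_ID.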

Personally, I would stick with your original design, apply indexes, and let the database engine work out the best way to select the data, rather than splitting each entity property out into new tables.

0

There is no “right” way to design a database. I do not know of any universally recognized set of standards other than the well-known normal forms, and many database designs ignore even that standard for performance reasons.

There are ways to evaluate database designs - performance, maintainability, legibility, and so on. Quite often you have to trade them off against each other; what you have here, in my opinion, is a design that trades legibility for (claimed) performance.

So the best way to find out whether this was a good trade-off is to check whether the performance gains actually materialize. The best way to do that is to build the proposed schema, load it with a representative data set, and run the queries you will need in production against it.
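A throwaway harness along those lines (hypothetical data volumes and query, SQLite in place of SQL Server) might look like:

```python
import sqlite3
import time

# Load a representative data set, then time the production queries against it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ENTITY_PROPERTIES (ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO ENTITY_PROPERTIES VALUES (?, ?, ?)",
    [(i % 500, f"key_{i % 20}", f"value_{i}") for i in range(10_000)],
)


def time_query(sql, params=(), repeats=50):
    """Run one production query repeatedly and return the total wall-clock time."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


elapsed = time_query(
    "SELECT PROPERTY_VALUE FROM ENTITY_PROPERTIES WHERE ENTITY_ID = ? AND PROPERTY_KEY = ?",
    (42, "key_7"),
)
```

Run the same harness against both schemas with the same data and the comparison answers itself; on SQL Server you would also compare actual execution plans.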

I would guess that the new design will not be noticeably faster for queries like SELECT STANDARD_PROPERTY_1 FROM ENTITY WHERE STANDARD_PROPERTY_1 = 'banana'.

I would guess it will not be noticeably faster when retrieving all the properties for a given entity; in fact it may be a little slower, because instead of a single join to ENTITY_PROPERTIES, the new design requires joins to multiple tables. You will also get back “sparse” results - presumably not every entity has values in the CUSTOM_PROPERTY_n columns of every ENTITY_PROPERTIES_n table.

The cases where the new design could be significantly faster are queries whose WHERE clause tests several custom properties at once. For example, finding an entity where custom property 1 is true, custom property 2 is "banana" and custom property 3 is not in ("kylie", "pussycat dolls", "giraffe") is most likely faster against columns (which can be indexed) in the ENTITY_PROPERTIES_n tables than against rows in the ENTITY_PROPERTIES table. Probably.
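To see why, compare the two query shapes on a toy data set (hypothetical property names, SQLite in place of SQL Server). The EAV form needs one self-join per property tested; the wide form is a single flat WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITY_PROPERTIES (ENTITY_ID INTEGER, PROPERTY_KEY TEXT, PROPERTY_VALUE TEXT);
    INSERT INTO ENTITY_PROPERTIES VALUES
        (1, 'p1', 'true'), (1, 'p2', 'banana'), (1, 'p3', 'zebra'),
        (2, 'p1', 'true'), (2, 'p2', 'banana'), (2, 'p3', 'giraffe');
""")

# EAV form: one self-join per property tested in the WHERE clause.
eav_hits = [r[0] for r in conn.execute("""
    SELECT a.ENTITY_ID
    FROM ENTITY_PROPERTIES a
    JOIN ENTITY_PROPERTIES b ON b.ENTITY_ID = a.ENTITY_ID AND b.PROPERTY_KEY = 'p2'
    JOIN ENTITY_PROPERTIES c ON c.ENTITY_ID = a.ENTITY_ID AND c.PROPERTY_KEY = 'p3'
    WHERE a.PROPERTY_KEY = 'p1' AND a.PROPERTY_VALUE = 'true'
      AND b.PROPERTY_VALUE = 'banana'
      AND c.PROPERTY_VALUE NOT IN ('kylie', 'pussycat dolls', 'giraffe')
""")]

# Wide-column form: a single flat predicate, no self-joins.
conn.executescript("""
    CREATE TABLE ENTITY_PROPERTIES_1 (ENTITY_ID_1 INTEGER, p1 TEXT, p2 TEXT, p3 TEXT);
    INSERT INTO ENTITY_PROPERTIES_1 VALUES
        (1, 'true', 'banana', 'zebra'), (2, 'true', 'banana', 'giraffe');
""")
wide_hits = [r[0] for r in conn.execute("""
    SELECT ENTITY_ID_1 FROM ENTITY_PROPERTIES_1
    WHERE p1 = 'true' AND p2 = 'banana'
      AND p3 NOT IN ('kylie', 'pussycat dolls', 'giraffe')
""")]
```

Both return the same entities; the question is only which plan the engine executes faster at production scale, which is exactly what the benchmark above should measure.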

Regarding maintainability - yuck. Your database access code now has to be much smarter, knowing which table holds which property and how many columns is too many. The likelihood of subtle bugs is high - there are far more moving parts - and I cannot think of any obvious unit tests to prove that the database access logic works.

Legibility is another problem - this solution is not in most developers' mental toolkits; it is not a standard model. The old solution is fairly widely known, commonly called “entity-attribute-value”. This becomes a serious problem on long-lived projects where you cannot guarantee that the original development team will stick around.

0
