SQL Server 2008 - Too much denormalization and over indexing. What is the use of a matrix?

I have a novice developer who is very enthusiastic about what he calls the "matrix"

I'm looking for peer understanding

In short, this is what we have:
- 1 highly denormalized table with approximately 120 columns
- Data points vary from account, customer, family, relationship, product, employee, etc.

- One index per column: about 120 non-clustered indexes
- About 90% of the total space in the database used by indexes today is the indexes in this table.
- Today about 1.5 million lines with many zeros
- The table is loaded with a stored procedure, the core of which is dynamic SQL
- All field names are common and do not describe data
- the data dictionary type table is used with dynamic SQL to load any data point in any field

- The display of fields is not static: today the column dim_0001 is the name of the client, but tomorrow there may be something else
- No primary key
- Lack of foreign keys
- There are no real restrictions (for example, all fields are zero)

Argument for the table:
- simplifies recording of requests, as it eliminates the need to record some connection

Intended Use:
- The end-user level will be the main component of building the Universe in Business Objects
- Development of an ETL development process

My recommendation will either kill the process it is in today (early development in a test environment) or move it to the next stage of the test.

Based on my research, my education and experience, I do not support it and I want the tables to be deleted as soon as one or two processes that depend on these tables are transferred to another solution.

The script below is for your reference (I limited myself to one sample example).

Any understanding that you can offer (at least one word) is valuable

-- The Matrix CREATE TABLE [z005497].[tblMatrix]( [as_of_dt] [datetime] NOT NULL, [dim_0001] [varchar](100) NULL, [dim_0002] [varchar](103) NULL, [dim_0003] [varchar](100) NULL, [dim_0004] [varchar](100) NULL, [dim_0005] [varchar](100) NULL, [dim_0006] [varchar](100) NULL, [dim_0007] [varchar](100) NULL, [dim_0008] [varchar](100) NULL, [dim_0009] [varchar](100) NULL, [dim_0010] [varchar](100) NULL, [dim_0011] [varchar](100) NULL, [dim_0012] [varchar](100) NULL, [dim_0013] [varchar](100) NULL, [dim_0014] [varchar](100) NULL, [dim_0015] [varchar](100) NULL, [dim_0016] [varchar](100) NULL, [dim_0017] [varchar](103) NULL, [dim_0018] [varchar](103) NULL, [dim_0019] [varchar](103) NULL, [dim_0020] [varchar](103) NULL, [dim_0021] [varchar](103) NULL, [dim_0022] [varchar](103) NULL, [dim_0023] [varchar](103) NULL, [dim_0024] [varchar](103) NULL, [dim_0025] [varchar](103) NULL, [dim_0026] [varchar](11) NULL, [dim_0027] [varchar](11) NULL, [dim_0028] [varchar](11) NULL, [dim_0029] [varchar](11) NULL, [dim_0030] [varchar](11) NULL, [dim_0031] [varchar](11) NULL, [dim_0032] [varchar](11) NULL, [dim_0033] [varchar](11) NULL, [dim_0034] [varchar](11) NULL, [dim_0035] [varchar](11) NULL, [dim_0036] [varchar](11) NULL, [dim_0037] [varchar](11) NULL, [dim_0038] [varchar](11) NULL, [dim_0039] [varchar](11) NULL, [dim_0040] [varchar](11) NULL, [dim_0041] [varchar](11) NULL, [dim_0042] [varchar](11) NULL, [dim_0043] [varchar](11) NULL, [dim_0044] [varchar](11) NULL, [dim_0045] [varchar](11) NULL, [dim_0046] [varchar](11) NULL, [dim_0047] [varchar](11) NULL, [dim_0048] [varchar](11) NULL, [dim_0049] [varchar](11) NULL, [dim_0050] [varchar](11) NULL, [dim_0051] [varchar](11) NULL, [dim_0052] [varchar](11) NULL, [dim_0053] [varchar](11) NULL, [dim_0054] [varchar](5) NULL, [dim_0055] [varchar](5) NULL, [dim_0056] [varchar](5) NULL, [dim_0057] [varchar](5) NULL, [dim_0058] [varchar](5) NULL, [dim_0059] [varchar](5) NULL, [dim_0060] [varchar](5) NULL, [dim_0061] [varchar](5) NULL, [dim_0062] [varchar](5) NULL, [dim_0063] [varchar](5) NULL, [dim_0064] [varchar](5) NULL, [dim_0065] [varchar](5) NULL, [dim_0066] [varchar](5) NULL, [dim_0067] [varchar](5) NULL, [dim_0068] [varchar](5) NULL, [dim_0069] [varchar](5) NULL, [dim_0070] [varchar](5) NULL, [dim_0071] [varchar](5) NULL, [dim_0072] [varchar](5) NULL, [dim_0073] [varchar](5) NULL, [dim_0074] [varchar](5) NULL, [dim_0075] [varchar](5) NULL, [dim_0076] [varchar](5) NULL, [dim_0077] [varchar](5) NULL, [dim_0078] [varchar](5) NULL, [dim_0079] [varchar](5) NULL, [dim_0080] [varchar](5) NULL, [dim_0081] [varchar](5) NULL, [dim_0082] [varchar](5) NULL, [dim_0083] [varchar](5) NULL, [dim_0084] [int] NULL, [dim_0085] [int] NULL, [dim_0086] [int] NULL, [dim_0087] [int] NULL, [dim_0088] [int] NULL, [dim_0089] [int] NULL, [dim_0090] [int] NULL, [dim_0091] [int] NULL, [dim_0092] [int] NULL, [dim_0093] [int] NULL, [dim_0094] [varchar](12) NULL, [dim_0095] [varchar](12) NULL, [dim_0096] [varchar](12) NULL, [dim_0097] [varchar](120) NULL, [dim_0098] [varchar](120) NULL, [dim_0099] [varchar](120) NULL, [dim_0100] [numeric](20, 0) NULL, [dim_0101] [varchar](20) NULL, [dim_0102] [varchar](20) NULL, [dim_0103] [varchar](20) NULL, [dim_0104] [varchar](20) NULL, [dim_0105] [varchar](20) NULL, [dim_0106] [varchar](20) NULL, [dim_0107] [varchar](20) NULL, [dim_0108] [varchar](20) NULL, [dim_0109] [varchar](20) NULL, [dim_0110] [varchar](20) NULL, [dim_0111] [varchar](20) NULL, [dim_0112] [varchar](20) NULL, [dim_0113] [varchar](20) NULL, [dim_0114] [varchar](20) NULL, [dim_0115] [varchar](20) NULL, [dim_0116] [varchar](20) NULL, [dim_0117] [varchar](20) NULL, [dim_0118] [varchar](20) NULL, [dim_0119] [varchar](20) NULL, [dim_0120] [varchar](20) NULL, [lastLoad] [datetime] NULL ) ON [PRIMARY] -- Index example CREATE NONCLUSTERED INDEX [idx_dim_0001 (not unique)] ON [z005497].[tblMatrix] ( [dim_0001] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] -- The configuration table from which developers would find out what is in the Matrix CREATE TABLE [z005497].[tblMatrixCfg]( [dimId] [int] IDENTITY(100000,1) NOT NULL, [colName] [varchar](25) NOT NULL, [dataType] [varchar](25) NOT NULL, [dimName] [varchar](25) NOT NULL, [dimDesc] [varchar](500) NOT NULL, [dimpath] [varchar](5000) NOT NULL, [loadDate] [datetime] NOT NULL, [modUser] [varchar](100) NOT NULL, [modDate] [datetime] NOT NULL, CONSTRAINT [PK_tblMatrixCfg_1] PRIMARY KEY CLUSTERED ( [dimId] ASC, [colName] ASC, [dimName] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] 
+4
source share
4 answers

Kill him if you can.

In addition, this developer needs much more experience. And he must get it in another company.

Mostly breaking so many things, I don't know where to start.

Even if you end up fighting a highly normalized model that slavishly follows best practices, it will not be compared with the disaster this project will create.

+7
source

Just to give one example of what Cade had in mind with "I don't know where to start":

"Today, dim_0001 is the name of the client, but tomorrow there may be something else."

This usually also means that in the user acceptance system, dim_0001 can be the name of the client (and the system can work and be accepted), and then you go to production, and dim_0001 becomes the name of the president’s wife or therefore, you need to spend hours of meetings trying to find out ( a) where the problem is, and (b) how to fix it as little as possible.

((b) it usually comes down to fixing the code with things like "if col_name = dim_0001, then don't treat it like what the matrix says, but treat it like what's hardcoded here.")

+5
source

"What is the use of the Matrix?"

Well, of course I do not understand.

I had never seen anything like it before, and I don’t understand how it is intended to be used or how indexes are designed to speed up something or how you can query this table without using at least self-join.

Call me inexperienced if you want, but this is the first for me. I would think that if this is a way to do something, db developers should not put so much effort into allowing developers to define tables with columns that have different data types, with relationships.

+4
source

This is the result of an attempt to attach an object-oriented paradigm to a relational system. Databases of documents allow such programming:

Documents inside a documented database are similar, in some ways, records, or rows in relational databases, but they are less rigid. They do not have to adhere to the standard scheme and they will not have all the same sections, slots, parts, keys, etc. For an example here:

 FirstName="Bob", Address="5 Oak St.", Hobby="sailing". 

Another document could be:

 FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}]. 

Both documents have some similar information and some different. Unlike a relational database, where each record will have the same set of fields and unused fields can be empty, there are no empty "fields" in any document (record) in this case. This system allows new information to be added, and it does not require an explicit indication if other parts of the information are not taken into account.

Attempting to use this paradigm in a relational database is a "square peg, round hole" problem. A document database may be excellent for a highly transactional system, but the analysis will be better served by loading transactional data into various fact tables in the data warehouse.

+1
source

All Articles