Strategies for updating a bad database schema

I started at a new workplace and found a database that desperately needs some help. A lot of things are wrong with it, including:

  • No foreign keys anywhere. They are faked with int columns, and the relationships are managed in code.
  • Almost every column is nullable, even though in reality most of them should never be NULL.
  • Naming conventions for tables and columns are practically nonexistent.
  • Varchar columns that store concatenated strings of relational information.

One can argue that "it works" as it is. But going forward, it is a complete pain to manage all of this in code, and IMO it opens us up to bugs. In effect, the database is being used as a flat file, since it does very little of the work itself.

I want to fix it. The problems that I see now are as follows:

  • We have a lot of data (migrating it will probably be difficult)
  • All database logic is in the code (migrating means large code changes)

I am also tempted to do something "radical" like moving to a schema-free database.

What are some good strategies when confronted with an existing database built on a poorly designed schema?

+4
6 answers

Enforce foreign keys. If the domain has a relationship, then the database should have a foreign key for it.
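Before a foreign key can be enforced on existing data, the rows that already violate it have to be found. The following is a minimal sketch of that check, using Python's built-in sqlite3 module as a stand-in for the real database. The table and column names (customer, orders, customer_id) and the helper find_orphans are hypothetical, made up for illustration.

```python
import sqlite3

def find_orphans(conn, child, fk_col, parent, pk_col="id"):
    """Return (rowid, fk value) for child rows whose fake FK matches no parent."""
    sql = (
        f"SELECT c.rowid, c.{fk_col} FROM {child} c "
        f"LEFT JOIN {parent} p ON p.{pk_col} = c.{fk_col} "
        f"WHERE c.{fk_col} IS NOT NULL AND p.{pk_col} IS NULL"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical legacy table: customer_id is a plain int, not a real FK.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);  -- 99 has no parent
""")

orphans = find_orphans(conn, "orders", "customer_id", "customer")
print(orphans)

# New table with a real constraint; only rows that have a parent are copied.
conn.execute("""
    CREATE TABLE orders_fk (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )
""")
conn.execute("""
    INSERT INTO orders_fk
    SELECT id, customer_id FROM orders
    WHERE customer_id IN (SELECT id FROM customer)
""")

# From now on the database itself rejects a dangling reference.
try:
    conn.execute("INSERT INTO orders_fk VALUES (12, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

What to do with the orphaned rows (delete them, repair them, or attach them to a placeholder parent) is a decision the check cannot make for you.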

Renaming existing tables or columns is fraught with danger, especially if many systems access the database directly. Gotchas include jobs that run only periodically; these are often overlooked.

Of interest: Scott Ambler's article, Introduction to Database Refactoring, and the catalog of database refactorings.

+4

Views are commonly used to transition between data models because they encapsulate the underlying tables. A view looks like a table but does not exist as a physical object in the database, so you can change which column is returned under a given column alias whenever you need to. This lets you set up your codebase to use the view, and then switch from the old table structure to the new one without requiring an application update. It does mean that the view has to return data in the existing format. For example, your current data model has:

 SELECT t.column -- a list of concatenated strings, assuming comma separated
   FROM TABLE t

... so the first version of the view will be the query above, but as soon as you create a new table that uses 3NF, the query for the view will use:

 SELECT GROUP_CONCAT(t.column SEPARATOR ',') FROM NEW_TABLE t 

... and the application code will never know that something has changed.
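As a rough end-to-end sketch of that switch, the following uses Python's built-in sqlite3 module in place of MySQL (SQLite spells the aggregate group_concat(x, ',') rather than GROUP_CONCAT(x SEPARATOR ',')). The names legacy_user, user_tag, user_tags_v and read_tags are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old model: one row per user, tags packed into a comma-separated string.
    CREATE TABLE legacy_user (id INTEGER PRIMARY KEY, tags TEXT);
    INSERT INTO legacy_user VALUES (1, 'admin,editor'), (2, 'viewer');

    -- Version 1 of the view: a pass-through over the old table.
    CREATE VIEW user_tags_v AS SELECT id, tags FROM legacy_user;
""")

def read_tags(conn):
    """Stand-in for the application: it only ever queries the view."""
    rows = conn.execute("SELECT id, tags FROM user_tags_v ORDER BY id").fetchall()
    return {uid: sorted(tags.split(",")) for uid, tags in rows}

before = read_tags(conn)

# Migrate: split each packed string into rows of a normalized table.
conn.execute("""
    CREATE TABLE user_tag (
        user_id INTEGER NOT NULL,
        tag     TEXT    NOT NULL,
        PRIMARY KEY (user_id, tag)
    )
""")
for uid, tags in conn.execute("SELECT id, tags FROM legacy_user").fetchall():
    conn.executemany(
        "INSERT INTO user_tag VALUES (?, ?)",
        [(uid, tag) for tag in tags.split(",")],
    )

# Version 2 of the view: same name and columns, now built from the new table.
conn.executescript("""
    DROP VIEW user_tags_v;
    CREATE VIEW user_tags_v AS
        SELECT user_id AS id, group_concat(tag, ',') AS tags
        FROM user_tag
        GROUP BY user_id;
""")

after = read_tags(conn)
print(before == after)
```

The comparison sorts the tags because the order of values inside the concatenated string is not guaranteed; if the application depends on that order, the view would need to pin it down explicitly.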

The catch with MySQL is that its view support is limited: a view definition cannot refer to variables, and before MySQL 5.7.7 it could not contain a subquery in the FROM clause.

The reality is that the changes you want to make effectively amount to rewriting the application from scratch. Moving logic from the codebase into the data model will drastically change how the application gets its data. Model-View-Controller (MVC) is well suited to this kind of work, because it minimizes the cost of future changes like these.

+2

I would say leave it alone until you understand it. Then make sure that you do not start with one of the Things that you should not do.

+1
  • Create a completely new schema and make sure that it is fully normalized, that it contains all the unique, check, and NOT NULL constraints etc. that are required, and that appropriate data types are used.
  • Pre-populate each table that plays the parent role in a foreign key with a single "Unknown" record.
  • Create an ETL (Extract Transform Load) process (I can recommend SSIS (SQL Server Integration Services), but there are many others) that you can use to repopulate the new schema from the existing one on a regular basis. Use the "Unknown" record as the parent of any orphaned records; there will be many ;). You will need to think about how to consolidate duplicate records, which will probably have to be decided case by case.
  • Use as many iterations as necessary to refine your new schema (make sure that the ETL process is maintained and run regularly).
  • Create views over the new schema that match the existing schema as closely as possible.
  • Incrementally modify the clients to use the new schema, using the views temporarily where necessary. You should be able to turn off parts of the ETL process gradually and eventually disable it completely.
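The ETL step with the "Unknown" parent can be sketched as follows. This is only an illustration of the idea using Python's built-in sqlite3 module, not SSIS; the table names, the sample data, the run_etl function, and the choice of -1 as the placeholder id are all assumptions made for the example.

```python
import sqlite3

UNKNOWN_ID = -1  # hypothetical id reserved for the "Unknown" placeholder parent

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Existing schema: no constraints, fake FK, a number stored as text.
    CREATE TABLE old_customer (id INT, name VARCHAR(50));
    CREATE TABLE old_order (id INT, customer_id INT, total VARCHAR(20));
    INSERT INTO old_customer VALUES (1, 'Ann');
    INSERT INTO old_order VALUES (10, 1, '9.50'), (11, 42, '3.00'), (12, NULL, '1.25');

    -- New schema: real keys, NOT NULL and CHECK constraints, proper types.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total       REAL    NOT NULL CHECK (total >= 0)
    );
    INSERT INTO customer VALUES (-1, 'Unknown');  -- pre-populated parent
""")

def run_etl(conn):
    """Reload the new schema from the old one; meant to be run repeatedly."""
    with conn:  # one transaction per run: commit on success, roll back on error
        conn.execute("DELETE FROM orders")
        conn.execute("DELETE FROM customer WHERE id <> ?", (UNKNOWN_ID,))
        conn.execute("INSERT INTO customer SELECT id, name FROM old_customer")
        # Orders whose customer is missing or NULL are attached to "Unknown".
        conn.execute(
            """
            INSERT INTO orders (id, customer_id, total)
            SELECT o.id, COALESCE(c.id, ?), CAST(o.total AS REAL)
            FROM old_order o
            LEFT JOIN old_customer c ON c.id = o.customer_id
            """,
            (UNKNOWN_ID,),
        )

run_etl(conn)
run_etl(conn)  # a second run gives the same result

rows = conn.execute("SELECT id, customer_id, total FROM orders ORDER BY id").fetchall()
print(rows)
```

A full reload like this is the simplest thing that works for a sketch. With a lot of data, an incremental load would likely be needed, and duplicate consolidation is left out entirely.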
+1

Read Scott Ambler's book on Database Refactoring. It covers many techniques for improving a database, including the transitional measures needed so that both old and new programs keep working while the design changes.

+1

First, look at how tightly the code is coupled to the database. If it is all mixed together with no DAO layer, you should not think about a rewrite; but if there is a DAO layer, then it is time to rewrite that layer and the DB along with it. If possible, build the migration tool on top of the two DAOs.

But I am assuming that there is no DAO, so you need to work out which areas of the code you are going to change and which parts of the database go with them. Hopefully that can be broken down into smaller pieces that you can update as you do maintenance. The biggest win is to get the FKs in place and to start checking for the correct indexes; there is a good chance they were not done correctly.

I wouldn't worry too much about naming until the rest of the db is under control. As for the NULLs, if the program chokes on a NULL value, do not let the column be NULL; but if the program can handle it, I would not worry about it at this point. In the future you could have the database supply a default value, but that is a long way down the line from the sound of things.
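One way to see which nullable columns could be tightened without touching any data is to count the NULLs each one actually holds. The following is a small sketch of such an audit using Python's built-in sqlite3 module; PRAGMA table_info is SQLite-specific (on MySQL the same information would come from information_schema.columns), and the person table and the null_counts helper are hypothetical.

```python
import sqlite3

def null_counts(conn, table):
    """Map each nullable column of `table` to the number of NULLs it holds."""
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    nullable = [
        row[1]
        for row in conn.execute(f"PRAGMA table_info({table})")
        if row[3] == 0
    ]
    counts = {}
    for col in nullable:
        (n,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()
        counts[col] = n
    return counts

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical legacy table where every column was left nullable.
    CREATE TABLE person (id INT, email VARCHAR(100), nickname VARCHAR(50));
    INSERT INTO person VALUES (1, 'a@example.org', NULL), (2, 'b@example.org', 'bee');
""")

counts = null_counts(conn, "person")
# Columns with no NULLs today are candidates for a NOT NULL constraint.
candidates = [col for col, n in counts.items() if n == 0]
print(counts)
print(candidates)
```

A column with zero NULLs today is only a candidate: whether the program ever writes a NULL there still has to be checked in the code.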

Do something about the varchars sooner rather than later. If anything, make that the first fix to the program.

The other thing to do is to estimate the effort for each area, and then add that cost to the cost of new development in that part of the code. That way you can fix things piece by piece as you add new features.

0
