In a star schema, are foreign key constraints between facts and dimensions needed?

I'm getting my first introduction to data warehousing, and I'm wondering whether I need foreign key constraints between facts and dimensions. Are there any serious drawbacks to not having them? I'm currently working with a relational star schema. In traditional applications I have always used them, but I've started to wonder whether they are needed in this case. I'm working in a SQL Server 2005 environment.

UPDATE: For those interested, I came across a poll asking the same question.

+7
sql-server database-design data-warehouse
7 answers

Most data warehouses (DWs) do not have foreign keys implemented as constraints, because:

  • In general, a foreign key constraint is checked on: an insert into the fact table, any key update, and a delete from a dimension table.

  • During loading, indexes and constraints are dropped to speed up the load; data integrity is enforced by the ETL application.

  • Once the tables are loaded, the DW is essentially read-only, and constraints are not triggered by reads.

  • After loading, any required indexes are recreated.

  • Deletion in a DW is a controlled process. Before rows are deleted from a dimension, the fact tables are queried for the keys of the rows to be deleted; the delete is allowed only if those keys do not exist in any fact table.

Just in case, it is common to periodically run queries that look for orphaned records in the fact tables.
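To make the orphan check and the controlled delete concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so that it runs anywhere; the tables `dim_product` and `fact_sales`, their columns, and both helper functions are hypothetical names made up for illustration. The SQL itself (a `LEFT JOIN ... IS NULL` anti-join and an `EXISTS` check) should carry over to SQL Server with little change.

```python
import sqlite3

# Hypothetical star schema; names are illustrative, not from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (sale_id INTEGER PRIMARY KEY,
                              product_key INTEGER,  -- no FK constraint declared
                              amount REAL);
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO fact_sales  VALUES (10, 1, 9.5), (11, 2, 20.0), (12, 99, 5.0);
""")

def find_orphans(conn):
    """Return fact rows whose product_key has no matching dimension row."""
    return conn.execute("""
        SELECT f.sale_id, f.product_key
        FROM fact_sales AS f
        LEFT JOIN dim_product AS d ON d.product_key = f.product_key
        WHERE d.product_key IS NULL
        ORDER BY f.sale_id
    """).fetchall()

def delete_dimension_row(conn, product_key):
    """Delete a dimension row only if no fact row references it."""
    in_use = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM fact_sales WHERE product_key = ?)",
        (product_key,),
    ).fetchone()[0]
    if in_use:
        return False
    conn.execute("DELETE FROM dim_product WHERE product_key = ?", (product_key,))
    return True

orphans = find_orphans(conn)                    # the fact row pointing at key 99
deleted_unused = delete_dimension_row(conn, 3)  # no facts reference key 3
deleted_used = delete_dimension_row(conn, 1)    # key 1 is still referenced
```

With a real warehouse you would run the orphan query once per fact/dimension pair, typically as a scheduled job after each load.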

+14

We use them, and we are happy with them.

Is it good to have foreign keys in the data warehouse (relationships)?

There is overhead, but you can always disable the constraint at load time and then re-enable it afterwards.

Having a constraint in place can catch ETL errors and modeling defects.
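A rough sketch of the disable, load, validate, re-enable cycle, again using Python's built-in `sqlite3` as a runnable stand-in (an assumption on my part; the question is about SQL Server, where the equivalent switch is per constraint via `ALTER TABLE ... NOCHECK CONSTRAINT` and `WITH CHECK CHECK CONSTRAINT`). Table and column names are hypothetical. Note one difference: turning enforcement back on in SQLite does not re-validate existing rows, so the sketch runs `PRAGMA foreign_key_check` explicitly.

```python
import sqlite3

# Autocommit mode, because PRAGMA foreign_keys is ignored inside a transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product (product_key),
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
""")

# Turn enforcement off for the bulk load (SQLite's switch is connection-wide).
conn.execute("PRAGMA foreign_keys = OFF")
batch = [(10, 1, 9.5), (11, 2, 20.0), (12, 99, 5.0)]  # key 99 simulates an ETL bug
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", batch)

# Validate what was loaded; each row is (table, rowid, parent table, fk index).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
bad_rowids = [row[1] for row in violations]

# Re-enable enforcement for anything that happens after the load.
conn.execute("PRAGMA foreign_keys = ON")
enforcing = conn.execute("PRAGMA foreign_keys").fetchone()[0]
```

Here the declared constraint is what surfaces the bad row: `bad_rowids` contains the `sale_id` of the fact that points at a missing dimension key, which is the kind of ETL error this answer is talking about.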

+8

I think, in theory, you need them. But it depends on how you split your data across databases. If everything is in the same database, a foreign key can help, since defining one can help the database select faster based on indexing. If you spread tables across many databases, you need to check integrity at the application level.

You can have the database do the check, but it can be slow. In general, in a data warehouse, we do not care much about redundancy or integrity. We already have a lot of data, and a few integrity or redundancy problems will not affect the overall aggregated results.

+3

I do not know about need, but I believe they are good for data integrity purposes. You want your fact table to always point to a valid entry in the dimension table. Even if you are sure that this will always be the case, why not have the database check it for you?
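As a small illustration of letting the database do the checking, here is a sketch using Python's built-in `sqlite3` (a stand-in I chose so the example is runnable; the schema names are hypothetical). With the constraint declared and enforcement on, a fact row that points at a missing dimension key is rejected at insert time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget');
""")

# A valid fact row: dimension key 1 exists.
conn.execute("INSERT INTO fact_sales VALUES (10, 1, 9.5)")

# An invalid fact row: dimension key 99 does not exist.
message = ""
try:
    conn.execute("INSERT INTO fact_sales VALUES (11, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    message = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Only the valid row ends up in `fact_sales`; the bad one fails immediately instead of being discovered later by an orphan query.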

+2

The reasons for using integrity constraints in a data warehouse are exactly the same as in any other database: to ensure data integrity. Assuming that you and your users care about accurate data, you need some way to make sure it stays accurate and that the business rules are applied correctly.

+2

As far as I know, FKs speed up queries. In addition, many BI solutions rely on them in their integration layer. So, for me, they are a must in DWs.

+2

Hope this thread is still active. My thinking: for large fact tables with many dimensions and records, foreign keys slow down inserts and updates, so the fact table becomes too slow to load, especially as it grows. Indexes are used for querying AFTER the table is loaded, so they can be disabled during inserts/updates and then rebuilt. The foreign key RELATIONSHIP is what matters, NOT the foreign key constraint itself: the relationship is really implicit in the ETL process. I have found that foreign keys make things too slow in a real-world data warehouse. You need to use a VIRTUAL foreign key: the relationship is there, but the constraint is not. If you break the foreign key relationship in a data warehouse, you are doing something wrong. If you disable the constraints during inserts and there is a mismatch or an orphan, you won't be able to re-enable them, so what's the point? The whole point of a DW is fast access and querying. Foreign keys make that impossible. Interesting debate: it's not easy to find this question on the net.

Kev

+1
