I am trying to reconstruct a client's state at a specific point in time. Each client has many attributes that can change at any moment (for example, risk score, billing status, customer satisfaction).
Every time a client applies for a loan, I would like to see the values of all of these attributes at the time of submission. I then want to use those values to build a predictive model.
My first thought was to create a slowly changing Type 2 dimension with effective and expiration dates, and to use the half-open join condition time_effective <= date_of_application < time_expired.
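For concreteness, here is a minimal sketch of that half-open SCD Type 2 join, using an in-memory SQLite database. All table and column names (client_dim, loan_application, risk_score) are illustrative assumptions, not names from my actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE client_dim (
        client_id      INTEGER,
        risk_score     INTEGER,
        time_effective TEXT,   -- version valid from (inclusive)
        time_expired   TEXT    -- version valid until (exclusive)
    );
    CREATE TABLE loan_application (
        application_id      INTEGER,
        client_id           INTEGER,
        date_of_application TEXT
    );
    -- Two versions of client 1: the risk score changed on 2023-02-01.
    INSERT INTO client_dim VALUES
        (1, 420, '2023-01-01', '2023-02-01'),
        (1, 610, '2023-02-01', '9999-12-31');
    INSERT INTO loan_application VALUES (100, 1, '2023-01-15');
""")

# Half-open interval join: time_effective <= date_of_application < time_expired.
# ISO-8601 date strings compare correctly as text in SQLite.
row = con.execute("""
    SELECT a.application_id, d.risk_score
    FROM loan_application a
    JOIN client_dim d
      ON d.client_id = a.client_id
     AND d.time_effective <= a.date_of_application
     AND a.date_of_application < d.time_expired
""").fetchone()
print(row)  # (100, 420) -- the version in effect on 2023-01-15
```

The half-open interval guarantees that exactly one dimension row matches any given application date, with no gaps or overlaps at version boundaries.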
However, most of these attributes are behavioral and require complex calculations over historical data from the fact tables. Moreover, the calculated values cannot be banded into ranges (0 to 500, $500 to $750, etc.) either. Tracking all of these attributes in the dimension causes it to explode. Note: some values change daily, while others change at arbitrary points in time.
My ideal data extract would look like this:
- ID # for a loan application
- Submission timestamp
- Attribute 1 value at time of submission
- Attribute 2 value at time of submission
- ...
- Attribute N value at time of submission
In addition to loan applications, there are other fact tables for which I want to find the attribute values that were in effect at the time of each event.
What are the guidelines for handling this? I see several approaches:
- Let the dimension explode
- Create separate tables holding one or a few attributes each, and query only the tables containing the attributes of interest
- Add columns to the loan application fact table containing a snapshot of all the attributes of interest
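The third option above can be sketched as follows: the derived attributes are computed from historical fact rows at ETL load time and frozen into the fact row, so no point-in-time join is needed later. Every name here (payments, avg_payment_90d, snapshot_fact_row) is a hypothetical illustration, not part of my actual design.

```python
from datetime import date, timedelta

# Historical fact rows: (client_id, payment_date, amount).
payments = [
    (1, date(2023, 1, 2), 200.0),
    (1, date(2023, 1, 9), 150.0),
    (1, date(2022, 10, 1), 500.0),  # falls outside the 90-day window below
]

def avg_payment_90d(client_id, as_of):
    """Average payment over the 90 days before `as_of` (exclusive)."""
    window = [amt for cid, d, amt in payments
              if cid == client_id and as_of - timedelta(days=90) <= d < as_of]
    return sum(window) / len(window) if window else None

def snapshot_fact_row(application_id, client_id, submitted):
    # Compute each behavioral attribute as of the submission date and
    # store the result directly on the fact row at load time.
    return {
        "application_id": application_id,
        "submitted": submitted,
        "avg_payment_90d": avg_payment_90d(client_id, submitted),
    }

row = snapshot_fact_row(100, 1, date(2023, 1, 15))
print(row["avg_payment_90d"])  # 175.0 -- only the two January payments qualify
```

The trade-off is that the snapshot is computed once and cannot be cheaply recomputed if the attribute definition changes, but it makes the model-training extract a simple scan of the fact table.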
Some of these issues are discussed in Kimball's ETL Toolkit (pp. 190-192) and in his Data Warehouse Toolkit (pp. 187-191). Pages 154-157 discuss "rapidly changing monster dimensions", which seems very relevant. However, I am finding it difficult to apply these recommendations.