Point-in-time snapshots in a data warehouse

I am trying to recreate a client's state at a specific point in time. Each client has many attributes that can change at any time (for example, risk rating, billing to date, customer satisfaction).

Every time a client applies for a loan, I would like to see the values of all these attributes at the time of submission. I then want to use these values to develop a predictive model.

My first thought was to create a Type 2 slowly changing dimension with effective and expiration dates and use the half-open join condition time_effective <= date_of_application < time_expired.
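
A minimal sketch of that half-open join, using Python's built-in sqlite3 module. Only time_effective, date_of_application, and time_expired come from the description above; the table names, the other columns, and the sample rows are hypothetical. Dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client_dim (             -- hypothetical Type 2 dimension
    client_key     INTEGER PRIMARY KEY,
    client_id      INTEGER,
    risk_rating    TEXT,
    time_effective TEXT,              -- inclusive lower bound
    time_expired   TEXT               -- exclusive upper bound
);
CREATE TABLE loan_application (       -- hypothetical application table
    application_id      INTEGER PRIMARY KEY,
    client_id           INTEGER,
    date_of_application TEXT
);
INSERT INTO client_dim VALUES
    (1, 100, 'low',    '2020-01-01', '2020-06-01'),
    (2, 100, 'medium', '2020-06-01', '9999-12-31');
INSERT INTO loan_application VALUES
    (10, 100, '2020-03-15'),
    (11, 100, '2020-06-01');          -- falls exactly on a version boundary
""")

# Half-open interval: each application matches exactly one dimension version.
rows = conn.execute("""
SELECT a.application_id, a.date_of_application, d.risk_rating
FROM loan_application AS a
JOIN client_dim AS d
  ON d.client_id = a.client_id
 AND d.time_effective <= a.date_of_application
 AND a.date_of_application < d.time_expired
ORDER BY a.application_id
""").fetchall()

print(rows)
```

Because the upper bound is exclusive, the application dated on the boundary picks up only the newer version rather than matching both rows.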

However, most of these attributes are behavioral attributes that require complex calculations over historical data from fact tables. Moreover, the calculated values cannot be banded into ranges ($0-500, $500-750, etc.). Tracking all of these attributes in a single dimension causes it to explode. Note: some values change daily, others change at arbitrary points in time.

My ideal data extract would look like this:

  • Loan application ID
  • Submission time
  • Attribute 1 value at time of submission
  • Attribute 2 value ...
  • Attribute N value

In addition to loan applications, there are other fact tables for which I want to find the attribute values that were in effect at the time of each event.

What are the best practices for handling this? I see several approaches:

  • Let the dimension explode.
  • Create separate tables, each holding one or more attributes, and query only the tables that hold the attributes I am interested in.
  • Add a snapshot of all the attributes that interest me to the loan application fact table.
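
To make the second option concrete, here is a rough sketch in plain Python of an as-of lookup against per-attribute history. The layout (one sorted list of changes per client and attribute) and every name and value in it are assumptions made for illustration, not an established design.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical per-attribute history: (client_id, attribute) -> list of
# (changed_on, value) pairs, kept sorted by changed_on.
history = {
    (100, "risk_rating"): [(date(2020, 1, 1), "low"),
                           (date(2020, 6, 1), "medium")],
    (100, "satisfaction"): [(date(2020, 2, 10), 7),
                            (date(2020, 2, 20), 9)],
}


def value_as_of(client_id, attribute, when):
    """Return the latest value recorded on or before `when`, else None."""
    changes = history.get((client_id, attribute), [])
    change_dates = [changed_on for changed_on, _ in changes]
    # Number of changes dated on or before `when`.
    position = bisect_right(change_dates, when)
    return changes[position - 1][1] if position else None


def snapshot(client_id, when, attributes):
    """Collect only the attributes of interest as they stood at `when`."""
    return {name: value_as_of(client_id, name, when) for name in attributes}


submitted = date(2020, 2, 15)
row = snapshot(100, submitted, ["risk_rating", "satisfaction"])
print(row)
```

Each attribute keeps its own change history, so a value that changes daily does not force new versions of attributes that rarely change.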

Some of these issues are discussed in Kimball's ETL Toolkit (pp. 190-192) and in his Data Warehouse Toolkit (pp. 187-191). Pages 154-157 discuss "rapidly changing monster dimensions", which seem very relevant. However, I find it difficult to put these recommendations into practice.

1 answer

I would create a separate application fact table with keys to the relevant dimension tables and to the relevant fact records. Suppose you have the following dimension and fact tables:

  • (D) application
  • (D) applicant (or client)
  • (D) product
  • (D) time
  • (F) month_scoring_fact (monthly behavioral scoring)
  • (F) month_satisfaction_fact (monthly satisfaction survey)
  • (F) satisfaction_evaluation_fact (ad hoc satisfaction evaluation)

Then you can create the following fact table, the application fact:

  • application_key - points to the application dimension
  • applicant_key - points to the applicant dimension
  • product_key - points to the product dimension
  • time_key - points to the time dimension
  • month_scoring_fact_key - points to the latest monthly scoring record
  • month_satisfaction_fact_key - points to the latest monthly satisfaction record
  • satisfaction_evaluation_fact_key - points to the latest ad hoc evaluation record

With this approach you get time-consistent data and can keep the application dimension as SCD Type 0 (corrections only).
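
A rough sketch of how the load step might resolve one of those keys, using Python's built-in sqlite3 module. The table and key names follow the lists above. The scored_on, applied_on, and score columns, the load_application helper, and the sample rows are assumptions for illustration, and a plain date column stands in for time_key to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE month_scoring_fact (
    month_scoring_fact_key INTEGER PRIMARY KEY,
    applicant_key          INTEGER,
    scored_on              TEXT,     -- assumed column, ISO date
    score                  REAL      -- assumed measure
);
CREATE TABLE application_fact (
    application_key        INTEGER PRIMARY KEY,
    applicant_key          INTEGER,
    applied_on             TEXT,     -- stands in for time_key
    month_scoring_fact_key INTEGER   -- resolved once, at load time
);
INSERT INTO month_scoring_fact VALUES
    (1, 100, '2020-01-31', 640.0),
    (2, 100, '2020-02-29', 655.5),
    (3, 100, '2020-03-31', 610.0);
""")


def load_application(application_key, applicant_key, applied_on):
    """Insert an application row pointing at the latest scoring record
    that existed on or before the application date (hypothetical helper)."""
    latest = conn.execute(
        """SELECT month_scoring_fact_key
           FROM month_scoring_fact
           WHERE applicant_key = ? AND scored_on <= ?
           ORDER BY scored_on DESC
           LIMIT 1""",
        (applicant_key, applied_on),
    ).fetchone()
    conn.execute(
        "INSERT INTO application_fact VALUES (?, ?, ?, ?)",
        (application_key, applicant_key, applied_on,
         latest[0] if latest else None),
    )


load_application(10, 100, '2020-03-15')   # February score is the latest
load_application(11, 100, '2020-01-05')   # no score exists yet

# The extract is then a plain key join, with no date-range logic.
extract = conn.execute("""
SELECT a.application_key, a.applied_on, s.score
FROM application_fact AS a
LEFT JOIN month_scoring_fact AS s
       ON s.month_scoring_fact_key = a.month_scoring_fact_key
ORDER BY a.application_key
""").fetchall()

print(extract)
```

The point-in-time lookup happens once during loading, so later queries for the predictive model only follow stored keys. The same pattern would apply to month_satisfaction_fact_key and satisfaction_evaluation_fact_key.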
