When the source relationship is not based on the Business Keys

Figure 1

 

REMARK : This is a fictive model explaining a case and is not based on a real world example

Through this model we want to explain the more advanced logic necessary in the staging layer to solve the relationship and load the link table when the foreign/primary key relationship between the source objects is not based on the Business Key.

The relational source model as shown on figure 1 shows the relationship between an Invoice source table and a Customer table.

The Business Keys are indicated by the U-sign front of the column and if they do not exist than the the Business Key is indicated by the Unique key.

More specific in this model, it means :

  • Invoices:
  • The Business Key is INVOICE_NUMBER
  • Customers:
  • The Business Key is a composite key consisting of the columns:
  • FIRST_NAME
  • LAST_NAME
  • BIRTH_DATE
  • GENDER

The relationship between the 2 tables in this model is based on the customer_id which is the technical key (surrogate key) and is not based on the business key, meaning the combination of First_name, Last_name,Birth_date and Gender.

It means the following:

  • That the link table cannot be loaded only by using the Invoices table (where the relationship exists) but the a lookup is needed to find the Business Keys in the Customers table.
  • As the data in the landing area holds incremental data it means that you may not find all the records to solve the relationships during the lookup and that you have to do a lookup to the Customer Hub to find the missing related records and their Business Keys.

Figure 2

 

Figure 2 explains the 2 lookups that are needed to resolve the link between Invoice and Customer.
In the invoices landing table are 2 incremental records with which have 5 and 7 as the customer-id.
When the lookup happens to the Customers landing table it only holds the record with the Customer_Number = 5 which means that the record with Customer_Number 7 did not arrive into the landing area during this incremental load.

The only way to solve this is to do the lookup to the Satellite and that the Satellite holds the primary key of the Customers table = Customer_Number, so the lookup is possible.
Through the lookup to the HUB using the Hash key, the Business Keys can be found resulting in a fully solved Link table.

Underneath you find the SQL used to load the invoices Staging table and which resolves the Link table between Invoices and Customers in a Oracle Database prior to 12C.

INSERT INTO moto_stg_ms_tst_ref.stg_invoices
(
invoices_hkey
,customers_hkey
,lnk_invo_cust_hkey
,load_date
,src_bk
,load_cycle_id
,trans_timestamp
,operation
,record_type
,invoice_number
,invoice_customer_id
,invoice_number_bk
,business_key_bk
,invoice_date
,amount
,discount
,error_code
) WITH
distfkinv_cust AS
(SELECT distinct invoice_customer_id
FROM moto_ext_ms_tst_ref.ext_invoices)
,find_BK_fkinv_cust AS
( SELECT hub.business_key_bk
,stg.invoice_customer_id
FROM distfkinv_cust stg
JOIN moto_fl_tst_ref.sat_ms_customers sat
ON stg.invoice_customer_id = sat.customer_number
JOIN moto_fl_tst_ref.hub_customers hub
ON sat.customers_hkey = hub.customers_hkey
WHERE sat.load_end_date = to_date('31/12/2399','DD/MM/YYYY')
UNION
SELECT ex.business_key_bk
,ex.customer_number invoice_customer_id
FROM moto_ext_ms_tst_ref.ext_customers ex
JOIN distfkinv_cust dist
ON ex.customer_number = dist.invoice_customer_id
)
SELECT UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(r.invoice_number_bk),2)))) invoices_hkey
,UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(UPPER('moto_sales') || '#' || c.business_key_bk),2)))) customers_hkey
,UPPER(TO_CHAR(RAWTOHEX (DBMS_CRYPTO.HASH(TO_CLOB(r.invoice_number_bk || '#' || UPPER('moto_sales') || '#' || c.business_key_bk),2)))) lnk_invo_cust_hkey
,load_date
,UPPER('moto_sales') src_bk
,load_cycle_id
,r.trans_timestamp
,r.operation
,r.record_type
,r.invoice_number
,r.invoice_customer_id
,r.invoice_number_bk
,c.business_key_bk
,invoice_date
,amount
,discount
FROM moto_ext_ms_tst_ref.ext_invoices r
LEFT OUTER JOIN find_BK_fkinv_cust c ON (r.invoice_customer_id=c.invoice_customer_id);

An argument could be made that this Technical key could also be put into the Hub which results in one less lookup from the Satellite to the Hub. As the definition of the Hub is to only contain Business Keys it was our assumption that it was best to put this key in the Satellite but make it a special attribute that does not take part in the Change of History, meaning it will not be part of the calculation of the Hash Difference Attribute

Figure 3

 

As an additional Information we have added the target Data Vault model to this Article, it is shown on figure 3.

Dirk Vermeiren