When the source relationship is not based on the business keys

Note: This is a fictive example, created to explain a case. It is used only to illustrate the more advanced logic that is necessary in the staging layer. It explains how to solve the relationship and load the link table when the foreign/primary key relationship between the source objects is not based on the Business Key.

The relational source model as shown in the image above shows the relationship between an Invoice source table and a Customer table.

The Business Keys are indicated by the U-sign in front of the column. If they do not exist, then the Business Key is indicated by the Unique key. In this model, this means:

  • For invoices, the Business Key is INVOICE_NUMBER
  • For customers, the Business Key is a composite key that consists of these columns:
    • FIRST_NAME
    • LAST_NAME
    • BIRTH_DATE
    • GENDER

The relationship between the 2 tables in this model is based on the customer_id, which is the technical key (surrogate key). It is not based on the business key, meaning the combination of First_name, Last_name,Birth_date and Gender.

As a result:

  • The link table can’t be loaded by using the Invoices table only – where the relationship exists. A lookup is needed to find the Business Keys in the Customers table.
  • Not all the records to solve the relationships will be found during the lookup because the data in the landing area holds incremental data. This means that a lookup in the Customer Hub is necessary to find the missing related records and their Business Keys.

The following image illustrates the lookups that are needed to resolve the link between Invoices and Customers. In the Invoices landing table there are two incremental records which have 5 and 7 as the customer-id. When the lookup is done in the Customers landing table it only holds the record with the Customer_Number = 5, meaning that the record with Customer_Number = 7 is not in the landing area during this incremental load.

The only way to solve this is to do the lookup to the Satellite, on the condition that the Satellite holds the primary key of the Customers table = Customer_Number. Through the lookup to the HUB using the Hash key, the Business Keys can be found resulting in a fully solved Link table.

The following image shows the SQL used to load the invoices Staging table and used to resolve the Link table between Invoices and Customers in an Oracle Database prior to 12C.

Now what if this technical key was also put into the Hub? This would result in one less lookup from the Satellite to the Hub. However, Hubs should only contain Business Keys. It is good practice to put the technical key in the Satellite but make it a special attribute that does not take part in the Change of History. This means it will not be a part of the Hash Difference Attribute calculation. 

The target Data Vault model for this article is shown below