When the source relationship is not based on the business keys
Note: This is a fictitious example, created only to illustrate the more advanced logic that is necessary in the staging layer. It explains how to solve the relationship and load the link table when the foreign/primary key relationship between the source objects is not based on the business key.
The relational source model as shown in the image above shows the relationship between an invoice source table and a customer table.
The business keys are indicated by the 'U' (unique) before the column. If they do not exist, then the business key is indicated by the U key. In this model, this means:
- For invoices, the business key is invoice_number
For customers, the business key is a composite key that consists of these columns:- FIRST_NAME
- LAST_NAME
- BIRTH_DATE
- GENDER
The relationship between the 2 tables in this model is based on the customer_id, which is the technical key (surrogate key). It is not based on the business key, meaning the combination of first_name, last_name, birth_date and gender.
As a result:
- The link table can’t be loaded by using the invoices table only – where the relationship exists. A lookup is needed to find the business keys in the customers table.
- Not all the records to solve the relationships will be found during the lookup because the data in the landing area holds incremental data. This means that a lookup in the customer hub is necessary to find the missing related records and their business keys.