Introduction
The power of VaultSpeed lies in the extended predefined templates that generate code based on a number of configuration parameters. If you work with a template-based solution, you'll have to solve these extended issues inside the templates you build. Additionally you'll have to test and maintain them.
In a series of blog articles, we want to address some complex challenges and how VaultSpeed provides a solution out of the box.
This article describes the challenges and the solution for the implementation of links between multi-active satellites with subsequence source attribute(s).
The challenge described in this article is twofold: first we need to define the level of detail for the link. Second, we also need to decide about what happens when a delete occurs on the source. Do we want the link record to be 'virtually' deleted or not?
What is a multi-active satellite ?
A multi-active satellite is a satellite that can have multiple active records for the same business key. There can be different types of multi-active satellites:
A multi-active satellite with subsequence source attribute(s):
In this case there is an attribute in the source object that will make the data in the satellite unique again. The combination of the business key(s), subsequence attribute(s) and load date will be the unique key of the satellite. A language code for example can be a subsequence attribute in a source system.
A multi-active satellite without a subsequence source attribute
This is a less common type. It means there is no unique source attribute. In this case we intodruce a generated sequence number per business key to ensure that the combination of the business key(s), the generated subsequence attribute and the load date is unique.
This article describes one of the scenarios possible when combining multi-active satellites with links. It explains how VaultSpeed’s logic solves these challenges as a standard functionality. Just by setting a few parameters.
The case uses data models describing a fictitious motor bike company, as commonly used in VaultSpeed demos or documentation.
The case
Sources
The challenges are explained using 2 multi-active objects within the VaultSpeed moto sales demo source. These objects are the product_features and product_features_category tables as shown in figure 1. You can see some example records in figure 2 for product features and figure 3 for product feature categories.
The language codes in both of the objects are the source subsequence attributes and there is a foreign key from the product feature object to the product feature category object based on the product_feature_cat_id attribute.
The source case has the following characteristics:
- The subsequence attribute is a source attribute
- In both tables the business key attribute (indicated in red) is not the same as the primary key attribute (indicated in blue).
It means that the relationship is not based on the business key but on other attributes. A lookup is
needed for building the link hash key as described in a previous blog on our website.
- The primary key is at a lower level of detail than the business key in both the source objects, product_feature and product_feature_cat
- The rows indicated in blue show the primary key, the rows indicated in red, contain the business key.
As you can see, for the same business key value you have multiple primary key values.
It means that for a lookup via the business key, multiple primary keys will be returned. The logic must be
able to cope with it.
- Both objects in the relationship are multi-active objects
- The product_feature object as well as the product_feature_cat object are multi-active objects. A lookup of a business key for the link can result in multiple records being returned.
Setup in VaultSpeed
The setup of the source and the raw Data Vault for the above implementation of source and Data Vault is very simple and requires changing 1 parameter, setting 1 switch per object and indicating 1 subsequence attribute per source object.
- Build the raw Data Vault using insert only logic.
VaultSpeed system parameter : INSERT_ONLY_LOGIC = Y - The product_features and product_features category source object are multi-active.
Set VaultSpeed switch multi-active to 'Yes' using the graphical source editor. This indicates that the product feature source object is a multi-active satellite (figure 4).- By right clicking on the product feature and product feature class language attributes you can set their
attribute types to 'subsequence attribute' to indicate that that they are a source based subsequence
attribute (figure 5).
Links between multi-active objects
We encounter a number of specific choices and challenges that need to be covered by the loading logic for the links between multi-active objects:
Subsequence in link or satellite on link?
Link
No: the Subsequence Attribute is a part of the Key of the satellite, but it is not a part of the business key. So when building the link you just load the link object with the unique combinations of the hash on the business keys of the relationship.
Satellite on link
Here you could decide to add the subsequence number to the satellite on link. In our example, we created a link between 2 hubs with multi-active satellites.
In this case, if you take the subsequence attribute into account in the satellite on hub you would get the cartesian product of all the combinations of all the subsequence attributes values of a link key as shown in figure 6. The example case would have been even more compelling if the multi-active data had different subsequence attributes.
Considering this, VaultSpeed has chosen not to include the subsequence attributes in the satellite on link.
What about the delete logic?
When no subsequence attribute is present in the satellite on link, some special management is required when handling deletes on the link satellite.
A delete flag for the satellite on link can be placed to ‘Y’ only if all the multi-active records have been deleted. So not when only some of the multi-active records are being deleted.
We need to build a join with the satellite that owns the data at the foreign key side of the relationship to ensure that all the records are really deleted. Only then can the delete record with delete_flag = Y be inserted in the satellite on link (This insert is because of insert only logic.)
,dist_del_stg as
(
SELECT
distinct product_features_hkey
FROM moto_stg_ms_tst_ref_scn1.stg_product_features stg1
WHERE operation = 'D'
)
dist_del_stg selects the delete records arriving from staging. They are used to limit the staging content and the satellite content to the new records that have deletes.
,find_del_and_related_stg_rec as
(
SELECT
stg1.product_features_hkey,
stg1.pro_fea_lan_code_seq,
stg1.load_date,
stg1.load_cycle_id,
stg1.trans_timestamp,
stg1.operation,
stg1.error_code_fkprfe_pfca
FROM moto_stg_ms_tst_ref_scn1.stg_product_features stg1
JOIN dist_del_stg dds
ON dds.product_features_hkey = stg1.product_features_hkey
)
,find_del_related_sat_rec as
(
SELECT
sat.product_features_hkey,
sat.pro_fea_lan_code_seq,
sat.load_date,
sat.load_cycle_id,
sat.trans_timestamp,
sat.delete_flag,
sat.load_end_date,
max(load_end_date)
over (partition by sat.product_features_hkey,
sat.pro_fea_lan_code_seq
order by sat.load_date)
as max_load_end_date
FROM moto_fl_tst_ref_scn1.sat_ms_product_features sat
JOIN dist_del_stg dds
ON dds.product_features_hkey = sat.product_features_hkey
LEFT OUTER JOIN moto_stg_ms_tst_ref_scn1.stg_product_features stg1
ON stg1.product_features_hkey = sat.product_features_hkey
AND stg1.pro_fea_lan_code_seq = sat.pro_fea_lan_code_seq
AND stg1.load_date = sat.load_date
WHERE stg1.product_features_hkey is NULL
AND stg1.pro_fea_lan_code_seq IS NULL
AND stg1.load_date IS NULL
AND sat.delete_flag = ‘N’
)
,find_active_del_related_sat_rec as
(
SELECT
sat.product_features_hkey,
sat.pro_fea_lan_code_seq,
sat.load_date,
sat.load_cycle_id,
sat.trans_timestamp,
sat.delete_flag,
sat.load_end_date
FROM find_del_related_sat_rec sat
WHERE sat.load_end_date = sat.max_load_end_date
)
not_deleted_record then calculates how many active records there still are in the satellite based on the incoming data with delete records.
The full outer join is very important as there could be insert records after the delete in staging that need to be taken taken into account.
The logic of the delete flag that needs to be used in further logic is shown below.
CASE WHEN COALESCE (NOT_DELETED_RECORDS_LKS.count_existing_active,0) = 0 AND stg.operation = 'D' THEN 'Y'::text ELSE 'N'::text END delete_flag
If all the incoming records match the still active records in the satellite, then the count = 0 and the delete flag of the satellite on link can be set equal to 'Y'. Otherwise the count > 0 and then the delete_flag will be set to 'N'.
Conclusion
Through some simple parameterization and a few manipulations inside the graphical source editor in VaultSpeed you get the full support of this feature in the raw Data Vault layer of your integration system.
The choices made in this setup and their logical consequences were carefully analyzed and all necessary logic to support all possible cases were subsequently implemented inside the standard templates of VaultSpeed.
The complexity is hidden for customers and delivers high value as well as easy, out-of-the-box configuration and implementation.