Intro
A Data Vault (DV) model that doesn’t represent reality is useless.
Your data model needs to represent your business: it needs to contain the entities, attributes and relationships that are familiar to the people who work with you. For example, customers buy products, product purchases are billed to customers, companies store product inventory, etc.
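To make this concrete, here is a minimal sketch of how the example above could be written down as entities, attributes and relationships. The class names and the attributes are illustrative assumptions; they do not come from Ellie or VaultSpeed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple[str, ...] = ()


@dataclass(frozen=True)
class Relationship:
    source: str  # name of the entity the relationship starts from
    verb: str    # business wording, e.g. "buys"
    target: str  # name of the entity the relationship points to


@dataclass
class ConceptualModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def add_entity(self, name, *attributes):
        self.entities[name] = Entity(name, tuple(attributes))

    def relate(self, source, verb, target):
        # Refuse relationships that mention an entity nobody has defined yet.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.append(Relationship(source, verb, target))

    def describe(self):
        # One plain-language sentence per relationship.
        return [f"{r.source} {r.verb} {r.target}" for r in self.relationships]


# Illustrative attributes only (assumed for this sketch).
model = ConceptualModel()
model.add_entity("Customer", "name", "billing address")
model.add_entity("Product", "name", "category")
model.add_entity("Purchase", "date", "amount")
model.add_entity("Company")
model.add_entity("Inventory", "quantity")

model.relate("Customer", "buys", "Product")
model.relate("Purchase", "is billed to", "Customer")
model.relate("Company", "stores", "Inventory")

for sentence in model.describe():
    print(sentence)
```

Reading the relationships back as plain sentences is a quick way to check the model with business colleagues before any physical design starts.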
Building a conceptual data model that accurately reflects your business has a lot of value. But don’t try to make it perfect because there’s no such thing as the ideal representation that everyone in the organization agrees on.
Put your conceptual model into practice by incorporating it into the physical Data Vault model. The physical DV model is the actual design blueprint for your relational database, including columns, column lengths, primary keys and foreign keys. It should be based on the actual data in your sources.
This article explains how to easily build a conceptual data model and then incorporate it into the physical DV model, using Ellie and VaultSpeed.
Ellie
Recently, I’ve been using ellie.ai to build conceptual data models. Ellie is an intuitive, visual data modeling tool with enterprise-level data modeling and information architecture features. It’s also cloud-native, so it’s easy to set up, use and share with your team.
Consider the business case of a car-motor-bike products store. This is how they categorized their product offering:
Watch out for the fake Data Vault gap
It might be tempting to build the DV model purely from source metadata, but you would end up with a model that represents only one source and not your actual business. This anti-pattern is often referred to as “fake Data Vault”. Instead, map the source structure to the business taxonomy. In VaultSpeed, you can build towards that model with our Data Vault modeling options, such as hub groupings, multi-active satellites, many-to-many links and transactional links.
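The mapping step can be sketched in a few lines of Python. This is only an illustration of the idea, not VaultSpeed functionality: the source table names and the mapping are hypothetical, and the function simply groups source tables under the business concept they belong to while flagging tables that have no concept yet.

```python
from collections import defaultdict

# Hypothetical source tables mapped to hypothetical business concepts.
SOURCE_TO_CONCEPT = {
    "erp.car_parts": "Product",
    "erp.moto_parts": "Product",
    "webshop.customers": "Customer",
    "erp.clients": "Customer",
}


def group_by_concept(source_tables, mapping):
    """Group source tables per business concept and list the unmapped ones."""
    groups = defaultdict(list)
    unmapped = []
    for table in source_tables:
        concept = mapping.get(table)
        if concept is None:
            # No business concept yet: a candidate for review, not for a hub.
            unmapped.append(table)
        else:
            groups[concept].append(table)
    return dict(groups), unmapped


tables = [
    "erp.car_parts",
    "erp.moto_parts",
    "webshop.customers",
    "erp.clients",
    "erp.tmp_export",
]
groups, unmapped = group_by_concept(tables, SOURCE_TO_CONCEPT)
print(groups)
print(unmapped)
```

Tables that end up in the same group are candidates for the same hub group. Tables in the unmapped list are the ones a purely source-driven model would have turned into hubs without anyone asking what they mean to the business.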
It would be cool though if I could import the concepts from the conceptual model that I built in Ellie. That would save me some time setting things up in VaultSpeed. This is where API integration comes into play.
API integration
Building integrations between tools is much easier when you can use REST APIs. Fortunately, both Ellie and VaultSpeed have well-documented REST APIs in place. In this example, I’ve written a Python script that automatically builds hub groups for all the core business concepts in my model. This provides me with a pre-filled canvas for hub groups, saving me the time it would have taken to create them manually in VaultSpeed.
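The script itself is not reproduced here, but its core transformation might look like the sketch below. Everything in it is an assumption made for illustration: the shape of the exported entities (`name`, `description`, `is_core`), the payload field `hub_group_name` and the upper-snake-case naming convention are hypothetical and are not taken from the Ellie or VaultSpeed API documentation. The HTTP calls are deliberately left out because the endpoints are not listed in this article.

```python
import json
import re


def to_hub_group_name(concept_name):
    """Turn 'Motor Bike' into 'MOTOR_BIKE' (naming convention assumed here)."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", concept_name.strip())
    return cleaned.strip("_").upper()


def build_hub_group_payloads(entities):
    """Build one hub group payload per core business concept.

    `entities` is a list of dicts in a hypothetical export format.
    Non-core concepts, empty names and duplicate names are skipped.
    """
    payloads = []
    seen = set()
    for entity in entities:
        if not entity.get("is_core", False):
            continue
        name = to_hub_group_name(entity["name"])
        if not name or name in seen:
            continue
        seen.add(name)
        payloads.append(
            {
                "hub_group_name": name,  # hypothetical field name
                "description": entity.get("description", ""),
            }
        )
    return payloads


# Hypothetical export of a conceptual model, as JSON.
exported = json.loads(
    """
    [
      {"name": "Customer", "description": "Buys products", "is_core": true},
      {"name": "Motor Bike", "is_core": true},
      {"name": "customer", "is_core": true},
      {"name": "Draft idea", "is_core": false}
    ]
    """
)

payloads = build_hub_group_payloads(exported)
print(json.dumps(payloads, indent=2))
```

In a real integration, each payload would then be sent to the target tool with an HTTP client, using the endpoints and authentication described in that tool's own API documentation.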
Conclusion
There are two main ingredients for building a good Data Vault model:
- automation to map source models to an integrated Data Vault model
- a conceptual data model to make sure that your Data Vault model accurately represents your business
VaultSpeed’s ‘best of breed’ strategy encourages customers to integrate with other state-of-the-art tools to deliver results. In terms of conceptual data modeling, Ellie certainly fits the description.