In addition to these popular options for traditional data warehousing, organizations have adopted other approaches; in practice, they are often just an unmanaged, rampant mix of the above options with additional, free-style modeled entities.
Issues with third normal form
As already stated, Inmon’s approach is based on an enterprise model that is modeled after the operational corporate data in 3NF with effectivity timelines. This model can handle a high volume of tightly integrated data in a highly structured data model with many-to-many linkages. It is relatively easy to extend and can process near real-time loads.
However, the 3NF model was originally not intended for use in data warehousing. It was modified for this new purpose through the addition of effectivity timelines. But the added timeline also adds additional
complexities to the data model, especially the potential of having joins across timelines.
But there are bigger issues. Notably, when loading the model, the entities must be loaded in a certain order, driven by the business, creating dependencies that lead to cascading changes if any part needs to be modified. These dependencies can become quite a burden, particularly in larger enterprise models.
For example, the organization must be loaded before loading employees, because the organization is referenced by the employee to indicate the employer. If the organization is not known by the organizational entity, loading the employee will fail. If the organization entity needs to be modified, the employee entity must first be at least reviewed and tested, but potentially modified as well. But if the employee entity is touched, the salary payments entity must be reviewed, and so on.
As a result, most organizations try to prevent modifications to the model in the first place, but that often leads to a big-bang approach to the enterprise model: first, build the whole enterprise model before implementing reports on top. This approach is not very conducive to agile development and it becomes difficult to model the enterprise as the enterprise is constantly changing.
Another issue with the 3NF model is that it is not well supported by business intelligence frontends or is hard to use for business analysts.
Issues with the federated star schema
This is where the federated star schema comes into play. It’s based on a simple star schema with facts and dimensions and therefore easy to use in business intelligence solutions. It easily supports multidimensional analysis of the information and provides subject-oriented answers with aggregation points included.
Another advantage is that its subject orientation can facilitate development and deployment. Business users can easily define an information requirement; development can focus on this specific requirement without needing to analyze the wider enterprise model with data not relevant for this requirement.
But the federated star schema also has its shortcomings. While the individual star schemas are integrated using conformed dimensions, this doesn’t replace an integrated enterprise information model.
Moreover, the model is driven by the information requirements of the business user. All new information is added by additional, subject-oriented star schemas. But in many cases, information requirements are not so clear, especially where business users want to broadly explore the data available from the source systems. In this case, business users would like to use a data model closer to the source schema, such as, operational data stores.
Another important shortcoming is the handling of real-time data. Because star schemas are often pre-aggregated, the ingestion and integration of real-time messages are limited due to the required aggregation on the fly. This issue is exaggerated when dimension records arrive late, for example when the web page visit is known before the details of the web page visitor arrive. Aggregations (for example by visitor’s country, or page content) in this case cannot be performed (yet). This leads to delays in information delivery or worse.
It should be clear by now that the star schema is great for information delivery, as it was designed for this purpose. However, the data warehousing aspect of the EDW is where the approach fails because it was never meant for that.
This is where Data Vault modeling as the alternative comes into play.