Tuesday, 18 January 2011

Linstedt's videos and his first book chapters about Datavault

On Learndatavault.com you can download some chapters of Dan Linstedt's new book. You can also watch some videos about the datavault method and modeling. In this post I'll outline the most interesting subjects that he introduced or that are new (in my opinion).

Ontology
One interesting issue is that Linstedt introduces ontologies. Wikipedia defines ontology as "the philosophical study of the nature of being, existence or reality as such, as well as the basic categories of being and their relations. [...] Ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences". Initially I didn't quite understand why Linstedt uses this terminology, but when I visited Wikipedia and saw the picture there, I noticed a resemblance with DV.
Linstedt says that learning, applying and using ontologies is a critical success factor for handling, managing and applying unstructured data to a structured datawarehouse. I asked him some questions about ontologies:
  1. Is an ontology model a logical model? And DV a physical model?
  2. What does the ontology model add beyond the existing modeling techniques?
  3. Why did you mention in your first chapter that ontology is a critical success factor for building a datawarehouse?
Here are the answers from Dan:
1)  "In a way, yes, I see different ontologies as different logical models, the DV is “most often” a Physical model. The ontology model adds understanding of the terminology. It provides IT with a way to communicate the model to business users, and to get “order of importance” from business users. You see, business users fight over the definitions of their terms, not just because they USE them differently, but also because they each have a “different ontology” in their heads to represent the data. If you can show a common consistent reference model (ontology), then the level of understanding by business users greatly increases, and there is less arguing about how to define master data. Just remember: there are many different ontologies that can be applied to the same physical model."

2) "I relate ontologies/taxonomies to the Terminology or metadata we use to build data models. If you understand your business terms, or concepts, you can use those to create the hierarchies based on Importance and Classification. These metadata (business terms) are often what we call a logical model when we build a data model. On the other hand, since the Data Vault is based on Business Keys for a start, you can more effectively get the business keys for each conceptual layer in the hierarchy that you build."

3) "I say critical success factor because of the impact it has on communication between IT and business stakeholders. If the stakeholders understand WHAT your building, they can more easily buy-in. They will feel more comfortable, and will usually no longer care HOW it’s built (ie: DV model) – as long as you’re flexible in the near future. Using an ontological representation of the model helps IT with transparency of the DV project. it also can serve as a modeling guide for what data is available to reports and so on. I’ll write up an example in the near future, as that would make another great video.
I did not mean to state that a DW project will fail without ontologies, maybe I used the wrong term to describe them. What I really meant is what I stated above, an increase in the chance of success if your money holders understand what your building, and you give them a way to think about it."
As I already said, there is a resemblance between ontologies/taxonomies and datavault, but I would like to see some real-world examples of how these two models are related.

The steps to be taken to build successful datavault models
In his videos, Linstedt explains the following steps for developing a Datavault model:
  1. Identify the hubs. The hubs are used for tracking and identifying key information
  2. Identify the master data setup. Linstedt says that you have to discover the hierarchy in your business keys with an ontology in order to learn the dependencies in the dataset. Many different ontology views of the business are possible.
  3. Identify transactions and relationships to model link structures. Links are associations or transactions.
  4. Identify and model your satellites based on frequency of data change.
This contrasts with the steps in Dan Linstedt's earlier TDAN articles:
  • Model the hubs
  • Model the links
  • Model satellites
  • Remodel the satellites (Monster/mini)
It seems the steps have been shuffled. The step that catches my eye is step 2 in the new program (identify the master data setup). It relates to the first point in this post: ontology. Step 2 is about ontologies (master data and their relations to each other).
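To make the hub, link and satellite structures from the steps above a bit more concrete, here is a rough sketch in Python. It is only an illustration, not Linstedt's definition: the class layout, the field names and the sample keys ("CUST-001", "PROD-042", "CRM", "ERP") are all my own assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

# All class, field and key names are hypothetical illustrations.

@dataclass
class Hub:
    """Step 1: one record per business key."""
    business_key: str
    load_dts: datetime     # LDTS
    record_source: str     # RSRC

@dataclass
class Link:
    """Step 3: an association or transaction between hubs."""
    hub_keys: tuple        # business keys of the hubs it connects
    load_dts: datetime
    record_source: str

@dataclass
class Satellite:
    """Step 4: descriptive attributes, grouped by how often they change."""
    parent_key: str        # business key of the hub it describes
    attributes: dict
    load_dts: datetime
    record_source: str

loaded = datetime(2011, 1, 18)
customer = Hub("CUST-001", loaded, "CRM")
product = Hub("PROD-042", loaded, "ERP")
order = Link((customer.business_key, product.business_key), loaded, "ERP")

# Two satellites on the same hub: one for slow-changing attributes,
# one for fast-changing attributes.
slow = Satellite(customer.business_key, {"name": "Acme"}, loaded, "CRM")
fast = Satellite(customer.business_key, {"credit_balance": 100}, loaded, "CRM")
```

Splitting the customer attributes over two satellites mirrors step 4: attributes that change at different frequencies get their own satellite.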

MetaFields
From what I've seen in the videos, there are a couple of metafields available in every table. It seems there are some small changes in the way Datavault deals with metafields. Until Dan Linstedt's videos, the standard as 'defined' (where?) had two metafields:
  • Load date timestamp (LDTS).
  • Record source (RSRC).
It now seems that Dan has added a new metafield: Load end date timestamp (LEDTS). This field ends the validity of a record: when a new record is added to the table, the old record becomes obsolete. In a discussion I've read, more metafields come up in the Datavault courses (I hope to join one very soon), and these are:
  • Last seen datetime. This would indicate whether a source is 'drying up'. I'm not sure what this means exactly. I hope to find out soon.
  • Event datetime. This is the date of the actual change in the source (different from LDTS).
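As I understand the LEDTS field, loading a changed record means closing the old one and adding a new open one. A minimal sketch of that idea in Python follows. The function name, the dictionary keys and the sample data are my own assumptions, and a real load would do this in the database rather than on an in-memory list.

```python
from datetime import datetime

def load_satellite_row(rows, parent_key, attributes, load_dts, record_source):
    """Add a new version and end-date the previous open record, if any."""
    for row in rows:
        if row["parent_key"] == parent_key and row["ledts"] is None:
            if row["attributes"] == attributes:
                return rows            # nothing changed, keep the open record
            row["ledts"] = load_dts    # close the old record
    rows.append({
        "parent_key": parent_key,
        "attributes": attributes,
        "ldts": load_dts,
        "ledts": None,                 # open-ended: this is the valid record
        "rsrc": record_source,
    })
    return rows

rows = []
load_satellite_row(rows, "CUST-001", {"city": "Utrecht"}, datetime(2011, 1, 1), "CRM")
load_satellite_row(rows, "CUST-001", {"city": "Zwolle"}, datetime(2011, 1, 18), "CRM")
```

After the second load, the Utrecht record carries an LEDTS equal to the load date of the Zwolle record, and the Zwolle record stays open.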
In the datawarehouses I've developed so far, I've added the following metafields (though not always all of them):
  • Creation date (same as LDTS).
  • Expiration date (same as LEDTS).
  • Deletion date (only possible with CDC/triggers or full load).
  • Current Yes/No.
  • Version number (this indicates how many times a satellite changed).
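The Current flag and the version number can, in principle, be derived from the load dates alone. Here is a small sketch of that, assuming the version number simply counts the records of one key in load order and the latest record is the current one. The function and key names are hypothetical.

```python
from datetime import datetime

def add_derived_fields(versions):
    """Return copies of the records with a version number and a current flag."""
    ordered = sorted(versions, key=lambda v: v["ldts"])
    result = []
    for number, version in enumerate(ordered, start=1):
        enriched = dict(version)                  # leave the input untouched
        enriched["version_number"] = number
        enriched["is_current"] = number == len(ordered)
        result.append(enriched)
    return result

# All records of one satellite key, in arbitrary order.
history = [
    {"ldts": datetime(2011, 1, 18), "city": "Zwolle"},
    {"ldts": datetime(2011, 1, 1), "city": "Utrecht"},
]
enriched = add_derived_fields(history)
```

Storing these fields physically, as I did, is a trade-off: queries get simpler, but the load has to update old records.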
Just some thoughts...

Greetz,
Hennie
