Tuesday, February 21, 2012

Anchor modeling (Part I)

Anchor modeling is one of the more interesting data warehouse modeling concepts. In the past I've written some blog posts about creating Data Vault models based on the AdventureWorksLT database, and I would like to implement an Anchor model based on this small database as well. So, in the coming months you can expect some more blog posts about this modeling technique. But before I do, I'll go deeper into the fundamentals of Anchor modeling. Most of the information in this blog post can be found on www.anchormodeling.com.

The founders of Anchor modeling are Olle Regardt and Lars Rönnbäck. Anchor modeling is an agile database modeling technique suited for information that changes over time, both in structure and content. It provides a graphical notation for conceptual modeling, similar to that of Entity-Relationship modeling, with extensions for working with temporal data. The modeling technique is based around four modeling constructs: the anchor, attribute, tie and knot, each capturing a different aspect of the domain being modeled. The resulting models can be translated to physical database designs using formalized rules; when such a translation is done, the tables in the relational database will mostly be in sixth normal form (Wikipedia).

Anchor Modeling has four basic modeling concepts:
  • Anchors. Anchors are used to model entities and events. An anchor table only contains a surrogate key, which makes it comparable with the hub of a Data Vault model. In contrast with Data Vault, however, there is no need for a business key in the anchor table.
  • Attributes. Attributes are used to model properties of anchors, such as names and amounts.
  • Ties. Ties model the relationships between anchors and seem to be comparable with the link table in Data Vault.
  • Knots. Knots are used to model shared properties, such as states or gender.
Attributes and ties can be historized when changes in the information they model need to be kept.

Below is an example of a model as built in the modeling tool:

Name convention
Below is an explanation of the basic symbols that are used in Anchor modeling:
  • This is the symbol for an Anchor. The naming convention is a two-letter mnemonic + descriptor, for instance PE_Person, AD_Address or LO_Location.

  • This is the symbol for a Knot; the naming convention here is a three-letter mnemonic + descriptor. Examples are GEN_Gender, STA_State and COU_Country.

  • And this is the symbol for an Attribute: inherited anchor mnemonic + three-letter mnemonic + inherited anchor descriptor + descriptor. An example is PE_SUR_Person_Surname.

  • This is the symbol for a Tie: two-letter mnemonic + anchor descriptor + two-letter mnemonic + anchor descriptor.

These are the basic symbols used by the modeler tool of www.anchormodeling.com.
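To make these conventions concrete, the four constructs could be sketched as SQL tables roughly like the following. This is a hypothetical, simplified sketch of my own; the modeling tool on www.anchormodeling.com generates its own, more elaborate DDL with metadata columns.

```sql
-- Anchors: surrogate key only (no business key, unlike a Data Vault hub)
CREATE TABLE PE_Person (
    PE_ID INT NOT NULL PRIMARY KEY
);

CREATE TABLE AD_Address (
    AD_ID INT NOT NULL PRIMARY KEY
);

-- Knot: a shared property with a small, fixed set of values
CREATE TABLE GEN_Gender (
    GEN_ID TINYINT NOT NULL PRIMARY KEY,
    GEN_Gender VARCHAR(10) NOT NULL
);

-- Historized attribute: one value per row, changes kept via a ValidFrom date
CREATE TABLE PE_SUR_Person_Surname (
    PE_ID INT NOT NULL REFERENCES PE_Person (PE_ID),
    PE_SUR_Person_Surname VARCHAR(50) NOT NULL,
    PE_SUR_ValidFrom DATE NOT NULL,
    PRIMARY KEY (PE_ID, PE_SUR_ValidFrom)
);

-- Tie: relationship between two anchors
CREATE TABLE PE_AD_Person_Address (
    PE_ID INT NOT NULL REFERENCES PE_Person (PE_ID),
    AD_ID INT NOT NULL REFERENCES AD_Address (AD_ID),
    PRIMARY KEY (PE_ID, AD_ID)
);
```

Note how narrow each table is: one anchor key plus at most one attribute value per table, which is what pushes the physical model towards sixth normal form.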

It's too soon to draw conclusions about the Anchor modeling technique; I'm currently investigating it.


Sunday, February 12, 2012

TOGAF and BI principles

While studying TOGAF I was pointed to the usage of principles. Principles are guidelines/rules that are defined before or while building an Enterprise Architecture, and these principles are seldom changed; otherwise, why would a principle be a principle, of course ;-). As BI can be a part of the Enterprise Architecture, careful thought should be given to BI and principles. What kind of BI principles do you favour?

TOGAF contains a couple of rules and guidelines for applying TOGAF. One of these guidelines is architecture principles, which can be defined for the architecture being developed. These architecture principles can be split up into three levels: Enterprise, IT and Architecture principles.

How are principles defined? TOGAF has a standard way of describing principles: a name is given to the principle, and each principle should have associated rationale and implication statements, for understanding and acceptance.

Example principle
Below is an example of a principle that can be used in a Self Service BI project such as the one I'm currently involved in.

Name: Self Service BI
Statement: (Power) users have access to information in such a way that they can build reports themselves.
Rationale: The ability for (power) users to access their own reports and content lowers the cost and backlog of the IT department. (Power) users know the meaning of information better than IT personnel.
Implications:
  • (Power) users should have reporting tools available.
  • (Power) users can build reports without intervention of IT.
  • Reports are published on a report server where they are managed by IT personnel.
  • Users will be trained in building reports, the meaning of BI and the usage of the reporting server.

If you want BI to be integrated in an Enterprise Architecture, there will be a need to develop BI principles, and perhaps you'll be asked to define principles that will hold for years. So if you're a BI department, think about the principles that you want to achieve or are achieving.


Wednesday, February 8, 2012

SSAS: optimizing processing query using reference dimension relationship

In one of my former posts, titled "Select Facts with Reference and Many2Many relationships", I blogged about the usage of reference dimension relationships in cubes. At my current client I'm experiencing a major slowdown while processing the cube, and some investigation showed that the reference relationship seems to be the problem. In this blog post I'll explain the problem, a solution (replacing the load query with a view) and a conclusion.

The problem 
I'll be using the lab situation that I've described in my former blog post "Select Facts with Reference and Many2Many relationships". In that post I've shown how to build a reference dimension relationship and a many-to-many relationship. For this particular problem I'll focus on the reference dimension relationship, because that is what is causing the performance slowdown at the client where I'm currently operating.

Suppose I have the following situation (see diagram below). In this situation I have three reference relationships: Customer, Department and Date. These dimensions are the normal dimensions of the Orders fact. Because we want these dimensions to slice the numbers on the OrderLines fact as well, we have to create a reference dimension relationship.

The problem is that I'm experiencing a slowdown in performance during processing of the cube because of the usage of the reference dimension relationships. This results in the following load query for the cube:

SELECT
[dbo_tblFactOrderlines].[OrderLines_Count] AS [dbo_tblFactOrderlinesOrderLines_Count0_0],
[dbo_tblFactOrderlines].[Product_ID] AS [dbo_tblFactOrderlinesProduct_ID0_1],
[dbo_tblFactOrderlines].[Category_ID] AS [dbo_tblFactOrderlinesCategory_ID0_2],
[dbo_tblFactOrderlines].[Orders_ID] AS [dbo_tblFactOrderlinesOrders_ID0_3],
[dbo_tblFactOrders_4].[Customer_ID] AS [dbo_tblFactOrdersCustomer_ID4_0],
[dbo_tblFactOrders_5].[Department_ID] AS [dbo_tblFactOrdersDepartment_ID6_0],
[dbo_tblFactOrders_6].[Date_ID] AS [dbo_tblFactOrdersDate_ID8_0]
FROM [dbo].[tblFactOrderlines] AS [dbo_tblFactOrderlines],
[dbo].[tblFactOrders] AS [dbo_tblFactOrders_4],
[dbo].[tblFactOrders] AS [dbo_tblFactOrders_5],
[dbo].[tblFactOrders] AS [dbo_tblFactOrders_6]
WHERE  ([dbo_tblFactOrderlines].[Orders_ID]   =  [dbo_tblFactOrders_4].[Orders_ID])
AND    ([dbo_tblFactOrderlines].[Orders_ID]   =  [dbo_tblFactOrders_5].[Orders_ID])
AND    ([dbo_tblFactOrderlines].[Orders_ID]   =  [dbo_tblFactOrders_6].[Orders_ID])

As you can see, for every reference dimension relationship an extra join is created to the same table (?!). The WHERE clause uses three joins, for Customer, Department and Date. At my client I'm having 5 joins, and it takes about 15 minutes to load a specific measure group. When I replace the 5 joins with 1 join, the processing time drops dramatically, to about 35 seconds.

The solution
I've replaced the OrderLines table in the data source view with the following view:

DROP VIEW vwOrderLinesCube
GO

CREATE VIEW vwOrderLinesCube AS
SELECT
[OL].[OrderLines_Count] AS [OrderLines_Count],
[OL].[Product_ID] AS [Product_ID],
[OL].[Category_ID] AS [Category_ID],
[O].[Orders_ID] AS [Orders_ID],
[O].[Customer_ID] AS [Customer_ID],
[O].[Department_ID] AS [Department_ID],
[O].[Date_ID] AS [Date_ID]
FROM [dbo].[tblFactOrders] AS [O]
INNER JOIN [dbo].[tblFactOrderlines] AS [OL] ON [O].[Orders_ID] = [OL].[Orders_ID]

So I had to draw some extra lines in the data source view, and that is a drawback of this solution, unfortunately: the model is now less intuitive and less clear. Too bad...
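As a quick sanity check (a hypothetical query of my own, assuming every order line references exactly one existing order), the view should return exactly as many rows as the original OrderLines fact table, so the measure values in the cube stay the same:

```sql
-- Compare row counts between the original fact table and the view;
-- an inner join on a mandatory Orders_ID should neither drop nor duplicate rows.
SELECT
    (SELECT COUNT(*) FROM [dbo].[tblFactOrderlines]) AS FactOrderLinesRows,
    (SELECT COUNT(*) FROM [dbo].[vwOrderLinesCube])  AS ViewRows;
```

If the two counts differ, there are order lines with a missing or orphaned Orders_ID, and the view would silently change the cube's numbers.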

With this new setup the following processing query is executed:

SELECT
[dbo_tblFactOrderlines].[OrderLines_Count] AS [dbo_tblFactOrderlinesOrderLines_Count0_0],
[dbo_tblFactOrderlines].[Product_ID] AS [dbo_tblFactOrderlinesProduct_ID0_1],
[dbo_tblFactOrderlines].[Category_ID] AS [dbo_tblFactOrderlinesCategory_ID0_2],
[dbo_tblFactOrderlines].[Orders_ID] AS [dbo_tblFactOrderlinesOrders_ID0_3],
[dbo_tblFactOrderlines].[Customer_ID] AS [dbo_tblFactOrderlinesCustomer_ID0_4],
[dbo_tblFactOrderlines].[Department_ID] AS [dbo_tblFactOrderlinesDepartment_ID0_5],
[dbo_tblFactOrderlines].[Date_ID] AS [dbo_tblFactOrderlinesDate_ID0_6]
FROM [dbo].[vwOrderLinesCube] AS [dbo_tblFactOrderlines]

Now you can see that the three joins have been replaced by the view, which contains the one and only join to the Orders table.

This solution is mainly needed when processing takes a long time because of the usage of the reference dimension relationship in the cube design; in my opinion this is due to the poor design of the load query that the cube generates. Too bad, but sometimes you have to use the pragmatic solution.