Normally you create one 'empty' row in your dimension called 'Unknown' (for handling NULLs and false dimension key lookups) and this a good solution under normal circumstances.But as said in my introduction, a more refined solution is needed. In my opinion the following distinction can be made:
- -2 , "Not available" like in NULL. There is no value present in the source. In case of date dimension use a value like 1898, 1899 or 2098,2099 as a default value for unknown and/or not avaliable dates. In some sourcesystems i've seen that a '' (two single quotes) is implemented. You could include this check in the same logic for the "Not available" code too.
- -1 , "Unknown" when the application key is not present in the dimension application lookup field. In this case there is a value present but it is not (yet) present in the dimension.
- When not filled in, use 0 because this won't bother aggregations like summing.
- Use an extra indicatorfield <field>_AvailableYN to indicate whether the measure is available or not (NULL). Create a DD dimension and drop this field in here.
When the column in the sourcetable is NULL it will insert a -2 into the fact table. When a value is present in the column in the source but the dimension lookup failed an unknown value (-1) is inserted in the fact table.
Powerusers should be aware of these choices when they build reports.If they are used to 'UNKNOWN' values, now they have include some logic for the NOT AVAILABLE values in the reports too.
An interesting article about dummy values can be downloaded here