Dimensional Modelling for Effective Data Analysis
Dimensional modelling is a crucial concept in data analysis that involves structuring and organizing data into dimensions and facts to facilitate interactive querying. OLAP tools play a vital role in this process by allowing managers and analysts to explore data from various perspectives easily. With the ability to quickly choose and study specific facts and dimensions, users can gain insights and make informed decisions efficiently.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Presentation: Dimensional Modelling 1 Erik Perjons DSV, Stockholm University
The Concept of Cube and Towards Dimensional Modelling
OLAP tools OLAP tools supports so-called interactive and "ad-hoc querying", that is, systems that allow managers and analysts to ask questions that appear in their minds during interaction with the system - without the need to use SQL and other query languages.
OLAP tools The main idea of OLAP is that managers and analysts can interactively, quickly and easily choose facts that they want to study, such as sales figures or costs. They can also choose from what views they want to study these facts, that is, from what dimensions. Then they can add new or subtract existing dimensions and thus be able to study the facts, sales figures or cost, from other perspectives.
OLAP tools: User interface Product group Region Quarter Sales (in kSEK) Quarter 1- 2018 Group AB Region ABC 300 Group AB Region DEF Quarter 1- 2018 1 300
OLAP tools: User interface TO DO: Drill-down (for example, by click on Group AB ) in order to get detailed info about Group A och Group B Product group Region Quarter Sales (in kSEK) Quarter 1- 2018 Group AB Region ABC 300 Group AB Region DEF Quarter 1- 2018 1 300
OLAP tools: User interface DONE: Drill-down (for example, by click on Group AB ) in order to get detailed info about Group A och Group B Product Group Region Quarter Sales (kSEK) Quarter 1 - 2018 Group A Region ABC 100 Group A Group B Region DEF Region ABC Quarter 1 - 2018 Quarter 1 - 2018 400 200 Group B Region DEF Quarter 1 - 2018 900
OLAP tools: User interface TO DO: Add another column (e.g., add Product to Excel) in order to get more detailed info) Year Region Sales (kSEK) Product Product Group Quarter Office Cost (kSEK) Salesperson Month Number of transactions TO DO: Add another column (e.g., add Product to Excel) in order to get more detailed info) Product Group Region Quarter Sales (kSEK) Quarter 1 - 2018 Group A Region ABC 100 Group A Group B Region DEF Region ABC Quarter 1 - 2018 Quarter 1 - 2018 400 200 Group B Region DEF Quarter 1 - 2018 900
OLAP tools: User interface DONE: Add another column (e.g., add Product to Excel) in order to get more detailed info) Year Region Sales (kSEK) Product Product Group Quarter Office Cost (kSEK) Salesperson Month Number of transactions DONE: Add another column (e.g., add Product to Excel) in order to get more detailed info) Product Group Product Region Quarter Sales (kSEK) Quarter 1- 2018 Grupp A Grupp A Grupp A Region ABC 0 Product 20011 Product 20011 Product 20012 Product 20012 Region DEF Region ABC Quarter 1- 2018 Quarter 1- 2018 10 12 Grupp A Region DEF Quarter 1- 2018 45
OLAP tools: User interface Year Region Sales (kSEK) Product Dimensions/ Perspectives Product Group Quarter Office Cost (kSEK) Facts/ Measures Salesperson Month Number of transactions Product Group Product Region Quarter Sales (kSEK) Quarter 1- 2018 Grupp A Grupp A Grupp A Region ABC 0 Product 20011 Product 20011 Product 20012 Product 20012 Region DEF Region ABC Quarter 1- 2018 Quarter 1- 2018 10 12 Grupp A Region DEF Quarter 1- 2018 45
Cube a multidimensional view on data 130 2 300 Q4-2018 5 024 Q3-2018 200 Q2-2018 Q1-2018 B D A C region
Cube a multidimensional view on data Dimensions/ Perspectives 130 Facts/ Measures 2 300 Q4-2018 5 024 Q3-2018 200 Q2-2018 Q1-2018 B D A C Dimensions/ Perspectives region
Cube a multidimensional view on data Long term promotion campaign Go-to-TV Region Region Region Go-to-newspapers Region Region Region Customer Group Customer Group A Customer Group B Customer Group C
How to represent the multidimensional cube as a conceptual model? Product Group 1 1 Quarter 1..* 1..* Fact/Measure 1..* 1..* 1..* 1 1 1 Region Customer Group Long Term Promotion Campaign
Cube a multidimensional view on data The cube is a concept that has influenced: 130 user interface in OLAP tools 2 300 Q4-2018 database design, logical structure (star schema, dimensional modellering) 5 024 Q3-2018 physical storage ( Dimensional models implemented in multidimensional database environment (MOLAP) are refererred to as OLAP cubes ) 200 Q2-2018 Q1-2018 B D A C region
Dimensional modelling/Star schema Service Dimension 1 Date Dimension 1 1..* 1..* Service Usage Fact Table 1..* 1..* 1 Seller Dimensions 1 Customer Dimension
Dimensional modelling/Star schema Date Dimension Service Dimension date (PK) month quarterOfYear year 1 1 serviceKey (PK) serviceName serviceGroup 1..* 1..* Service Usage Fact Table 1..* serviceKey (PK) (FK) sellerKey (PK) (FK) customerKey (PK) (FK) date (PK) (FK) sum noOfCalls Customer Dimension customerKey (PK) postadress region incomeGroup 1..* Seller Dimension 1 sellerKey (PK) sellerName office 1
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP 1 Service Usage Fact Table 1 Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 1..* 1..* 1..* 1..* Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista 1 1
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Service Usage Fact Table Number of calls 3 1 1 1 1 Query: For how much did customers in Sthlm use service Local call in october 1999? Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 = = Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Denormalized dimensions that is, they are not in 3 NF What is 3NF? Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 If possible, organize attributes in hierarchies Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact Table Understand the grain of the fact table and how it will be impacted by dimensions added and deleted. Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Grain is the level of detail at which data is represented Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP What happened if we replace the Date dimensions with a Month dimension instead (where one row represent a month and not a date)? Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 How can you create aggregated dimensions based on the Customer dimension? Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista 1
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact Table Number of calls 3 1 1 1 1 90 percent of the total space in the database consist of fact tables data. Consequences? Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact Table Number of calls 3 1 1 1 1 90 percent of the total space in the database consist of fact tables data. Therefore fact tables need to be sparse avoid textual descriptions Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact Table Number of calls 3 1 1 1 1 Textual descriptions belongs in the dimensional tables Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Service group Group A Group A Group B Group C Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Key S1 S2 S3 S4 Service Local call Intern. call SMS WAP Service Usage Fact Table Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact table Number of calls 3 1 1 1 1 You can access the fact table via the dimensions tables. The dimensions are the entry points to the data, that is, entry points for approaching the fact table Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact table The database engine can make strong assumptions - by first constraining the heavily indexed dimensional tables, and then attacking the fact table all at once with the Cartesian product of the keys in the dimensional tables Number of calls 3 1 1 1 1 Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact table Number of calls 3 1 1 1 1 The simplicity and symmetry of the star schema makes it easy to understand and navigate Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Service Dimension Date Dimension Key S1 S2 S3 S4 Service Local call Intern. call Group A SMS WAP Service group Group A Date/ Key 991011 991012 Month 9910 9910 Quarter 4 - 99 4 - 99 Year 99 99 Group B Group C Service Usage Fact table Number of calls 3 1 1 1 1 Star schemas are gracefully extensible to accommodate change Sum 25:00 05:00 89:00 12:00 08:00 C210 C210 C212 C213 C214 S1 S3 S2 S1 S4 F11 F11 F13 F13 F13 991011 991011 991011 991011 991012 Customer Dimension Seller Dimension Income group B B C A A Key C210 C211 C212 C213 C214 Customer Anna N Lars S Erik P Danny B sa S Address Stockholm Malm R ttvik Stockholm Stockholm Region Stockholm Sk ne Dalarna Stockholm Stockholm Key F11 F12 F13 Seller Anders C Lisa B Janis B Office Sundsvall Sundsvall Kista
Dimensional modelling/Star schema Date Dimension Service Dimension date (PK) month quarterOfYear year 1 1 serviceKey (PK) serviceName serviceGroup 1..* 1..* Service Usage Fact Table 1..* serviceKey (PK) (FK) sellerKey (PK) (FK) customerKey (PK) (FK) date (PK) (FK) sum noOfCalls Customer Dimension customerKey (PK) postadress region incomeGroup 1..* Seller Dimension 1 sellerKey (PK) sellerName office 1
Towards Dimensional Modelling A single fact table Facts, measurements KPI s Tables for each dimension The dimension tables are denormalized Kimball approach