Data Warehousing and Online Analytical Processing (OLAP)

undefined
Data Warehousing
COMP3017 Advanced Databases
Dr Nicholas Gibbins – nmg@ecs.soton.ac.uk
2013-2014
Processing Styles – OLTP
2
On-Line Transaction Processing
Traditional workloads,
bread and butter
 processing
Volumes of data, transactions grow, networks getting larger.
Processing Styles – OLAP
3
On-Line Analytical Processing
includes the use of data warehouses
multidimensional databases
data analysis
Online Analytical Processing
4
OLAP is the name given to the
dynamic enterprise analysis
required to create, manipulate,
animate and synthesise
information from exegetical,
contemplative and formulaic data
analysis models
Online Analytical Processing
5
OLAP is the name given to the
dynamic enterprise analysis
required to create, manipulate,
animate and synthesise
information from 
exegetical
,
contemplative and formulaic data
analysis models
Exegesis: critical explanation
How did we get to where we are?
Online Analytical Processing
6
OLAP is the name given to the
dynamic enterprise analysis
required to create, manipulate,
animate and synthesise
information from exegetical,
contemplative
 and formulaic
data analysis models
Asking ‘what if?’ questions
How does the outcome change if 
we vary the parameters?
Online Analytical Processing
7
OLAP is the name given to the
dynamic enterprise analysis
required to create, manipulate,
animate and synthesise
information from exegetical,
contemplative and 
formulaic
data analysis models
Which parameters must be varied
in order to achieve a given outcome?
1.
Multidimensional conceptual
view
2.
Transparency
3.
Accessibility
4.
Consistent reporting
performance
5.
Client-server architecture
6.
Generic dimensionality
7.
Dynamic sparse matrix
handling
8.
Multi-user support
9.
Unrestricted cross-dimensional
operations
10.
Intuitive data manipulation
11.
Flexible reporting
12.
Unlimited dimensions and
aggregation levels
12 Rules for OLAP
8
Data Mining
9
Data mining 
is the process of discovering hidden patterns
and relations in large databases using a variety of advanced
analytical techniques
Data mining attempts to use the computer to discover
relationships that can be used to make predictions
Data mining tools often find unsuspected relationships in
data that other techniques will overlook
Data Mining Approaches
10
Rule-based analysis
Neural networks
Fuzzy Logic
K-nearest-neighbour
Genetic algorithms
Advanced visualisation
Combination of any of the above
11
A 
data warehouse 
is a subject-oriented, integrated,
time-variant, non-volatile collection of data that is used
primarily in organisational decision making
The Data Warehouse
12
A 
data warehouse 
is a 
subject-oriented
, integrated,
time-variant, non-volatile collection of data that is used
primarily in organisational decision making
The Data Warehouse
The data is organised according to subject instead of
application and contains only the information necessary for
‘decision support
processing.
13
A 
data warehouse 
is a subject-oriented, 
integrated
,
time-variant, non-volatile collection of data that is used
primarily in organisational decision making
The Data Warehouse
Data encoding is made uniform
(e.g. sex = f or m, 1 or 2, b or g - needs to be all the same in
the warehouse).
Data naming is made consistent.
14
A 
data warehouse 
is a subject-oriented, integrated,
time-variant
, non-volatile collection of data that is used
primarily in organisational decision making
The Data Warehouse
Data is collected over time and can then be used for
comparisons, trends and forecasting
15
A 
data warehouse 
is a subject-oriented, integrated,
time-variant, 
non-volatile 
collection of data that is used
primarily in organisational decision making
The Data Warehouse
The data is not updated or changed once in the data
warehouse, but is simply loaded, and then accessed.
The data warehouse is held quite separately from the
operational database, which supports OLTP.
Why a Separate Data Warehouse?
16
Performance 
 
Operational databases are optimised to support known transactions
and workloads
Special data organisation, access methods and implementation
methods are needed
Complex OLAP queries would degrade performance for operational
transactions
Why a Separate Data Warehouse?
17
Missing data
Decision support requires historical data, which operational
databases do not typically maintain
Data consolidation
Decision support requires consolidation (aggregation,
summarisation) of data from many heterogeneous sources, including
operational databases and external sources
Data quality
Different sources typically use inconsistent data representations,
codes and formats, which have to be reconciled
Extracting Data
18
Car
Policy
Claim
OPERATIONAL
DATA WAREHOUSE
Customers
replace
change
insert
change
replace
insert
Liability
Risk
EIS
DSS
Analysis
Etc
- Data is cleansed
- Data is restructured
The Data Warehouse
19
A Data Warehouse may be realised:
via a front end to existing databases and files
in a fresh relational database
in a multidimensional database (MDDB)
in a proprietary database format
using a mixture of the above
The Data Warehouse
20
Data may be accessed in various ways:
Decision Support Systems (DSS)
Executive Information Systems (EIS)
Data Mining
On-Line Analytical Processing
Data Marts
21
A data mart focuses on
only one subject area, or
only one group of users
An organisation can have
one enterprise data warehouse
many data marts
Data marts do not contain operational data
Data marts are more easily understood and navigated
Multidimensional Analysis
22
Need to examine data in various ways
Produce views of multidimensional data for users:
Slice
Dice
Pivot
Drill down
Roll up
Multidimensional Analysis – Slice
23
Operator
Performance
Time
One operator's
performance
over time
Overall performance
on a particular day
Overall performance
for one criterion
over time
Train Performance
 -  3 dimensions
 
  - Operators
  - Performance
  - Time
Multidimensional Analysis – Dice
24
10
50
20
12
15
10
Juice
Cola
Milk
Cream
Toothpaste
Soap
1    2    3    4    5    6
Month
N
W
S
Region
Product
Multidimensional Analysis – Pivot
25
10
50
20
12
15
10
Juice
Cola
Milk
Cream
Toothpaste
Soap
1    2    3    4    5    6
Month
N
W
S
Region
Product
Product
Month
Region
Multidimensional Analysis – Drill Down
26
10
50
20
12
15
10
Juice
Cola
Milk
Cream
Toothpaste
Soap
1    2    3    4    5    6
Month
N
W
S
Region
Product
by brand
Multidimensional Analysis – Roll Up
27
80
37
Beverages
Toiletries
1    2    3    4    5    6
Month
N
W
S
Region
Product
Internal Aspects
28
Schemas
Star schema
Snowflake schema
Fact constellation schema
Aggregated data
Specialised indexes
Bit map indexes (see lecture on multidimensional indexes)
Join indexes
Specialised join methods
Star Schema
29
Time Code
Quarter Code
Quarter Name
Date
Month Code
Month Name
Day Code
Day of Week
Season
Geography Code
Region Code
Region Manager
City Code
City Name
Post Code
Account Code
Key Account Code
Key Account Name
Account Name
Account Type
Account Market
Product Code
Product Name
Brand Manager
Brand Name
Prod Line Code
Prod Line Name
Prod Line Mgr
Product Name
Product Colour
Product Model No
Geography Code
Time Code
Account Code
Product Code
Sterling Amount
Units
Time
Sales
Account
Product
Fact Tables
 
key columns joining fact table
to the dimension tables
 
numerical
measures
30
Part of a Snowflake Schema
31
Time Code
Year Code
Quarter Code
Month Code
Day Code
Product Code
Prod Line Code
Brand Code
Geography Code
Time Code
Account Code
Product Code
Sterling Amount
Units
Time
Sales
Product
Quarter Code
Quarter Name
Quarter
Month Code
Month Name
Month
Day Code
Day of Week
Season
Day
Product Name
Product Colour
Product Model No
Product Desc
Brand Code
Brand Manager
Brand Name
Brand
Prod Line Code
Prod Line Name
Prod Line Mgr
Product Line
Data Warehouse Databases
32
Relational and Specialised RDBMSs
Specialised indexing techniques, join and scan methods
Relational OLAP (ROLAP) servers
Explicitly developed to use a relational engine to support OLAP
Include aggregation navigation logic, the ability to generate multi-
statement SQL, and other additional services
Multidimensional OLAP (MOLAP) servers
The storage model is an n-dimensional array
May use a 2-level approach, with 2-D dense arrays indexed by B-Trees
Time is often one of the dimensions
Slide Note
Embed
Share

Exploring the concepts of data warehousing and OLAP, this content covers topics such as OLTP, OLAP processing styles, and the role of OLAP in dynamic enterprise analysis. It also delves into the critical explanations and scenarios of online analytical processing, providing insights into how to achieve desired outcomes through varying parameters and the rules for effective OLAP implementation.

  • Data Warehousing
  • OLAP
  • Online Analytical Processing
  • OLTP
  • Multidimensional Databases

Uploaded on Oct 04, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Data Warehousing COMP3017 Advanced Databases Dr Nicholas Gibbins nmg@ecs.soton.ac.uk 2013-2014

  2. Processing Styles OLTP On-Line Transaction Processing Traditional workloads, bread and butter processing Volumes of data, transactions grow, networks getting larger. 2

  3. Processing Styles OLAP On-Line Analytical Processing includes the use of data warehouses multidimensional databases data analysis 3

  4. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models 4

  5. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Exegesis: critical explanation How did we get to where we are? 5

  6. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Asking what if? questions How does the outcome change if we vary the parameters? 6

  7. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Which parameters must be varied in order to achieve a given outcome? 7

  8. 12 Rules for OLAP 1. Multidimensional conceptual view 7. Dynamic sparse matrix handling 2. Transparency 8. Multi-user support 3. Accessibility 9. Unrestricted cross-dimensional operations 4. Consistent reporting performance 10. Intuitive data manipulation 5. Client-server architecture 11. Flexible reporting 6. Generic dimensionality 12. Unlimited dimensions and aggregation levels 8

  9. Data Mining Data mining is the process of discovering hidden patterns and relations in large databases using a variety of advanced analytical techniques Data mining attempts to use the computer to discover relationships that can be used to make predictions Data mining tools often find unsuspected relationships in data that other techniques will overlook 9

  10. Data Mining Approaches Rule-based analysis Neural networks Fuzzy Logic K-nearest-neighbour Genetic algorithms Advanced visualisation Combination of any of the above 10

  11. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making 11

  12. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making The data is organised according to subject instead of application and contains only the information necessary for decision support processing. 12

  13. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making Data encoding is made uniform (e.g. sex = f or m, 1 or 2, b or g - needs to be all the same in the warehouse). Data naming is made consistent. 13

  14. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making Data is collected over time and can then be used for comparisons, trends and forecasting 14

  15. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making The data is not updated or changed once in the data warehouse, but is simply loaded, and then accessed. The data warehouse is held quite separately from the operational database, which supports OLTP. 15

  16. Why a Separate Data Warehouse? Performance Operational databases are optimised to support known transactions and workloads Special data organisation, access methods and implementation methods are needed Complex OLAP queries would degrade performance for operational transactions 16

  17. Why a Separate Data Warehouse? Missing data Decision support requires historical data, which operational databases do not typically maintain Data consolidation Decision support requires consolidation (aggregation, summarisation) of data from many heterogeneous sources, including operational databases and external sources Data quality Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled 17

  18. Extracting Data OPERATIONAL DATA WAREHOUSE Risk replace change Car Liability insert change Policy replace insert Claim Customers EIS DSS Analysis Etc - Data is cleansed - Data is restructured 18

  19. The Data Warehouse A Data Warehouse may be realised: via a front end to existing databases and files in a fresh relational database in a multidimensional database (MDDB) in a proprietary database format using a mixture of the above 19

  20. The Data Warehouse Data may be accessed in various ways: Decision Support Systems (DSS) Executive Information Systems (EIS) Data Mining On-Line Analytical Processing 20

  21. Data Marts A data mart focuses on only one subject area, or only one group of users An organisation can have one enterprise data warehouse many data marts Data marts do not contain operational data Data marts are more easily understood and navigated 21

  22. Multidimensional Analysis Need to examine data in various ways Produce views of multidimensional data for users: Slice Dice Pivot Drill down Roll up 22

  23. Multidimensional Analysis Slice Operator Train Performance - 3 dimensions - Operators - Performance - Time One operator's performance over time Time Performance Overall performance for one criterion over time Overall performance on a particular day 23

  24. Multidimensional Analysis Dice W S N 10 Juice Cola Milk Cream Toothpaste Soap Product 50 20 12 15 10 1 2 3 4 5 6 Month 24

  25. Multidimensional Analysis Pivot W S N Juice 10 50 Cola Product Milk 20 Cream 12 Toothpaste 15 Soap 10 1 2 3 4 5 6 Month Region Product 25

  26. Multidimensional Analysis Drill Down W S N Juice 10 50 Cola by brand Product Milk 20 Cream 12 Toothpaste 15 Soap 10 1 2 3 4 5 6 Month 26

  27. Multidimensional Analysis Roll Up W S N 80 Beverages Product Toiletries 37 1 2 3 4 5 6 Month 27

  28. Internal Aspects Schemas Star schema Snowflake schema Fact constellation schema Aggregated data Specialised indexes Bit map indexes (see lecture on multidimensional indexes) Join indexes Specialised join methods 28

  29. Star Schema Time Geography Code Region Code Region Manager City Code City Name Post Code Time Code Quarter Code Quarter Name Date Month Code Month Name Day Code Day of Week Season Sales Geography Code Time Code Account Code Product Code Sterling Amount Units Product Product Code Product Name Brand Manager Brand Name Prod Line Code Prod Line Name Prod Line Mgr Product Name Product Colour Product Model No Account Account Code Key Account Code Key Account Name Account Name Account Type Account Market 29

  30. Fact Tables Prod_Code Time_Code Acct_Code Sales Qty 101 2045 501 100 1 102 2045 501 225 2 103 2046 501 200 20 104 2046 502 250 25 105 2046 502 20 1 numerical measures key columns joining fact table to the dimension tables 30

  31. Part of a Snowflake Schema Brand Product Line Product Desc Brand Code Brand Manager Brand Name Prod Line Code Prod Line Name Prod Line Mgr Product Name Product Colour Product Model No Quarter Product Quarter Code Quarter Name Product Code Prod Line Code Brand Code Month Time Sales Month Code Month Name Time Code Year Code Quarter Code Month Code Day Code Geography Code Time Code Account Code Product Code Sterling Amount Units Day Day Code Day of Week Season 31

  32. Data Warehouse Databases Relational and Specialised RDBMSs Specialised indexing techniques, join and scan methods Relational OLAP (ROLAP) servers Explicitly developed to use a relational engine to support OLAP Include aggregation navigation logic, the ability to generate multi- statement SQL, and other additional services Multidimensional OLAP (MOLAP) servers The storage model is an n-dimensional array May use a 2-level approach, with 2-D dense arrays indexed by B-Trees Time is often one of the dimensions 32

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#