Advanced Databases and Online Analytical Processing

undefined
 
Data Warehousing
 
COMP3211 Advanced Databases
 
Dr Nicholas Gibbins – nmg@ecs.soton.ac.uk
2020-2021
 
Processing Styles – OLTP
 
3
 
On-Line Transaction Processing
Traditional workloads,
bread and butter
 processing
Volumes of data, transactions grow, networks getting larger.
 
Processing Styles – OLAP
 
4
 
On-Line Analytical Processing
includes the use of data warehouses
multidimensional databases
data analysis
 
Online Analytical Processing
Edgar Codd
OLAP is the name given to the dynamic
enterprise analysis required to create,
manipulate, animate and synthesise
information from exegetical,
contemplative and formulaic data
analysis models
 
Online Analytical Processing
 
Edgar Codd
 
Exegesis: critical explanation
 
How did we get to where we are?
OLAP is the name given to the dynamic
enterprise analysis required to create,
manipulate, animate and synthesise
information from 
exegetical
,
contemplative and formulaic data
analysis models
 
Online Analytical Processing
 
Edgar Codd
 
Asking ‘what if?’ questions
 
How does the outcome change if
we vary the parameters?
OLAP is the name given to the dynamic
enterprise analysis required to create,
manipulate, animate and synthesise
information from exegetical,
contemplative
 and formulaic data
analysis models
 
Online Analytical Processing
 
Edgar Codd
 
Which parameters must be varied
in order to achieve a given outcome?
OLAP is the name given to the dynamic
enterprise analysis required to create,
manipulate, animate and synthesise
information from exegetical,
contemplative and 
formulaic
 data
analysis models
 
12 Rules for OLAP
 
1.
Multidimensional conceptual view
2.
Transparency
3.
Accessibility
4.
Consistent reporting performance
5.
Client-server architecture
6.
Generic dimensionality
 
7.
Dynamic sparse matrix handling
8.
Multi-user support
9.
Unrestricted cross-dimensional
operations
10.
Intuitive data manipulation
11.
Flexible reporting
12.
Unlimited dimensions and aggregation
levels
 
Data Mining
 
10
 
Data mining 
is the process of discovering hidden patterns and relations in large
databases using a variety of advanced analytical techniques
Data mining attempts to use the computer to discover relationships that can be used
to make predictions
Data mining tools often find unsuspected relationships in data that other techniques
will overlook
Variety of approaches: rule-based, fuzzy logic, machine learning, visualisations
 
The Data Warehouse
 
11
 
A 
data warehouse 
is a subject-oriented, integrated,
time-variant, non-volatile collection of data that is used primarily in organisational
decision making
 
The Data Warehouse
 
A 
data warehouse 
is a 
subject-oriented
, integrated,
time-variant, non-volatile collection of data that is used primarily in organisational
decision making
 
The data is organised according to subject instead of application and contains only the
information necessary for ‘decision support
processing.
 
12
 
The Data Warehouse
 
A 
data warehouse 
is a subject-oriented, 
integrated
,
time-variant, non-volatile collection of data that is used primarily in organisational
decision making
 
Data encoding is made uniform
(e.g. sex = f or m, 1 or 2, b or g - needs to be all the same in the warehouse).
 
Data naming is made consistent.
 
13
 
The Data Warehouse
 
A 
data warehouse 
is a subject-oriented, integrated,
time-variant
, non-volatile collection of data that is used primarily in organisational
decision making
 
Data is collected over time and can then be used for comparisons, trends and
forecasting
 
14
 
The Data Warehouse
 
A 
data warehouse 
is a subject-oriented, integrated,
time-variant, 
non-volatile 
collection of data that is used primarily in organisational
decision making
 
The data is not updated or changed once in the data warehouse, but is simply loaded,
and then accessed.
 
The data warehouse is held quite separately from the operational database, which
supports OLTP.
 
15
 
Why a Separate Data Warehouse?
 
16
 
Performance
Operational databases are optimised to support known transactions and workloads
Special data organisation, access methods and implementation methods are needed
Complex OLAP queries would degrade performance for operational transactions
 
Why a Separate Data Warehouse?
 
17
 
Missing data
Decision support requires historical data, which operational databases do not typically
maintain
Data consolidation
Decision support requires consolidation (aggregation, summarisation) of data from many
heterogeneous sources, including operational databases and external sources
Data quality
Different sources typically use inconsistent data representations, codes and formats, which
have to be reconciled
 
Extracting Data
 
18
 
Car
 
Policy
 
Claim
 
OPERATIONAL
 
DATA WAREHOUSE
 
Customers
 
replace
 
change
 
insert
 
change
 
replace
 
insert
 
Liability
 
Risk
 
EIS
 
DSS
 
Analysis
 
Etc
 
- Data is cleansed
- Data is restructured
 
The Data Warehouse
 
19
 
A Data Warehouse may be realised:
via a front end to existing databases and files
in a fresh relational database
in a multidimensional database (MDDB)
in a proprietary database format
using a mixture of the above
 
The Data Warehouse
 
20
 
Data may be accessed in various ways:
Decision Support Systems (DSS)
Executive Information Systems (EIS)
Data Mining
On-Line Analytical Processing
 
Data Marts
 
21
 
A data mart focuses on
only one subject area, or
only one group of users
An organisation can have
one enterprise data warehouse
many data marts
Data marts do not contain operational data
Data marts are more easily understood and navigated
 
Multidimensional Analysis
 
22
 
Need to examine data in various ways
 
Produce views of multidimensional data for users:
Slice
Dice
Pivot
Drill down
Roll up
 
Multidimensional Analysis – Slice
 
23
 
Operator
 
Performance
 
Time
 
One operator's
 
performance
 
over time
 
Overall performance
on a particular day
 
Overall performance
for one criterion
over time
 
Train Performance
 -  3 dimensions
 
  - Operators
  - Performance
  - Time
 
Multidimensional Analysis – Dice
 
24
 
10
 
50
 
20
 
12
 
15
 
10
 
Juice
 
Cola
 
Milk
 
Cream
 
Toothpaste
 
Soap
 
1   2   3   4   5   6
 
Month
 
N
 
W
 
S
 
Region
 
Product
 
Multidimensional Analysis – Pivot
 
25
 
10
 
50
 
20
 
12
 
15
 
10
 
Juice
 
Cola
 
Milk
 
Cream
 
Toothpaste
 
Soap
 
1    2    3    4    5    6
 
Month
 
N
 
W
 
S
 
Region
 
Product
 
Product
 
Month
 
Region
 
Multidimensional Analysis – Drill Down
 
26
 
10
 
50
 
20
 
12
 
15
 
10
 
Juice
 
Cola
 
Milk
 
Cream
 
Toothpaste
 
Soap
 
1    2    3    4    5    6
 
Month
 
N
 
W
 
S
 
Region
 
Product
by brand
 
Multidimensional Analysis – Roll Up
 
27
 
80
 
37
 
Beverages
 
Toiletries
 
1    2    3    4    5    6
 
Month
 
N
 
W
 
S
 
Region
 
Product
 
Internal Aspects
 
28
 
Schemas
Star schema
Snowflake schema
Fact constellation schema
Aggregated data
Specialised indexes
Bit map indexes (see lecture on multidimensional indexes)
Join indexes
Specialised join methods
 
Star Schema
 
29
 
Time Code
Quarter Code
Quarter Name
Date
Month Code
Month Name
Day Code
Day of Week
Season
 
Geography Code
Region Code
Region Manager
City Code
City Name
Post Code
 
Account Code
Key Account Code
Key Account Name
Account Name
Account Type
Account Market
 
Product Code
Product Name
Brand Manager
Brand Name
Prod Line Code
Prod Line Name
Prod Line Mgr
Product Name
Product Colour
Product Model No
 
Geography Code
Time Code
Account Code
Product Code
Sterling Amount
Units
 
Time
 
Sales
 
Account
 
Product
 
Fact Tables
 
30
 
key columns joining fact table
to the dimension tables
 
numerical
measures
 
Part of a Snowflake Schema
 
31
 
Time Code
Year Code
Quarter Code
Month Code
Day Code
 
Product Code
Prod Line Code
Brand Code
 
Geography Code
Time Code
Account Code
Product Code
Sterling Amount
Units
 
Time
 
Sales
 
Product
 
Quarter Code
Quarter Name
 
Quarter
 
Month Code
Month Name
 
Month
 
Day Code
Day of Week
Season
 
Day
 
Product Name
Product Colour
Product Model No
 
Product Desc
 
Brand Code
Brand Manager
Brand Name
 
Brand
 
Prod Line Code
Prod Line Name
Prod Line Mgr
 
Product Line
 
Data Warehouse Databases
 
32
 
Relational and Specialised RDBMSs
Specialised indexing techniques, join and scan methods
Relational OLAP (ROLAP) servers
Explicitly developed to use a relational engine to support OLAP
Include aggregation navigation logic, the ability to generate multi-statement SQL, and other
additional services
Multidimensional OLAP (MOLAP) servers
The storage model is an n-dimensional array
May use a 2-level approach, with 2-D dense arrays indexed by B-Trees
Time is often one of the dimensions
 
Next Lecture: Information Retrieval
Slide Note
Embed
Share

Explore the world of data warehousing, advanced databases, OLTP, and OLAP with Dr. Nicholas Gibbins. Learn about traditional processing styles, multidimensional databases, and the dynamic analysis techniques of Online Analytical Processing (OLAP). Understand the importance of asking "what if" questions and mastering the 12 rules for OLAP for effective data manipulation and decision-making.

  • Data Warehousing
  • Advanced Databases
  • OLTP
  • OLAP
  • Online Analytical Processing

Uploaded on Sep 16, 2024 | 2 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Data Warehousing COMP3211 Advanced Databases Dr Nicholas Gibbins nmg@ecs.soton.ac.uk 2020-2021

  2. Processing Styles OLTP On-Line Transaction Processing Traditional workloads, bread and butter processing Volumes of data, transactions grow, networks getting larger. 3 3

  3. Processing Styles OLAP On-Line Analytical Processing includes the use of data warehouses multidimensional databases data analysis 4 4

  4. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Edgar Codd 5

  5. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Exegesis: critical explanation How did we get to where we are? Edgar Codd 6

  6. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Asking what if? questions How does the outcome change if we vary the parameters? Edgar Codd 7

  7. Online Analytical Processing OLAP is the name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exegetical, contemplative and formulaic data analysis models Which parameters must be varied in order to achieve a given outcome? Edgar Codd 8

  8. 12 Rules for OLAP 1. Multidimensional conceptual view 7. Dynamic sparse matrix handling 2. Transparency 8. Multi-user support 3. Accessibility 9. Unrestricted cross-dimensional operations 4. Consistent reporting performance 10. Intuitive data manipulation 5. Client-server architecture 11. Flexible reporting 6. Generic dimensionality 12. Unlimited dimensions and aggregation levels 9

  9. Data Mining Data mining is the process of discovering hidden patterns and relations in large databases using a variety of advanced analytical techniques Data mining attempts to use the computer to discover relationships that can be used to make predictions Data mining tools often find unsuspected relationships in data that other techniques will overlook Variety of approaches: rule-based, fuzzy logic, machine learning, visualisations 10 10

  10. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making 11 11

  11. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making The data is organised according to subject instead of application and contains only the information necessary for decision support processing. 12 12

  12. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making Data encoding is made uniform (e.g. sex = f or m, 1 or 2, b or g - needs to be all the same in the warehouse). Data naming is made consistent. 13 13

  13. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making Data is collected over time and can then be used for comparisons, trends and forecasting 14 14

  14. The Data Warehouse A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organisational decision making The data is not updated or changed once in the data warehouse, but is simply loaded, and then accessed. The data warehouse is held quite separately from the operational database, which supports OLTP. 15 15

  15. Why a Separate Data Warehouse? Performance Operational databases are optimised to support known transactions and workloads Special data organisation, access methods and implementation methods are needed Complex OLAP queries would degrade performance for operational transactions 16 16

  16. Why a Separate Data Warehouse? Missing data Decision support requires historical data, which operational databases do not typically maintain Data consolidation Decision support requires consolidation (aggregation, summarisation) of data from many heterogeneous sources, including operational databases and external sources Data quality Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled 17 17

  17. Extracting Data OPERATIONAL DATA WAREHOUSE Risk replace change Car Liability insert change Policy replace insert Claim Customers EIS DSS Analysis Etc - Data is cleansed - Data is restructured 18 18

  18. The Data Warehouse A Data Warehouse may be realised: via a front end to existing databases and files in a fresh relational database in a multidimensional database (MDDB) in a proprietary database format using a mixture of the above 19 19

  19. The Data Warehouse Data may be accessed in various ways: Decision Support Systems (DSS) Executive Information Systems (EIS) Data Mining On-Line Analytical Processing 20 20

  20. Data Marts A data mart focuses on only one subject area, or only one group of users An organisation can have one enterprise data warehouse many data marts Data marts do not contain operational data Data marts are more easily understood and navigated 21 21

  21. Multidimensional Analysis Need to examine data in various ways Produce views of multidimensional data for users: Slice Dice Pivot Drill down Roll up 22 22

  22. Multidimensional Analysis Slice Operator Train Performance - 3 dimensions - Operators - Performance - Time One operator's performance over time Time Performance Overall performance for one criterion over time Overall performance on a particular day 23 23

  23. Multidimensional Analysis Dice W S N 10 Juice Cola Milk Cream Toothpaste Soap 50 Product 20 12 15 10 1 2 3 4 5 6 Month 24 24

  24. Multidimensional Analysis Pivot W S N 10 Juice 50 Cola Product 20 Milk Cream 12 Toothpaste 15 Soap 10 1 2 3 4 5 6 Month Region Product 25 25

  25. Multidimensional Analysis Drill Down W S N 10 Juice 50 Cola by brand Product 20 Milk Cream 12 Toothpaste 15 Soap 10 1 2 3 4 5 6 Month 26 26

  26. Multidimensional Analysis Roll Up W S N 80 Beverages Product Toiletries 37 1 2 3 4 5 6 Month 27 27

  27. Internal Aspects Schemas Star schema Snowflake schema Fact constellation schema Aggregated data Specialised indexes Bit map indexes (see lecture on multidimensional indexes) Join indexes Specialised join methods 28 28

  28. Star Schema Time Geography Code Region Code Region Manager City Code City Name Post Code Time Code Quarter Code Quarter Name Date Month Code Month Name Day Code Day of Week Season Sales Geography Code Time Code Account Code Product Code Sterling Amount Units Product Product Code Product Name Brand Manager Brand Name Prod Line Code Prod Line Name Prod Line Mgr Product Name Product Colour Product Model No Account Account Code Key Account Code Key Account Name Account Name Account Type Account Market 29 29

  29. Fact Tables Prod_Code Time_Code Acct_Code Sales Qty 101 2045 501 100 1 102 2045 501 225 2 103 2046 501 200 20 104 2046 502 250 25 105 2046 502 20 1 numerical measures key columns joining fact table to the dimension tables 30 30

  30. Part of a Snowflake Schema Brand Product Line Product Desc Brand Code Brand Manager Brand Name Prod Line Code Prod Line Name Prod Line Mgr Product Name Product Colour Product Model No Quarter Product Quarter Code Quarter Name Product Code Prod Line Code Brand Code Month Time Sales Month Code Month Name Time Code Year Code Quarter Code Month Code Day Code Geography Code Time Code Account Code Product Code Sterling Amount Units Day Day Code Day of Week Season 31 31

  31. Data Warehouse Databases Relational and Specialised RDBMSs Specialised indexing techniques, join and scan methods Relational OLAP (ROLAP) servers Explicitly developed to use a relational engine to support OLAP Include aggregation navigation logic, the ability to generate multi-statement SQL, and other additional services Multidimensional OLAP (MOLAP) servers The storage model is an n-dimensional array May use a 2-level approach, with 2-D dense arrays indexed by B-Trees Time is often one of the dimensions 32 32

  32. Next Lecture: Information Retrieval

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#