Reference Data Management in Enterprises

undefined
Mehmet Orun, MBA, CDMP
Reference Data Management is critical to success of
Master Data Management, Data Warehousing, or
SOA integration.
Traditional RDM efforts by data architects are limited
to Data Domain values in data models and mapping
tables for integration (ETL, ESB…) developers
This presentation provides an enterprise-wide view of
what reference data is, why it matters, and how it can
be managed, to increase the practical knowledge of
the data professional.
Mehmet Orun is the Sr. Manager for
Data Architecture & Analysis as well as
solution owner for Master Data
Management at Genentech, where he
is responsible for data services rollout
and associated processes, technology
roadmaps and methodologies.
Mehmet’s interest and expertise
includes data and meta data driven
solutions to bridge the gap across
applications, processes, and structured
and unstructured data sources. He has
master level CDMP certification in
Data Management and Data &
Information Quality and is a Senior
Member of the ACM.
Mehmet serves on the board of San
Francisco DAMA Chapter
Mehmet Orun is also the primary
contributor to DAMA-DMBOK’s
“Reference & Master Data Mngt”
chapter.  Some of the frameworks in
this presentation are based on this
content.
What is Reference Data
A Definition
Some Examples
The complexity
Reference Data in the Enterprise
RDM Function
Conceptual Framework
Reference Data Types
Use/Design Patterns & Implications
RDM Functions (from DMBOK)
Reference data is any kind of data that is used solely to categorize
other data found in a database, or solely for relating data in a
database to information beyond the boundaries of the enterprise.
(Malcolm Chisholm, TDAN, 10/1/2001)
Wikipedia suggests:
Reference data 
are data describing a physical or virtual object and its
properties, usually described with nouns.  A special type of reference
data is 
master reference data - these are reference data shared over a
number of systems.
The above mixes Reference Data Management, MDM and Data Standards
(noted as many refer to Wikipedia as the authoritative source)
State Codes
CA, MO, AL, …
Hair Color:
Brown, Blonde, …
Diseases
Cancer, Ulcer, …
[Postal] State Codes [-USA]:
What does a code mean?
CA: Canada or California?
What is an authoritative source?
 
USPS 
for State codes?
ISO 3166 
for country codes and if so 
ISO 3166-2
for state codes?
 
What is an acceptable list of values?
Hair color: Black, Brown, Blond, Red
What about Chestnut?
What about Pink, or Purple?
What about Pink 
and
 Purple?
What is the 
right
 level of detail “
to categorize
other data”?
Cancer
Breast Cancer
174.9, Female breast, unspecified (Ref: ICD9)
What do we mean by a term anyway?
Cancer?
Cancer cell photo from alternative-cancer.net
 
Crab family taxonomy from NCBI
 
What are there reference
data elements here?
Trip Type
Airport Code & Names
Traveler Type
Class of Service
Fare Type
Flags?
Other?
If you don’t know what something means, how do you
get meaning out of it?
If data is in different levels of detail from different
sources, how do you maintain richness without
sacrificing accuracy?
How do you relate one set of categories to another,
without standards?
The rest of this presentation will describe different types
of Reference Data and their use of Reference Data in
the enterprise
undefined
Enterprise Context and Reference Data Types
Reference Data Management is control over defined domain
values (also known as vocabularies), including control over
standardized terms, code values, and other unique identifiers,
business definitions for each value, business relationships within
and across domain value lists, and the consistent, shared use of
accurate, timely, and relevant reference data values to classify
and categorize data.
(DAMA DMBOK, Chapter 8)
If you do not [somehow] manage your reference data, broader
data integration and master data reconciliation will be
challenging at best
ANSI/NISO Z39.19-2005 is the authoritative standard and
guideline for reference data/vocabulary management
What are there reference data
elements here?
San Francisco the city vs. a
pointer to a metropolitan area
SFO or San Francisco as airport
indicators
Hotel names with many
synonyms
Hierarchical
P
a
r
e
n
t
 
/
 
C
h
i
l
d
R
e
l
a
t
i
o
n
s
h
i
p
 
b
e
t
w
e
e
n
p
a
r
e
n
t
 
a
n
d
 
a
 
c
h
i
l
d
 
c
a
n
 
b
e
r
e
l
a
t
i
v
e
l
y
 
u
n
d
e
r
s
p
e
c
i
f
i
e
d
 
o
r
i
l
l
 
d
e
f
i
n
e
d
.
 
Equivalence - Synonym
Homographic-Spelled the same
different meaning – “Tank”
Hierarchical-Broader than/Narrower
Than
Associated-Associated
A thesaurus is typically used to
associate the rough meaning of a
term to the rough meaning of another
term.
Flat List
Taxonomy
List
Thesaurus
Increasing complexity
Simple
to
Complex
Ontology
Classes (general things) many
domains of interest
Instances (particular things)
Relationships among those things
Properties (values) of those things
Functions of and processes
involving those things
Constraints on and rules involving
those things.
Simple Value Lists
Defined Value Lists
Taxonomic Reference Data
Ontological Reference Data
Relationship Types
Computer
Manufacturers
International
Business
Machines
IBM
Software
Group
Big Blue
Hardware
Software
Equivalent Term Relationship
:
A relationship between or among terms in a controlled vocabulary that
leads to one or more terms that are to be used instead of the term from
which the cross-reference is made.  (Z39.19) The relationship indicator
for this type is ET.
Hierarchical Relationship
:
A relationship between or among terms in a controlled vocabulary that
depicts broader (generic) to narrower (specific) or whole-part
relationships; begins with the words broader term (BT), or narrower term
(NT).  (Z39.19) The relationship indicator for this type is HT.
Related Term Relationship
:
 A term that is associatively but not hierarchically linked to another term
in a controlled vocabulary. In thesauri, the relationship indicator for this
type of term is RT. (Z39.19)
undefined
Implementation Patterns & Data Management Functions
 
If selecting a single value stores it in a database
row/cell, what happens when you select multiple
values?
How do you report on any one value?
a.
Select … where <field> = <value>
b.
Select … where <field> IN {value}
c.
Select … where <field> like ‘%value%’
d.
Other?
Who is an AV buf in the audience?
 
Who knows their doctor well?
Reference Data Management is control over defined domain
values (also known as vocabularies)
Vocabulary Management
function of defining, sourcing, importing, and maintaining, … any
given vocabulary
Key questions:
What information concepts will this vocabulary support? (which data
attributes)
Who is the audience?
Why is the vocabulary needed? (app, BI , content management?)
Who are the decision makers?
What are the current vocabularies?
Are there existing standards?
Term: One or more words designating a concept
Term Management
Specifying how Terms are initially defined and classified and
how this information is maintained once it starts
Most significant effort is to identify terms that will
constitute the standard
Preferred Term
: One of two or more synonyms selected as a
term for inclusion in a controlled vocabulary
Term Relationship Management
How the terms relate
Hierarchical Relationship
 depicts broader (generic) to
narrower (specific) or whole-part relationships
In data integration, if terms are inconsistent, map to broader
relationship for accuracy (data detail loss will occur)
In BI, consider allowing mixed granularity
Related Term Relationship
  depitcs associatively without
hierarchical relationships
Term Change
Due to deprecation or its metadata changing
If definition is improved, no impact
If value/data type… changes, assess impact
Term Relationship Change
If relationships are changing, integration or aggregation rules may be
impacted
Vocabulary Version Change
Important especially when external standards are used
Minor (addition/deprecation) vs. Major revisions
Reference Data/Vocabularies/Business Semantics are
pervasive in the enterprise.
Broad implementations would likely take time, cost
money, and not clearly understood
Like many data management initiatives, pick an area
with business value that can drive focus, develop a
sustainable solution and perhaps obtain tools to help.
Process and accountability is more important than
technology
Track value, share the success, and expand your scope
 
Slide Note
Embed
Share

Reference data plays a crucial role in the success of data management, warehousing, and integration initiatives. This presentation by Mehmet Orun sheds light on the significance of reference data, its management, and practical implications for data professionals. Discover the definition, examples, types, and functions of reference data within the enterprise context.

  • Reference Data
  • Data Management
  • Data Warehousing
  • Data Integration
  • Enterprise

Uploaded on Mar 08, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Mehmet Orun, MBA, CDMP

  2. Reference Data Management is critical to success of Master Data Management, Data Warehousing, or SOA integration. Traditional RDM efforts by data architects are limited to Data Domain values in data models and mapping tables for integration (ETL, ESB ) developers This presentation provides an enterprise-wide view of what reference data is, why it matters, and how it can be managed, to increase the practical knowledge of the data professional.

  3. Mehmet Orun is the Sr. Manager for Data Architecture & Analysis as well as solution owner for Master Data Management at Genentech, where he is responsible for data services rollout and associated processes, technology roadmaps and methodologies. Mehmet Orun is also the primary contributor to DAMA-DMBOK s Reference & Master Data Mngt chapter. Some of the frameworks in this presentation are based on this content. Mehmet s interest and expertise includes data and meta data driven solutions to bridge the gap across applications, processes, and structured and unstructured data sources. He has master level CDMP certification in Data Management and Data & Information Quality and is a Senior Member of the ACM. Mehmet serves on the board of San Francisco DAMA Chapter

  4. What is Reference Data A Definition Some Examples The complexity Reference Data in the Enterprise RDM Function Conceptual Framework Reference Data Types Use/Design Patterns & Implications RDM Functions (from DMBOK)

  5. Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise. (Malcolm Chisholm, TDAN, 10/1/2001) Wikipedia suggests: Reference data are data describing a physical or virtual object and its properties, usually described with nouns. A special type of reference data is master reference data - these are reference data shared over a number of systems. The above mixes Reference Data Management, MDM and Data Standards (noted as many refer to Wikipedia as the authoritative source)

  6. State Codes CA, MO, AL, Hair Color: Brown, Blonde, Diseases Cancer, Ulcer, [Postal] State Codes [-USA]:

  7. What does a code mean? CA: Canada or California? What is an authoritative source? USPS for State codes? ISO 3166 for country codes and if so ISO 3166-2 for state codes?

  8. What is an acceptable list of values? Hair color: Black, Brown, Blond, Red What about Chestnut? What about Pink, or Purple? What about Pink and Purple?

  9. What is the rightlevel of detail to categorize other data ? Cancer Breast Cancer 174.9, Female breast, unspecified (Ref: ICD9)

  10. What do we mean by a term anyway? Cancer? Crab family taxonomy from NCBI Cancer cell photo from alternative-cancer.net

  11. What are there reference data elements here? Trip Type Airport Code & Names Traveler Type Class of Service Fare Type Flags? Other?

  12. If you dont know what something means, how do you get meaning out of it? If data is in different levels of detail from different sources, how do you maintain richness without sacrificing accuracy? How do you relate one set of categories to another, without standards? The rest of this presentation will describe different types of Reference Data and their use of Reference Data in the enterprise

  13. Enterprise Context and Reference Data Types

  14. Reference Data Management is control over defined domain values (also known as vocabularies), including control over standardized terms, code values, and other unique identifiers, business definitions for each value, business relationships within and across domain value lists, and the consistent, shared use of accurate, timely, and relevant reference data values to classify and categorize data. (DAMA DMBOK, Chapter 8) If you do not [somehow] manage your reference data, broader data integration and master data reconciliation will be challenging at best ANSI/NISO Z39.19-2005 is the authoritative standard and guideline for reference data/vocabulary management

  15. Information Integration Data Governance & Quality Master Data Management Business Intelligence Knowledge Management Code mapping rules Single source for ESB, ETL, EII Manage terms, names, definitions Code lists for quality rules Use capability for Information Integration, Data Governance, and Quality Hierarchies for drill down or data mining analytics Taxonomy or Ontology for Search and Navigation Content Tagging Reference Data Management Solution (Vocabulary, Vocabulary View, Term-Relationship Management, Publication and Translation Services) Import or link attributes and definitions as part of vocabulary meta data Code lists/vocabularies and term relationships Data Models Code List and Vocabularies

  16. What are there reference data elements here? San Francisco the city vs. a pointer to a metropolitan area SFO or San Francisco as airport indicators Hotel names with many synonyms

  17. Simple to Complex Increasing complexity Taxonomy Thesaurus Ontology List Classes (general things) many domains of interest Equivalence - Synonym Homographic-Spelled the same different meaning Tank Hierarchical-Broader than/Narrower Than Associated-Associated A thesaurus is typically used to associate the rough meaning of a term to the rough meaning of another term. Hierarchical Parent / Child Relationship between parent and a child can be relatively underspecified or ill defined. Flat List Instances (particular things) Relationships among those things Properties (values) of those things Functions of and processes involving those things Constraints on and rules involving those things.

  18. Simple Value Lists Defined Value Lists

  19. Taxonomic Reference Data Ontological Reference Data

  20. Relationship Types Computer Manufacturers E Equivalence H Hierarchical H A Associative International Business Machines E E IBM Big Blue ? H ? Software Group Hardware Software ? A A

  21. Equivalent Term Relationship: A relationship between or among terms in a controlled vocabulary that leads to one or more terms that are to be used instead of the term from which the cross-reference is made. (Z39.19) The relationship indicator for this type is ET. Hierarchical Relationship: A relationship between or among terms in a controlled vocabulary that depicts broader (generic) to narrower (specific) or whole-part relationships; begins with the words broader term (BT), or narrower term (NT). (Z39.19) The relationship indicator for this type is HT. Related Term Relationship: A term that is associatively but not hierarchically linked to another term in a controlled vocabulary. In thesauri, the relationship indicator for this type of term is RT. (Z39.19)

  22. Information Integration Data Governance & Quality Master Data Management Business Intelligence Knowledge Management Code mapping rules Single source for ESB, ETL, EII Manage terms, names, definitions Code lists for quality rules Use capability for Information Integration, Data Governance, and Quality Hierarchies for drill down or data mining analytics Taxonomy or Ontology for Search and Navigation Content Tagging Reference Data Management Solution (Vocabulary, Vocabulary View, Term-Relationship Management, Publication and Translation Services) Import or link attributes and definitions as part of vocabulary meta data Code lists/vocabularies and term relationships Data Models Code List and Vocabularies

  23. Implementation Patterns & Data Management Functions

  24. If selecting a single value stores it in a database row/cell, what happens when you select multiple values? How do you report on any one value? a. Select where <field> = <value> b. Select where <field> IN {value} c. Select where <field> like %value% d. Other?

  25. Who is an AV buf in the audience? Who knows their doctor well?

  26. Reference Data Management is control over defined domain values (also known as vocabularies) Vocabulary Management function of defining, sourcing, importing, and maintaining, any given vocabulary Key questions: What information concepts will this vocabulary support? (which data attributes) Who is the audience? Why is the vocabulary needed? (app, BI , content management?) Who are the decision makers? What are the current vocabularies? Are there existing standards?

  27. Term: One or more words designating a concept Term Management Specifying how Terms are initially defined and classified and how this information is maintained once it starts Most significant effort is to identify terms that will constitute the standard Preferred Term: One of two or more synonyms selected as a term for inclusion in a controlled vocabulary

  28. Term Relationship Management How the terms relate Hierarchical Relationship depicts broader (generic) to narrower (specific) or whole-part relationships In data integration, if terms are inconsistent, map to broader relationship for accuracy (data detail loss will occur) In BI, consider allowing mixed granularity Related Term Relationship depitcs associatively without hierarchical relationships

  29. Term Change Due to deprecation or its metadata changing If definition is improved, no impact If value/data type changes, assess impact Term Relationship Change If relationships are changing, integration or aggregation rules may be impacted Vocabulary Version Change Important especially when external standards are used Minor (addition/deprecation) vs. Major revisions

  30. Reference Data/Vocabularies/Business Semantics are pervasive in the enterprise. Broad implementations would likely take time, cost money, and not clearly understood Like many data management initiatives, pick an area with business value that can drive focus, develop a sustainable solution and perhaps obtain tools to help. Process and accountability is more important than technology Track value, share the success, and expand your scope

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#