Reference Data Management in Enterprises
Reference data plays a crucial role in the success of data management, warehousing, and integration initiatives. This presentation by Mehmet Orun sheds light on the significance of reference data, its management, and practical implications for data professionals. Discover the definition, examples, types, and functions of reference data within the enterprise context.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Reference Data Management is critical to success of Master Data Management, Data Warehousing, or SOA integration. Traditional RDM efforts by data architects are limited to Data Domain values in data models and mapping tables for integration (ETL, ESB ) developers This presentation provides an enterprise-wide view of what reference data is, why it matters, and how it can be managed, to increase the practical knowledge of the data professional.
Mehmet Orun is the Sr. Manager for Data Architecture & Analysis as well as solution owner for Master Data Management at Genentech, where he is responsible for data services rollout and associated processes, technology roadmaps and methodologies. Mehmet Orun is also the primary contributor to DAMA-DMBOK s Reference & Master Data Mngt chapter. Some of the frameworks in this presentation are based on this content. Mehmet s interest and expertise includes data and meta data driven solutions to bridge the gap across applications, processes, and structured and unstructured data sources. He has master level CDMP certification in Data Management and Data & Information Quality and is a Senior Member of the ACM. Mehmet serves on the board of San Francisco DAMA Chapter
What is Reference Data A Definition Some Examples The complexity Reference Data in the Enterprise RDM Function Conceptual Framework Reference Data Types Use/Design Patterns & Implications RDM Functions (from DMBOK)
Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise. (Malcolm Chisholm, TDAN, 10/1/2001) Wikipedia suggests: Reference data are data describing a physical or virtual object and its properties, usually described with nouns. A special type of reference data is master reference data - these are reference data shared over a number of systems. The above mixes Reference Data Management, MDM and Data Standards (noted as many refer to Wikipedia as the authoritative source)
State Codes CA, MO, AL, Hair Color: Brown, Blonde, Diseases Cancer, Ulcer, [Postal] State Codes [-USA]:
What does a code mean? CA: Canada or California? What is an authoritative source? USPS for State codes? ISO 3166 for country codes and if so ISO 3166-2 for state codes?
What is an acceptable list of values? Hair color: Black, Brown, Blond, Red What about Chestnut? What about Pink, or Purple? What about Pink and Purple?
What is the rightlevel of detail to categorize other data ? Cancer Breast Cancer 174.9, Female breast, unspecified (Ref: ICD9)
What do we mean by a term anyway? Cancer? Crab family taxonomy from NCBI Cancer cell photo from alternative-cancer.net
What are there reference data elements here? Trip Type Airport Code & Names Traveler Type Class of Service Fare Type Flags? Other?
If you dont know what something means, how do you get meaning out of it? If data is in different levels of detail from different sources, how do you maintain richness without sacrificing accuracy? How do you relate one set of categories to another, without standards? The rest of this presentation will describe different types of Reference Data and their use of Reference Data in the enterprise
Reference Data Management is control over defined domain values (also known as vocabularies), including control over standardized terms, code values, and other unique identifiers, business definitions for each value, business relationships within and across domain value lists, and the consistent, shared use of accurate, timely, and relevant reference data values to classify and categorize data. (DAMA DMBOK, Chapter 8) If you do not [somehow] manage your reference data, broader data integration and master data reconciliation will be challenging at best ANSI/NISO Z39.19-2005 is the authoritative standard and guideline for reference data/vocabulary management
Information Integration Data Governance & Quality Master Data Management Business Intelligence Knowledge Management Code mapping rules Single source for ESB, ETL, EII Manage terms, names, definitions Code lists for quality rules Use capability for Information Integration, Data Governance, and Quality Hierarchies for drill down or data mining analytics Taxonomy or Ontology for Search and Navigation Content Tagging Reference Data Management Solution (Vocabulary, Vocabulary View, Term-Relationship Management, Publication and Translation Services) Import or link attributes and definitions as part of vocabulary meta data Code lists/vocabularies and term relationships Data Models Code List and Vocabularies
What are there reference data elements here? San Francisco the city vs. a pointer to a metropolitan area SFO or San Francisco as airport indicators Hotel names with many synonyms
Simple to Complex Increasing complexity Taxonomy Thesaurus Ontology List Classes (general things) many domains of interest Equivalence - Synonym Homographic-Spelled the same different meaning Tank Hierarchical-Broader than/Narrower Than Associated-Associated A thesaurus is typically used to associate the rough meaning of a term to the rough meaning of another term. Hierarchical Parent / Child Relationship between parent and a child can be relatively underspecified or ill defined. Flat List Instances (particular things) Relationships among those things Properties (values) of those things Functions of and processes involving those things Constraints on and rules involving those things.
Simple Value Lists Defined Value Lists
Taxonomic Reference Data Ontological Reference Data
Relationship Types Computer Manufacturers E Equivalence H Hierarchical H A Associative International Business Machines E E IBM Big Blue ? H ? Software Group Hardware Software ? A A
Equivalent Term Relationship: A relationship between or among terms in a controlled vocabulary that leads to one or more terms that are to be used instead of the term from which the cross-reference is made. (Z39.19) The relationship indicator for this type is ET. Hierarchical Relationship: A relationship between or among terms in a controlled vocabulary that depicts broader (generic) to narrower (specific) or whole-part relationships; begins with the words broader term (BT), or narrower term (NT). (Z39.19) The relationship indicator for this type is HT. Related Term Relationship: A term that is associatively but not hierarchically linked to another term in a controlled vocabulary. In thesauri, the relationship indicator for this type of term is RT. (Z39.19)
Information Integration Data Governance & Quality Master Data Management Business Intelligence Knowledge Management Code mapping rules Single source for ESB, ETL, EII Manage terms, names, definitions Code lists for quality rules Use capability for Information Integration, Data Governance, and Quality Hierarchies for drill down or data mining analytics Taxonomy or Ontology for Search and Navigation Content Tagging Reference Data Management Solution (Vocabulary, Vocabulary View, Term-Relationship Management, Publication and Translation Services) Import or link attributes and definitions as part of vocabulary meta data Code lists/vocabularies and term relationships Data Models Code List and Vocabularies
If selecting a single value stores it in a database row/cell, what happens when you select multiple values? How do you report on any one value? a. Select where <field> = <value> b. Select where <field> IN {value} c. Select where <field> like %value% d. Other?
Who is an AV buf in the audience? Who knows their doctor well?
Reference Data Management is control over defined domain values (also known as vocabularies) Vocabulary Management function of defining, sourcing, importing, and maintaining, any given vocabulary Key questions: What information concepts will this vocabulary support? (which data attributes) Who is the audience? Why is the vocabulary needed? (app, BI , content management?) Who are the decision makers? What are the current vocabularies? Are there existing standards?
Term: One or more words designating a concept Term Management Specifying how Terms are initially defined and classified and how this information is maintained once it starts Most significant effort is to identify terms that will constitute the standard Preferred Term: One of two or more synonyms selected as a term for inclusion in a controlled vocabulary
Term Relationship Management How the terms relate Hierarchical Relationship depicts broader (generic) to narrower (specific) or whole-part relationships In data integration, if terms are inconsistent, map to broader relationship for accuracy (data detail loss will occur) In BI, consider allowing mixed granularity Related Term Relationship depitcs associatively without hierarchical relationships
Term Change Due to deprecation or its metadata changing If definition is improved, no impact If value/data type changes, assess impact Term Relationship Change If relationships are changing, integration or aggregation rules may be impacted Vocabulary Version Change Important especially when external standards are used Minor (addition/deprecation) vs. Major revisions
Reference Data/Vocabularies/Business Semantics are pervasive in the enterprise. Broad implementations would likely take time, cost money, and not clearly understood Like many data management initiatives, pick an area with business value that can drive focus, develop a sustainable solution and perhaps obtain tools to help. Process and accountability is more important than technology Track value, share the success, and expand your scope