Data Management Essentials

Data Management Essentials
Slide Note
Embed
Share

This content covers the fundamentals of data, information, knowledge, and the importance of database technology. It details the differences between data, information, and knowledge, the role of database management systems, characteristics of data warehouses, and the significance of data mining in organizations. Additionally, it addresses data quality issues such as redundancy, inconsistency, and integrity. The material also explains the concept of databases, including their organization, benefits, and types like flat and relational databases. Overall, it provides a foundational understanding of data and its management.

  • Data Management
  • Information
  • Knowledge
  • Database Technology
  • Data Quality

Uploaded on Mar 04, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CHAPTER 6 DATA AND KNOWLEDGE MANAGEMENT

  2. CHAPTER OBJECTIVES Describe the differences between data, information, and knowledge; Describe why database technology must be used for data resource management; Define the term database and identify the steps to creating one; Describe the role of a database management system; Describe the characteristics of a data warehouse; and Define data mining and describe its role in an organization

  3. DATA, INFORMATION, AND KNOWLEDGE Data: Raw Facts, devoid of context Information: processed data that possess context, relevance, and purpose (example: monthly sales calculated from the collected daily sales data for the past year) Knowledge: consumption of information produces knowledge. Tacit Knowledge: insights and intuitions Explicit Knowledge: can be expressed into words and numbers Once data is put into context, aggregated and analyzed, it can be used to make decisions

  4. DATA QUALITY Issues with data Redundant Inconsistent Inaccurate Corrupted

  5. Redundancy: Student 4567 is Mary Brown, and her major is Finance is stored more than once. Inconsistency: when the data system is huge, making changes to all redundant data is difficult if not impossible Integrity: Alex Wilson received a grade B in MKT211; however, you can t find Alex Wilson in the student roster Lack of integrity leads to corrupted data

  6. WHAT IS A DATABASE? It s an organized collection of data It s the most common way to store large amounts of data Users can store and retrieve the data easily Minimize redundancy Eliminate inconsistency and enforce data integrity

  7. FLAT VS RELATIONAL DATABASE Flat database Single collection of data Data not related to any other collections of data Spreadsheets Relational database Multiple collections of data May be related to each other

  8. HOW DATA IS ORGANIZED Bit Binary Digit -- 1 or 0 Byte -- 8 bits a single character field -- One single data item. record a set of related fields. Also called a row. Table a set of related records. Also called a relation. Database a set of related tables. Also called a relational database.

  9. EXAMPLE: CUSTOMER TABLE Fields (columns) Customer number 12 LastName FirstName Archer Jeffrey Records 25 Clarke Martha (rows) 30 Owen Hayes 47 Parker Stephen

  10. TERMINOLOGY A database management system (DBMS)is the software to manage a database. Access 2016/2019 is Microsoft s DBMS MS Access is for Pesonal Use Oracle, MS SQL Server, MySQL etc. for Industrial Use

  11. WHAT DO YOU PUT INTO A DATABASE? Data is stored for entities, their attributes, and the relationships between those entities.

  12. TERMINOLOGY: Entity a real or abstract object in the real world Examples -- customer, automobile, order Like a noun person, place or thing An entity becomes a table in the database.

  13. TERMINOLOGY Attribute characteristic of an entity Example: height, weight, name, address, salary, age, (fields of a person) Properties that are of interest to the particular application An attribute becomes a column in the table.

  14. TERMINOLOGY Relationship a connection or link between entities Example: A customer places orders

  15. THIS DIAGRAM SHOWS THE ENTITIES AND THEIR RELATIONSHIP Customer Order places

  16. ENTITY-RELATIONSHIP DIAGRAM (ERD) Customer Customer Number First Name Last Name Order Order Number Date places

  17. Customer Order places Order number 1 2 3 4 Customer number 25 30 12 12 Date 7/12/08 7/12/08 7/14/08 7/15/08 Customer number 12 25 30 47 LastName FirstName Archer Clarke Owen Parker Jeffrey Martha Hayes Stephen

  18. CUSTOMER NUMBER IS CALLED THE COMMON FIELD BECAUSE IT APPEARS IN BOTH TABLES. Customer Customer number 12 25 30 47 LastName FirstName Archer Clarke Owen Parker Jeffrey Martha Hayes Stephen Order Order number 1 2 3 4 Customer number 25 30 12 12 Date 7/12/14 7/12/14 7/14/14 7/15/14 The common field sets up the Martha Clarke placed Order 1

  19. QUERY: WHICH ORDER(S) DID JEFFREY ARCHER PLACE? Customer Customer number 12 25 30 47 LastName FirstName Archer Clarke Owen Parker Jeffrey Martha Hayes Stephen Order Order number 1 2 3 4 Customer number 25 30 12 12 Date 7/12/14 7/12/14 7/14/14 7/15/14 He placed both Order number 3 and 4.

  20. PRIMARY KEY AN IDENTIFIER, A FIELD THAT IS UNIQUE FOR EACH RECORD Primary key Customer number 12 25 30 47 LastName FirstName Archer Clarke Owen Parker Jeffrey Martha Hayes Stephen

  21. FOREIGN KEY A FIELD IN ANOTHER TABLE THAT MATCHES ONE OF THE VALUES OF THE PRIMARY KEY Primary key Customer number 12 25 30 47 LastName FirstName Archer Clarke Owen Parker Jeffrey Martha Hayes Stephen Foreign key Order number 1 2 3 4 Customer number 25 30 12 12 Date 7/12/14 7/12/14 7/14/14 7/15/14

  22. FOREIGN KEY A FIELD IN ANOTHER TABLE THAT MATCHES ONE OF THE VALUES OF THE PRIMARY KEY Customer number 12 25 30 47 LastName FirstName Archer Clarke Owen Parker Jeffrey Martha Hayes Stephen Can this be correct? Order number 1 2 3 4 5 Customer number 25 30 12 12 56 Date 7/12/14 7/12/14 7/14/14 7/15/14 7/16/14

  23. NORMALIZATION To normalize a database means to design it in a way that: 1) reduces data redundancy 2) ensure data integrity.

  24. SCHOOL DATABASE

  25. BIG DATA, DATA WAREHOUSING, DATA MINING

  26. DATA WAREHOUSE Organize and maintain the data: typically in a mD model Provide operators for navigation Often depends on human user to detect patterns The patterns are explicit but may be hidden Typically deals with summarized data

  27. DATA WAREHOUSE Non-operational Data Uses a copy of data from the active databases that the company uses in its day-to-day operations The data warehouse must pull data from the existing databases on a regular, scheduled basis Time-variant Data Whenever data is loaded into the data warehouse, it receives a time stamp hence allows for comparisons between different time periods Standardized Data ETL: Extraction, Transformation, Loading

  28. HISTORY OF DATA WAREHOUSING Data warehouses extend the transformation of data into information In the 1990 s executives became less concerned with the day-to-day business operations and more concerned with overall business functions The data warehouse provided the ability to support decision making without disrupting the day-to-day operations

  29. DATA WAREHOUSE FUNDAMENTALS Data warehouse a logical collection of information gathered from many different operational databases that supports business analysis activities and decision-making tasks The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes

  30. DATA WAREHOUSE FUNDAMENTALS Extraction, transformation, and loading (ETL) a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse Data mart contains a subset of data warehouse information

  31. DATA WAREHOUSE FUNDAMENTALS

  32. MULTIDIMENSIONAL ANALYSIS Databases contain information in a series of two- dimensional tables In a data warehouse and data mart, information is multidimensional, it contains layers of columns and rows Dimension a particular attribute of information

  33. MULTIDIMENSIONAL ANALYSIS AND DATA CUBE Cube common term for the representation of multidimensional information

  34. Data WarehouseTime Variant The time aspect for the data warehouse is significantly longer than that of operational systems. Operational database: current value data. Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) Every key structure in the data warehouse Contains an element of time, explicitly or implicitly But the key of operational data may or may not contain time element .

  35. Data WarehouseNon-Volatile A physically separate store of data transformed from the operational environment. Operational update of data does not occur in the data warehouse environment. Does not require transaction processing, recovery, and concurrency control mechanisms Requires only two operations in data accessing: initial loading of data and access of data.

  36. OLTP vs. OLAP users function DB design data OLTP clerk, IT professional day to day operations application-oriented current, up-to-date detailed, flat relational isolated repetitive read/write index/hash on prim. key short, simple transaction Tens or hundreds thousands 100MB-GB transaction throughput OLAP knowledge worker decision support subject-oriented historical, summarized, multidimensional integrated, consolidated ad-hoc lots of scans usage access unit of work # records accessed #users DB size metric complex query millions dozens 100GB-TB query throughput

  37. Typical OLAP Operations Roll up (drill-up): summarize data by climbing up hierarchy or by dimension reduction Drill down (roll down): reverse of roll-up from higher level summary to lower level summary or detailed data, or introducing new dimensions Slice and dice: project and select Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes. Other operations drill across: involving (across) more than one fact table drill through: through the bottom level of the cube to its back-end relational tables (using SQL)

  38. MULTIDIMENSIONAL ANALYSIS AND DATA MINING Data mining the process of analyzing data to extract information not offered by the raw data alone To perform data mining users need data-mining tools Data-mining tool uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making

  39. A DATASET Data/points/instances/examples/samples/records: rows Features/attributes/dimensions/independent variables/covariates/predictors/regressors: columns Target/outcome/response/label/dependent variable: special column to be predicted

  40. MAIN TYPES OF COLUMNS Continuous: a number, like an age or height Discrete: a symbol, like cat or dog

  41. MAIN GOAL OF LEARNING: PREDICTION The setup: 1. You obtain some kind of model based on some examples, or training data, through a process called learning (also estimation). 2. Then you use that model to predict something about data you haven t seen before, but that comes from the same distribution as the training data, called test data.

  42. LEARNING TASKS Supervised learning: We re predicting a target variable for which we get to see examples. Regression: predict a continuous target variable Classification: predict a discrete target variable Unsupervised learning: We re predicting a target variable for which we never get to see examples. Clustering: predict clusters Dimensionality reduction: predict new features or select important features while discarding others

  43. CLASSIFICATION We re predicting a discrete target variable. Supervised.

  44. CLUSTERING: UNSUPERVISED A B

  45. EXAMPLE: CANCER PREDICTION Application: automatic disease detection Importance: this is modern/future medical diagnosis. Prediction goal: Based on past patients, predict whether new patients have the disease Data: Past patients with and without the disease Target: Cancer or no-cancer Features: Concentrations of various proteins in your blood

  46. EXAMPLE: NETFLIX Application: automatic product recommendation Importance: this is the modern/future shopping. Prediction goal: Based on past preferences, predict which movies you might want to watch Data: Past movies you have watched Target: Like or don t-like Features: ?

  47. EXAMPLE: CALL CENTERS Application: automatic call routing Importance: this is modern/future customer service. Prediction goal: Based on your speech recording, predict which words you said Data: Past recordings of various people Target: Which word was intended Features: ?

  48. EXAMPLE: STOCK MARKET Application: automatic program trading Importance: this is modern/future finance. Prediction goal: Based on past patterns, predict whether the stock will go up Data: Past stock prices Target: Up or down Features: ?

  49. HOW IT ALL FITS TOGETHER Pattern Evaluation Data mining: the core of knowledge discovery process. Data Mining Task-relevant Data Selection Data Warehouse Data Cleaning Data Integration Databases

  50. GROWTH OF MACHINE LEARNING Last 15 years: Applications in industry - data mining, big data etc Applications in science - computational biology, etc Fast-growing presence in AI, statistics, applied math, etc Reasons: Data is everywhere. This phenomenon is growing. Many modeling problems are more easily cast as data (learning) problems. Both widely useful and intellectually rich (both mathematics and algorithms).

Related


More Related Content