Database Normalization: Understanding Sets and Normal Forms

Slide Note
Embed
Share

Explore the importance of set theory in database design, learn about different types of sets, subsets, and supersets, understand the basics of normalization techniques to efficiently organize data in databases, and discover the common anomalies associated with unnormalized databases.


Uploaded on Sep 06, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. INLS 623 DATABASE NORMALIZATION Instructor: Jason Carter

  2. SET THEORY Set A collection of zero or more distinct objects. What does set theory have to do with databases? A record is a set of attribute/property values Columns are a set of attributes Rows are a set of records Conventionally sets are denoted with capital letters A = {1,2,3} B = {2,1,5} C = {red, green, blue}

  3. SETS Equality {6, 11} = {11, 6} = {11, 6, 6, 11} . {1,2} = {2,1} Membership A = {1,2,3,4} = member of 4 A, 1 A, 3 A, 2 A = not a member of 6 A

  4. SETS Subsets a set A is a subset of a set B if all members of set A is also a member of set B = subset A = {1,3} B = {1,2,3,4} {1, 3} {1, 2, 3, 4} A B

  5. SETS Superset a set B is a superset of a set A if all members of set A are members of set B = superset A = {1,3} B = {1,2,3,4} {1, 2, 3, 4} {1, 3} B A

  6. TERMINOLOGY

  7. WHATIS NORMALIZATION? A technique to organize efficiently organize data in a database Efficiently : Eliminating redundant data Not storing the same data in more than one table Ensuring that functional dependencies make sense

  8. WITHOUT NORMALIZATION student_id 401 402 403 404 name Adam Alex Stuart Adam address 133 Our Lane 123 Here Lane 123 My Lane 123 Their Lane subject Biology Math Math Physics Update Anomaly : To update address of a student who occurs twice or more than twice in a table, we will have to update address column in all the rows, else data will become inconsistent. Insertion Anomaly : Suppose for a new admission, we have a Student id(S_id), name and address of a student but if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anamoly. Deletion Anomaly : If (student_id) 401 has only one subject and temporarily he drops it, when we delete that row, entire student record will be deleted along with it.

  9. NORMAL FORM 1st Normal Form 2nd Normal Form 3rd Normal Form Boyce-Codd Normal Form (3.5 Normal Form) 4th Normal Form

  10. 1ST NORMAL FORM Every cell in the table is atomic A cell value cannot be divided further Seen differently there are no grouping of information inside a cell. No repeating groups

  11. 1ST NORMAL FORM Student Id First name Bob Joe Alice Shelly Last name Wood Smith Boone Kent Grades Classes 1 2 3 4 C,B A,D A,A A,B 401, 623 550, 823 890,991 770,881 Does this table violate first normal form?

  12. 1ST NORMAL FORM Student Id First name Bob Joe Alice Shelly Last name Wood Smith Boone Kent Grades Classes 1 2 3 4 C,B A,D A,A A,B 401, 623 550, 823 890,991 770,881 Grades and Classes have multiple rows of data in one column

  13. 1ST NORMAL FORM Student Id First name Bob Joe Alice Shelly Bob Joe Alice Shelly Last name Wood Smith Boone Kent Wood Smith Boone Kent Grades Classes 1 2 3 4 5 6 7 8 C A A A B D A B 401 550 890 770 623 823 991 881 Create new rows

  14. If the primary key is a composite of attributes (contains multiple columns), the non key attributes (columns) must depend on the whole key. 2ND NORMAL FORM Table must be in 1st Normal Form If the primary key is a composite of attributes (contains multiple columns), the non key attributes (columns) must depend on the whole key.

  15. FUNCTIONAL DEPENDENCIES A functional dependency is a relationship between or among attributes in a table. One attribute is functionally dependent on another if the value of the second attribute determines the value of the first attribute. If you know the value of the second attribute, you can determine the value of the first attribute. Total Charge = StandardCharge * NumberOfTests

  16. 2ND NF: FUNCTIONAL DEPENDENCIES EXAMPLES SSN (PK) First Name Last Name Age 343-33333 Jack Doe 21 398-34533 Jane Doe 25 500-33333 Jill Roy 32 700-33333 Jane Doe 50 SSN Age SSN FN SSN LN

  17. 2ND NF: FUNCTIONAL DEPENDENCIES EXAMPLES Price Customer_ID Product 100 Cell Phone 295.00 101 Wallet 25.00 100 Toothpaste 5.99 101 Jeans 49.99 What is the primary key? Customer_ID, Product

  18. 2ND NF: FUNCTIONAL DEPENDENCIES EXAMPLES All attributes must depend on the whole key (Customer_ID + Product). Is this table in 2nd NF? Price Customer_ID (PK) Product (PK) 100 Cell Phone 295.00 101 Wallet 25.00 100 Toothpaste 5.99 101 Jeans 49.99

  19. 2ND NF: FUNCTIONAL DEPENDENCIES EXAMPLES Price Customer_ID (PK) Product (FK) Product (PK) 100 Cell Phone Cell Phone 295.00 101 Wallet Wallet 25.00 100 Toothpaste Toothpaste 5.99 101 Jeans Jeans 49.99

  20. 2ND NF: FUNCTIONAL DEPENDENCIES EXAMPLES First Ten Customers get a discount off the normal price Is this table in 2nd NF? Price Customer_ID (PK) Product (PK) 100 Cell Phone 295.00 101 Wallet 25.00 100 Toothpaste 5.99 101 Jeans 49.99

  21. 2ND NF: FUNCTIONAL DEPENDENCIES EXAMPLES YES Price Customer_ID (PK) Product (PK) 100 Cell Phone 295.00 101 Wallet 25.00 100 Toothpaste 5.99 101 Jeans 49.99 Price depends on the Customer_ID and Product

  22. 3RD NORMAL FORM Table must be in 2nd Normal Form A table is in 3NF if: Table has no transitive dependencies.

  23. TRANSITIVE DEPENDENCY One attribute (column) depends on a second attribute, which depends on a third attribute A non-key column that rely on another non-key attributes, and not the primary key.

  24. 3RD NF: TRANSITIVE DEPENDENCY Only one for each CustomerID Customer # (PK) Product Price 100 Cell Phone 295.00 102 Wallet 25.00 103 Toothpaste 5.99 104 Jeans 49.99 Customer 102 is unhappy with his wallet and wants to return it. What do you do?

  25. 3RD NF: TRANSITIVE DEPENDENCY Want to remove the second row Product Price Customer_ID (PK) 100 Cell Phone 295.00 103 Toothpaste 5.99 104 Jeans 49.99 You lose the fact that a wallet cost $25. Price depends on Product, Product depends on Customer_ID

  26. 3RD NF: TRANSITIVE DEPENDENCY Price Customer_ID (PK) Product (FK) Product (PK) 100 Cell Phone Cell Phone 295.00 101 Wallet Wallet 25.00 100 Toothpaste Toothpaste 5.99 101 Jeans Jeans 49.99

More Related Content