Understanding Normalization in Database Management
Normalization is a crucial database design technique used to organize tables efficiently, reduce data redundancy, and prevent anomalies in data operations. This process involves decomposing larger tables into smaller, linked tables to ensure consistency and ease of data management.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Normalization Prof. V. B. More, MET s Institute of Engineering, Bhujbal Knowledge City, Nashik Literature is inherited from Vladimir and updated as per the need.
Normalization Why do we need to normalize? To avoid redundancy To reduce storage space needed To make data consistent To avoid update/delete anomalies
Normalization Questions on Normalization 1) Define Normalization. Explain three normal forms with suitable example.[5] 2) Define database normalization. Explain any two normal forms with suitable example.[5] 3) Differentiate between BCNF and 3NF.
Normalization E.F. Codd proposes normalization as an integral part of a relational model. Definition: Normalization is a database design technique which is used to organize the tables in such manner that it should reduce redundancy of data. It divides larger tables to smaller tables and links these tables with their relationship. To improve the design, decomposition is done into multiple smaller relations.
Normalization Need of Normalization In database management, processing without normalization is not easy to carry out operations like insertion, deletion, and updation without any loss. These operations, without normalization, leads to have anomalies i.e. inconsistency or abnormality in database. What is anomaly: something that deviates from what is standard, normal, or expected. A Insert Anomaly B Update Anomaly C Delete Anomaly
Normalization Insertion Anomaly Ssn course-id Grade Name Address 123 cs33 1A smith Main 234 null null jones Forbes Insertion anomaly: Cannot make a record Jones address because he is not taking any classes
Normalization Update Anomaly Ssn c-id Grade Name Address 123 cs331 A smith Main Main 123 cs351 B smith 123 cs211 A smith Main Clearly, Name and Address are redundant (larger relation + need to update 3 records to update the Address)
Normalization Delete Anomaly Ssn c-id Grade Name Address 123 cs331 A jones 123 Main 124 cs351 B smith 124 Broad 125 cs211 A jack 42 Penn Delete anomaly: Cannot delete Jack s enrolment without loosing his address as well
Normal Forms First Normal Form 1NF Second Normal Form 2NF Third Normal Form 3NF Boyce-Codd Normal Form BCNF Fourth Normal Form 4NF Fifth Normal Form 5NF
First Normal Form (1NF) 1NF: all attributes in a relation must have atomic and single values ( no repeating groups ) Following relation is Not in 1NF Atomic: cannot be divided further Last Name Patil First Name Hrishikesh Kalyani Madhuri Mohit Vishakha Jayesh Sham Chavan
First Normal Form (1NF) Normalized to 1NF Last Name Patil Patil Patil Patil Patil Chavan Chavan First Name Hrishikesh Kalyani Madhuri Mohit Vishakha Jayesh Sham
First Normal Form (1NF) Not in 1NF Name Michael Raphael Gabriel Uriel Metatron Weight 187 lb 192 lb 201 lb 165 lb 195 kg
First Normal Form (1NF) Normalized to 1NF Name Michael Raphael Gabriel Uriel Metatron Weight 187 192 201 165 195 Unit lb lb lb lb kg
First Normal Form (1NF) Not in 1NF Supplier S1 S2 S3 Part P1, P3, P4 P3 P2, P3
First Normal Form (1NF) Is this relation in 1NF? Supplier Part1 S1 S2 S3 Part2 P3 null P3 Part3 P4 null null P1 P3 P2 Formally yes, but in essence, NO! 15 Prof. V.B.More, MET's IOE, BKC Nashik
First Normal Form (1NF) Normalized to 1NF Supplier S1 S1 S1 S2 S3 S3 Part P1 P3 P4 P3 P2 P3 16 Prof. V.B.More, MET's IOE, BKC Nashik
Decomposition using Functional Dependency Functional Dependency: The attributes of a relation is said to be dependent on each other when an attribute of a table uniquely identifies another attribute of the same table is called functional dependency. If attribute A of relation uniquely identifies the attribute B, then it can be represented as A --> B means attribute B is functionally dependent on attribute A. Formal Definition: Let R is the relation schema with attributes A, B. A set of attributes B in R is said to be functionally dependent on set of attributes A in R, i.e. A --> B. (FD) LHS of FD is determinant set of attributes (key) RHS of FD is dependent set of attributes
Decomposition using Functional Dependency In STUDENT relation, RollNo attribute is used to identify unique student i.e. student name. Therefore, name of the student Student_Name is functionally dependent on RollNo. RollNo Student_Name RollNo Student_Name, Address, Course FD: STUDENT table RollNo Student_Name 6 Arekar Siddhant Ravindra Address Nashik Course An Introduction To Programming Through C++ Programming, Data Structures And Algorithms Using Python The Joy of Computing using Python Nagpur 7 Bagal Poonam Shivaji Mumbai 8 Bairagi Hitesh Kailas Banwait Sharanpreet Singh Surjit Singh Shinde Pratima Arun Shirsat Akansha Kishor Singh Nishu Ramkumar Thakare Kajal Sukadev Brahme Gauri Sanjay 9 Nanded Introduction to Programming in C 46 47 48 49 13 Jalgaon Nagar Surat Dhule Sinner Problem Solving through Programming in C C Programming and Assembly Language Introduction to parallel Programming in Open MP Data Base Management System Design and analysis of algorithms
Decomposition using Functional Dependency It is possible that, attribute of a relation can be identified by a set of another attributes. E.g. RollNo, Student_Name Address RollNo, Student_Name Course FD: Generally, composite primary key of relation is determinant in FD. STUDENT table RollNo Student_Name 6 Arekar Siddhant Ravindra Address Nashik Course An Introduction To Programming Through C++ Programming, Data Structures And Algorithms Using Python The Joy of Computing using Python Nagpur 7 Bagal Poonam Shivaji Mumbai 8 Bairagi Hitesh Kailas Banwait Sharanpreet Singh Surjit Singh Shinde Pratima Arun Shirsat Akansha Kishor Singh Nishu Ramkumar Thakare Kajal Sukadev Brahme Gauri Sanjay 9 Nanded Introduction to Programming in C 46 47 48 49 13 Jalgaon Nagar Surat Dhule Sinner Problem Solving through Programming in C C Programming and Assembly Language Introduction to parallel Programming in Open MP Data Base Management System Design and analysis of algorithms
Decomposition using Functional Dependency Types of Functional Dependency Single-Valued Functional Dependency Multi-Valued Functional Dependency Fully Functional Dependency Partial Functional Dependency Transitive Functional Dependency Trivial Functional Dependency Non-Trival FD STUDENT table RollNo Student_Name 6 Arekar Siddhant Ravindra Address Nashik Course An Introduction To Programming Through C++ Programming, Data Structures And Algorithms Using Python The Joy of Computing using Python Nagpur 7 Bagal Poonam Shivaji Mumbai 8 Bairagi Hitesh Kailas Banwait Sharanpreet Singh Surjit Singh Shinde Pratima Arun Shirsat Akansha Kishor Singh Nishu Ramkumar Thakare Kajal Sukadev Brahme Gauri Sanjay 9 Nanded Introduction to Programming in C 46 47 48 49 13 Jalgaon Nagar Surat Dhule Sinner Problem Solving through Programming in C C Programming and Assembly Language Introduction to parallel Programming in Open MP Data Base Management System Design and analysis of algorithms
Second Normal Form (2NF) 2NF: 1NF and all non-key attributes are fully dependent on the PK ( no partial dependencies ) Not in 2NF Student Erik Sven Inge Hildur Course_ID CIS331 CIS331 CIS331 CIS362 Grade A B C A Address 80 Ericsson Av. 12 Olafson St. 192 Odin Blvd. 212 Reykjavik St. 21 Prof. V.B.More, MET's IOE, BKC Nashik
Second Normal Form (2NF) Student Erik Sven Inge Hildur Address 80 Ericsson Av. 12 Olafson St. 192 Freya Blvd. 212 Reykjavik St. Normalized to 2NF Student Erik Sven Inge Hildur Course_ID CIS331 CIS331 CIS331 CIS362 Grade A B C A 22 Prof. V.B.More, MET's IOE, BKC Nashik
Third Normal Form (3NF) 3NF: 2NF and no transitive dependencies Student Erik Sven Inge Hildur Course_ID CIS331 CIS331 CIS331 CIS362 Grade A B C A Grade_value 4.00 3.00 2.00 4.00 Not in 3NF 23 Prof. V.B.More, MET's IOE, BKC Nashik
Third Normal Form (3NF) Student Erik Sven Inge Hildur Course_ID CIS331 CIS331 CIS331 CIS362 Grade A B C A Grade A B C Grade_value 4.00 3.00 2.00 Normalized to 3NF 24 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF: It is an advance version of 3NF that s why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table It means: Everything depends on the full key 25 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF: Example Suppose there is a company where employees work in more than one department. They store the data like this: dept_no_of_e mp emp_id emp_nationality emp_dept dept_type Production and planning 1001 Austrian D001 200 1001 Austrian stores D001 250 design and technical support 1002 American D134 100 Purchasing department 1002 American D134 600 26 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF: Example: Functional dependencies in the table above: emp_id -> emp_nationality emp_dept -> {dept_type, dept_no_of_emp} Candidate key: {emp_id, emp_dept} The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To make the table comply with BCNF we can break the table in three tables like: 27 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF: Example: emp_nationality table emp_id emp_nationality 1001 Austrian 1002 American 28 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF: Example emp_dept table emp_dept dept_type dept_no_of_emp Production and planning D001 200 stores D001 250 design and technical support D134 100 Purchasing department D134 600 29 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF: Example emp_dept_mapping table emp_id emp_dept 1001 Production and planning 1001 stores 1002 design and technical support 1002 Purchasing department 30 Prof. V.B.More, MET's IOE, BKC Nashik
Normal forms: BCNF vs. 3NF If a relation is in BCNF, it is always in 3NF (but not the converse) In practice, aim for BCNF If that s impossible, we accept 3NF; but we insist on lossless join and dependency preservation 31 Prof. V.B.More, MET's IOE, BKC Nashik
Differentiate between BCNF and 3NF BCNF vs. 3NF BASIS FOR COMPARISON 3NF BCNF Concept No non-prime attribute must be transitively dependent on the Candidate key. For any trivial dependency in a relation R say X->Y, X should be a super key of relation R. Dependency 3NF can be obtained without sacrificing all dependencies. Dependencies may not be preserved in BCNF. Decomposition Lossless join decomposition can be achieved in 3NF. Lossless decomposition is hard to achieve in BCNF. Achievability Always achievable Not always achievable 32 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF vs. 3NF Key Differences Between 3NF and BCNF 1. 3NF states that no non-prime attribute must be transitively dependent on the candidate key of the relation. On the other hands, BCNF states that if a trivial functional dependency X -> Y exist for a relation; then X must be a super key. 2. 3NF can be obtained without sacrificing the dependency of relation. However, dependency may not be preserved while obtaining BCNF. 3. 3NF can be achieved without loosing any information from the old table whereas, while obtaining BCNF we may loose some information from the old table. 33 Prof. V.B.More, MET's IOE, BKC Nashik
BCNF vs. 3NF BCNF is much restrictive than 3NF which help in normalizing the table more. The relation in 3NF has minimum redundancy left which is further removed by the BCNF. 34 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Examples Supplier S1 S1 S1 S2 S7 Name Jones Jones Jones Spiritoso Kohl Status_City 10 10 10 12 10 City Paris Paris Paris London Paris Part P3 P1 P4 P3 P4 Qty 257 500 125 (null) 342 Start by determining functional dependencies! 35 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Example Functional Dependency Supplier - > Name Supplier - > City - > Status_City Supplier - > Status_City (Supplier, Part) - > Qty Partial Dependencies (Supplier, Part) - > Name (Supplier, Part) - > City (Supplier, Part) - > Status_City 36 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Example Supplier S1 S2 S7 Name Jones Spiritoso Kohl Status_City 10 12 10 City Paris London Paris Supplier Part Qty S1 S1 S1 S2 S7 P1 P3 P4 P3 P4 500 257 125 (null) 342 37 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Example We took care of the partial dependencies, but what about transitive dependencies? Supplier S1 S2 S7 Name Jones Spiritoso Kohl Status_City 10 12 10 City Paris London Paris 38 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Example Supplier Name City Supplier Part S1 S1 S1 S2 S7 Qty 500 257 125 (null) 342 P1 P3 P4 P3 P4 S1 Jones Paris S2 Spirit London S7 Kohl Paris City Status_City Paris London 10 12 39 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Examples lossless-join decomposition A decomposition is a lossless-join decomposition if the joining attribute is a superkey in at least one of the new relations 40 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Final Thoughts There are higher normal forms (4NF, 5NF), In practice, normalized means in BCNF or 3NF Luckily, in practice, ER diagrams lead to normalized tables (but do not rely on luck) 41 Prof. V.B.More, MET's IOE, BKC Nashik
Normalization: Overview A good decomposition should: be a lossless join decomposition (you can recover original tables with a join) preserve dependencies (FD s should not span two tables) 42 Prof. V.B.More, MET's IOE, BKC Nashik
4th Normal Form For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions: 1. It should be in the Boyce-Codd Normal Form. 2. And, the table should not have any Multi-valued Dependency. A table is said to have multi-valued dependency, if the following conditions are true, 1. For a dependency A B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency. 43 Prof. V.B.More, MET's IOE, BKC Nashik
4th Normal Form 2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency. 3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other. If all these conditions are true for any relation(table), it is said to have multi-valued dependency 44 Prof. V.B.More, MET's IOE, BKC Nashik
4th Normal Form s_id 1 1 2 2 course Science Maths C# Php hobby Cricket Hockey Cricket Hockey Student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey. For student with s_id 1 there are two hobbies exists, hence along with both the courses, these hobbies should be specified. Prof. V.B.More, MET's IOE, BKC Nashik 45
4th Normal Form In the table, there is no relationship between the columns course and independent of each other. hobby. They are So there is multi-value dependency, which leads to un-necessary repetition of data anomalies as well. s_id course 1 Science 1 Maths 1 Science 1 Maths hobby Cricket Hockey Hockey Cricket and other Prof. V.B.More, MET's IOE, BKC Nashik 46
4th Normal Form CourseOpted Table How to satisfy 4th Normal Form? To make the relation satify the 4th normal form, we can decompose the table into 2 tables s_id 1 1 2 2 Hobbies Table course Science Maths C# Php s_id 1 1 2 2 hobby Cricket Hockey Hockey Cricket one is CoursesOpted table second is Hobbies table Prof. V.B.More, MET's IOE, BKC Nashik 47
4th Normal Form Now this relation satisfies the fourth normal form. A table can also have functional dependency along dependency. In that case, the functionally dependent columns are moved in a separate table and the multi-valued dependent columns are moved to separate tables. with multi-valued Prof. V.B.More, MET's IOE, BKC Nashik 48
5th Normal Form The 5NF (Fifth Normal Form) is also known as project-join normal form (PJNF). A relation is in Fifth Normal Form (5NF), if it is in 4NF, and won t have lossless decomposition into smaller tables. A relation is in 5NF, if the candidate key implies every join dependency in it. Prof. V.B.More, MET's IOE, BKC Nashik 49
5th Normal Form Example The below relation violates the Fifth Normal Form (5NF) of Normalization: <Employee> table EmpName David EmpSkills Java EmpJob E145 John JavaScript E146 Jamie jQuery E146 Emma Java E147 Prof. V.B.More, MET's IOE, BKC Nashik 50