Efficient Database Management Through Normalization

03 database normalization n.w
1 / 18
Embed
Share

Database normalization is a crucial process for organizing data efficiently in databases to prevent redundancy, inconsistencies, and data loss. This method enhances database performance, accuracy, and management, making it easier to update and maintain databases effectively. Normalization ensures that data is stored in the most optimal way, improving query performance and overall system efficiency.

  • Database Management
  • Data Modeling
  • Database Efficiency
  • Data Normalization

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. 03 Database Normalization

  2. What is normalization? Database normalization, or just normalization A process used for data modelling or database creation, where you organize your data and tables so it can be added and updated efficiently. Done manually @ design time (no click a button and do it !)

  3. Why Normalize a Database? Make the database more efficient Prevent the same data from being stored in more than one place Prevent updates being made to some data but not others Prevent data not being deleted when it is supposed to be, or from data being lost when it is not supposed to be Ensure the data is accurate Reduce the storage space that a database takes up Ensure the queries on a database run as fast as possible Without normalization, running the database can be slow and is almost guaranteed to be hard to manage, update and keep accurate

  4. Example 1 Consider a student database You want to keep track of: The student name Program (Degree) Name Classes(s) taken Fees paid You could envision a simple spreadsheet (table) for holding that information Student ID Student Name Fees Paid Program Name Class 1 Class 2 Class 3 1 John Smith 200 Economics Economics 1 Biology 1 2 Maria Griffin 500 Computer Science Biology 1 Business Intro Programming 2 3 Susan Johnson 400 Medicine Biology 2 4 Matt Long 850 Dentistry

  5. Example 1 This creates some immediate problems Student ID Student Name Fees Paid Program Name Class 1 Class 2 Class 3 1 John Smith 200 Economics Economics 1 Biology 1 2 Maria Griffin 500 Computer Science Biology 1 Business Intro Programming 2 3 Susan Johnson 400 Medicine Biology 2 4 Matt Long 850 Dentistry What happens when you want to add a new student, but their program is not known? Student ID Student Name Fees Paid Program Name Class 1 Class 2 Class 3 . Jim Kirk 0 ? 5 You are adding incomplete data which can cause problems when analyzing the data This is an INSERT anomaly. You can t add a student without also adding their classes

  6. Example 2 What if we want to modify some data? Student ID Student Name Fees Paid Program Name Class 1 Class 2 Class 3 1 John Smith 200 Economics Economics 1 Biology 1 2 Maria Griffin 500 Computer Science Biology 1 Business Intro Programming 2 3 Susan Johnson 400 Medicine Biology 2 4 Matt Long 850 Dentistry We rename Biology 1 to Intro to Biology ? We would have to search and locate EVERY instance of Biology 1 and make sure we update it This would be an UPDATE anomaly. You can make a change, but fail to make ALL the necessary changes It would be FAR better if we could only make ONE change, and cover everyone taking Biology

  7. Example 3 What if we want to remove some data? Student ID Student ID Student Name Student Name Fees Paid Fees Paid Program Name Program Name Class 1 Class 1 Class 2 Class 2 Class 3 Class 3 1 John Smith John Smith 200 Economics Economics Economics 1 Economics 1 Biology 1 Biology 1 1 200 2 Maria Griffin Maria Griffin 500 Computer Science Computer Science Biology 1 Biology 1 Business Intro Business Intro Programming 2 Programming 2 2 500 3 Susan Johnson Matt Long 400 Medicine Dentistry Biology 2 4 850 4 Matt Long 850 Dentistry Susan Johnson drops out, and we delete the record This would remove ALL traces of Biology 2 This would be a DELETE anomaly. Deleting one thing results in other data (that you want to keep) being deleted You would want to delete the student record, but not wipe out the existence of the Class!!

  8. Other issues Student ID Student Name Fees Paid Program Name Class 1 Class 2 Class 3 1 John Smith 200 Economics Economics 1 Biology 1 2 Maria Griffin 500 Computer Science Biology 1 Business Intro Programming 2 3 Susan Johnson 400 Medicine Biology 2 4 Matt Long 850 Dentistry What if they take 4 classes? What if they switch programs? What if you want to add a new program? Add a new class?

  9. Normalization as a design strategy Introduced alongside the relational model in the early 70 s by Codd, et al. Many different normal forms: UNF, 1NF, 2NF, 3NF, BCNF, 4NF, ETNF, 5NF, DKNF, 6NF. Subtle differences between each. Basic tenets: Independence of data tables Each Normalization Form (1NF, 2NF, 3NF) has increasing levels of independence There are more levels but as a matter of practicality, most will stop @ 3NF

  10. First Normal Form Student Name Fees Paid Date of Birth Program Name Class 1 Class 2 Class 3 Class 4 Teacher Name Economics 1 (Business) Biology 1 (Science) James Peterson John Smith 1800 04-Aug-91 Economics Computer Science Biology 1 (Science) Business Intro (Business) Programming 2 (IT) James Peterson Maria Griffin 1500 10-Sep-92 Biology 2 (Science) Susan Johnson 2000 13-Jan-91 Medicine Sarah Francis Matt Long 1400 25-Apr-92 Dentistry Shane Cobson Does the combination of all columns make a unique row every single time? No. You could have people with the same name, classes etc. might be rare, but could happen What field can be used to uniquely identify the row? Is this the student name? No. Class? No, this isn t unique either. So we would need a Primary Key!

  11. Updated design Primary Key PK ID 1NF Schema Name Fees Paid DoB Program Name Class 1 Class 2 Class 3 Class 4 Teacher Name

  12. The primary key is student ID, which represents the student. Lets look at each column: student name: Yes, this is dependent on the primary key. A different student ID means a different student name. fees paid: Yes, this is dependent on the primary key. Each fees paid value is for a single student. date of birth: Yes, it s specific to that student. class 1: No, this column is not dependent on the student. More than one student can be enrolled in one class. class 2: As above, more than one class is allowed. class 3: No, same rule as class 2. class 4: No, same rule as class 2 teacher name: No, the teacher name is not dependent on the student. program name: No, the program name is not dependent on the student. We have a mix of Yes and No here. Some fields are dependent on the student ID, and others are not. 2nd Normal Form The rule of second normal form on a database can be described as: 1. Fulfil the requirements of first normal form 2. Each non-key attribute must be functionally dependent on the primary key What does this mean?

  13. Multiple Tables for 2NF class ID class name 1 Economics 1 (Business) 2 Biology 1 (Science) 3 Business Intro (Business) student ID student name fees paid date of birth 4 Programming 2 (IT) 5 Biology 2 (Science) 1 John Smith 200 04-Aug-91 teacher ID teacher name 1 James Peterson 2 Maria Griffin 400 10-Sep-92 2 Sarah Francis 3 Shane Cobson 3 Susan Johnson 800 13-Jan-91 program ID program name 1 Computer Science 4 Matt Long 600 25-Apr-92 2 Dentistry 3 Economics 4 Medicine But how do we connect all the data?

  14. Foreign Keys class ID class name Foreign key 1 Economics 1 (Business) 2 Biology 1 (Science) 3 Business Intro (Business) stud ent ID prog ram ID 4 Programming 2 (IT) student name fees paid date of birth 5 Biology 2 (Science) teacher ID teacher name 3 1 John Smith 200 04-Aug-91 1 James Peterson 2 Sarah Francis 1 3 Shane Cobson 2 Maria Griffin 400 10-Sep-92 4 3 Susan Johnson 800 13-Jan-91 program ID program name 1 Computer Science 2 4 Matt Long 600 25-Apr-92 2 Dentistry 3 Economics 4 Medicine

  15. How to manage the other tables? What about the program table? Questions: How are teachers related to others? Does a program have many students, or does a student have many programs? - A student can have one teacher for all classes - A subject could have a single teacher that teaches that specific class BOTH! - Program could have a teacher that teaches all classes for the program This is a many-to-many relationship - Multiple teachers could teach a particular class DOES A TEACHER HAVE MANY CLASSES, or DOES A CLASS HAVE MANY TEACHERS Definitely the first statement is true

  16. Side note: Data and mandatory fields Consider this (student_name) PK First Name Last_Name Middle_Name Nickname We separate out things like Courses, Addresses into separate tables but what about just a student's name? PK First Name Last_Name Middle_Name Nickname Valid 1 James Kirk Tiberius Jim PK First Name Last_Name Middle_Name Nickname Valid? 1 Spock Yes, it s valid. You can t ALWAYS have non-null values, no matter how much you break down the tables

  17. 3NF Third normal form is the final stage of the most common normalization process. The rule for this is: A A determines B normal form Fulfils the requirements of second B dependency Has no transitive functional B determines C C What does this mean? What is a transitive functional dependency? It means that every attribute that is not the primary key must depend on the primary key and the primary key only. B A B determines C A determines B C B

  18. Example Cust_I D Name Acct_Nu mber Bank_co de Bank Name Bank Code Bank Name 1 200004 400 Chase 303 John Smith BoA 2 545151 501 ESL 400 Maria Griffin Chase 3 353453 400 Chase 501 Susan Johnson ESL 4 784234 303 BoA Matt Long Cust_I D Name Acct_Nu mber Bank_co de 1 200004 400 John Smith 2 545151 501 Maria Griffin 3 353453 400 Susan Johnson Bank Name depends on bank_code 4 784234 303 Matt Long

More Related Content