
Effective Data Cleaning Techniques for Improved Analysis
"Learn about examples of raw data and the importance of cleaning messy data before analysis. Discover strategies for handling incomplete, duplicate, incorrect, and inconsistent data to ensure accurate insights. See visual representations of data cleaning processes and best practices for maintaining data quality."
Presentation Transcript
Examples of Raw Data
Incomplete Data Before Cleaning: In this example, there are null values in place of Middle Initials. Null values are placeholders that signify missing or unknown values. They often cause problems in data analysis and visualizations, so it is important to have a standardized way of handling them.
Incomplete Data After Cleaning: In this case, we decided to replace all null values with blanks. As a result, when the names are processed, they will not include "Null" as the middle initial.
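The replacement described above could be sketched in Python roughly as follows. This is a minimal illustration, not the method used in the presentation: the sample names, the field names, and the set of strings treated as "null" are all assumptions.

```python
# Hypothetical records; the names and field names are illustrative only.
records = [
    {"first": "Ana", "middle": "M", "last": "Lopez"},
    {"first": "Ben", "middle": None, "last": "Kim"},
    {"first": "Cara", "middle": "Null", "last": "Osei"},
]

# Assumed spellings of "missing"; adjust to match the actual data source.
NULL_MARKERS = {"", "null", "n/a"}


def clean_middle(value):
    """Return an empty string for a null marker, else the trimmed value."""
    if value is None:
        return ""
    text = str(value).strip()
    return "" if text.lower() in NULL_MARKERS else text


def full_name(rec):
    """Join the name parts, skipping empty ones so no double space appears."""
    parts = [rec["first"], clean_middle(rec["middle"]), rec["last"]]
    return " ".join(p for p in parts if p)


names = [full_name(r) for r in records]
print(names)
```

Keeping the null markers in one place means every field is cleaned by the same rule, which is the "standardized way" the slide asks for.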
Duplicate Data Before Cleaning: In this example, it appears that there are four entries for the same person. However, this is an assumption that may not be true. It is possible that there are four JohnnyOs living at 3465 S Morgan St. Therefore, it is important to check your data sources and collection methods before deleting anything.
Duplicate Data After Cleaning: After determining that these were duplicate entries, we safely deleted the extraneous ones. Now, there is only one John Veliotis Sr. at 3465 S Morgan St. in our data.
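A two-step approach, first listing candidate duplicates and deleting only after they are confirmed, might look like the sketch below. The rows, the `id` field, and the normalization rules (lowercasing, dropping periods, collapsing whitespace) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows; "Maria Diaz" and the id values are made up.
rows = [
    {"id": 1, "name": "John Veliotis Sr.", "address": "3465 S Morgan St"},
    {"id": 2, "name": "john veliotis sr.", "address": "3465 S Morgan St."},
    {"id": 3, "name": "John  Veliotis Sr.", "address": "3465 s morgan st"},
    {"id": 4, "name": "Maria Diaz", "address": "3465 S Morgan St"},
]


def normalize(text):
    """Lowercase, drop periods, and collapse runs of whitespace."""
    return " ".join(text.lower().replace(".", "").split())


def duplicate_groups(rows):
    """Group row ids that share a normalized (name, address) key."""
    groups = defaultdict(list)
    for row in rows:
        key = (normalize(row["name"]), normalize(row["address"]))
        groups[key].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def drop_confirmed(rows, confirmed_groups):
    """Keep the first row of each confirmed group and drop the rest."""
    drop = {i for ids in confirmed_groups for i in ids[1:]}
    return [r for r in rows if r["id"] not in drop]


candidates = duplicate_groups(rows)          # review these by hand first
deduped = drop_confirmed(rows, candidates)   # run only after confirmation
print(candidates, [r["id"] for r in deduped])
```

Separating detection from deletion leaves room for the manual check the slide recommends: matching text alone does not prove that two rows describe the same person.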
Incorrect Data Before Cleaning: In this example, we have an extreme outlier. In row 6, a female respondent is listed as 6 inches tall. We can set a standard for plausible values and assume that this one is incorrect. We then need to determine how to deal with the outlier. Do we remove it? Do we re-survey the participant?
Incorrect Data After Cleaning: In this case, we had the ability and the resources to verify the value with the respondent. We have corrected the data to list 60 inches.
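Setting a standard for plausible values can be as simple as a range check that flags rows for follow-up rather than deleting them. In the sketch below, the bounds of 36 and 96 inches, the field names, and the other survey rows are assumptions; only row 6 and its corrected value of 60 come from the example.

```python
# Hypothetical survey rows; heights are in inches.
survey = [
    {"row": 5, "sex": "F", "height_in": 64},
    {"row": 6, "sex": "F", "height_in": 6},
    {"row": 7, "sex": "M", "height_in": 70},
]

# Assumed plausibility bounds for adult height; choose limits for your data.
MIN_HEIGHT_IN = 36
MAX_HEIGHT_IN = 96


def flag_implausible(rows, low=MIN_HEIGHT_IN, high=MAX_HEIGHT_IN):
    """Return the row numbers whose height falls outside [low, high]."""
    return [r["row"] for r in rows if not low <= r["height_in"] <= high]


def apply_corrections(rows, corrections):
    """Return new rows with verified heights substituted by row number."""
    return [
        {**r, "height_in": corrections.get(r["row"], r["height_in"])}
        for r in rows
    ]


flagged = flag_implausible(survey)
verified = {6: 60}  # value confirmed with the respondent, as in the example
cleaned = apply_corrections(survey, verified)
print(flagged, [r["height_in"] for r in cleaned])
```

Corrections are applied to a copy, so the raw values remain available if the decision needs to be revisited.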
Inconsistent Data Before Cleaning: All data in this example is inconsistent. Neither the date format nor the currency is standardized. We need to make decisions. What should the date format be? What should the currency be?
Inconsistent Data After Cleaning: We decided to format each date as MM/DD/YY and to use USD as the currency, and we converted all dates and amounts accordingly. Additionally, we sorted the data by the Order Date column from most recent to least recent. (There was also a null value, which we decided to set to $0!)
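The standardization steps above (one date format, one currency, nulls set to $0, newest orders first) could be sketched as follows. The input rows, the list of accepted date formats, and especially the exchange rate are made up for illustration; a real conversion would need rates from a trusted source.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical orders in mixed formats.
orders = [
    {"order_date": "2021-03-05", "amount": "10.00", "currency": "USD"},
    {"order_date": "09.03.2021", "amount": "20.00", "currency": "EUR"},
    {"order_date": "01/15/21", "amount": None, "currency": "USD"},
]

# Assumed input formats, tried in order. Ambiguous strings such as 03/04/21
# cannot be resolved by code alone and need a decision about the source.
DATE_FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%m/%d/%y"]

# Made-up exchange rate, for illustration only.
USD_PER_UNIT = {"USD": Decimal("1"), "EUR": Decimal("1.25")}


def parse_date(text):
    """Parse a date using the first format in DATE_FORMATS that matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def to_usd(amount, currency):
    """Convert to USD rounded to cents; a null amount becomes 0.00."""
    if amount is None:
        return Decimal("0.00")
    return (Decimal(amount) * USD_PER_UNIT[currency]).quantize(Decimal("0.01"))


def clean(order):
    return {
        "date": parse_date(order["order_date"]),
        "usd": to_usd(order["amount"], order["currency"]),
    }


# Sort on the parsed dates, not on the formatted strings.
cleaned_orders = sorted(
    (clean(o) for o in orders), key=lambda o: o["date"], reverse=True
)
report = [(o["date"].strftime("%m/%d/%y"), f"${o['usd']}") for o in cleaned_orders]
print(report)
```

Sorting happens before formatting because MM/DD/YY strings do not sort chronologically across years.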