Data Preprocessing Essentials: Steps, Benefits, and Techniques

An overview of data preprocessing: why it is needed, its four major steps (data cleaning, integration, reduction, and transformation), and the data quality problems it addresses, such as missing values, noise, and inconsistencies.

  • Data preprocessing
  • Data cleaning
  • Data integration
  • Data reduction
  • Data transformation

Presentation Transcript


  1. Data Preprocessing: An Overview
     • Data Cleaning
     • Data Integration
     • Data Reduction
     • Data Transformation
     By Sandeep Patil, Department of Computer Engineering, I²IT

  2. Outline
     • What is Data Preprocessing?
     • Major Steps in Data Preprocessing
     • Data Cleaning
     • Data Integration
     • Data Reduction
     • Data Transformation and Data Discretization
     • Conclusion
     International Institute of Information Technology, I²IT, P-14 Rajiv Gandhi Infotech Park, MIDC Phase 1, Hinjawadi, Pune - 411 057 | Tel +91 20 22933441 | Website - www.isquareit.edu.in | Email - info@isquareit.edu.in

  3. Why Data Preprocessing?
     The need for data preprocessing: parts of the data may suffer from quality problems such as
     • Incompleteness (absence of data)
     • Inaccuracy or noise (values other than those expected)
     • Inconsistency (containing discrepancies)
     • Poor timeliness (an old version of the data)
     • Low believability (how much users trust the correctness of the data)
     • Low interpretability (how easily the data can be understood)

  4. Major Steps in Data Preprocessing
     • Data Cleaning
     • Data Integration
     • Data Reduction
     • Data Transformation

  5. Data Cleaning
     • Filling in missing values
     • Smoothing or removing noisy data
     • Identifying or removing outliers
     • Resolving inconsistencies
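The cleaning tasks on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the presenter's method: missing values are assumed to be marked as `None` and filled with the mean, smoothing is done by equal-frequency bin means, and outliers are flagged with the common 1.5 × IQR rule. The function names are hypothetical.

```python
from statistics import mean, quantiles

def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        smoothed.extend([mean(current_bin)] * len(current_bin))
    return smoothed

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

if __name__ == "__main__":
    prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
    print(fill_missing([10, None, 30]))
    print(smooth_by_bin_means(prices, 3))
    print(iqr_outliers(prices + [200]))
```

Mean imputation and the 1.5 × IQR threshold are simple defaults; other fill strategies (median, most frequent value) or outlier tests may suit a given data set better.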

  6. Data Integration
     • Entity identification problem: matching equivalent entities when integrating multiple databases, data cubes, or files
     • Redundancy and correlation analysis
     • Tuple duplication: updating some but not all occurrences of the data
     • Data value conflict detection and resolution: for the same real-world entity, attribute values from different sources may differ
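Two of the integration issues above lend themselves to a short sketch: correlation analysis to spot a redundant attribute, and detection of conflicting values for the same entity across two sources. The example assumes numeric attributes and a known mapping between attribute names; the field names `cust_id` and `customer_number`, the `rename_b` mapping, and both function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; a value near +1 or -1 suggests one attribute may be redundant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def find_value_conflicts(source_a, source_b, key, rename_b):
    """List (key value, attribute, value in A, value in B) wherever the two sources disagree.

    rename_b maps attribute names used in source B onto the names used in source A,
    which stands in for a resolved entity identification step.
    """
    b_by_key = {}
    for row in source_b:
        renamed = {rename_b.get(name, name): value for name, value in row.items()}
        b_by_key[renamed[key]] = renamed
    conflicts = []
    for row in source_a:
        match = b_by_key.get(row[key])
        if match is None:
            continue  # entity is present only in source A
        for attr, value in row.items():
            if attr in match and match[attr] != value:
                conflicts.append((row[key], attr, value, match[attr]))
    return conflicts

if __name__ == "__main__":
    a = [{"cust_id": 1, "price": 100.0}, {"cust_id": 2, "price": 250.0}]
    b = [{"customer_number": 1, "price": 100.0}, {"customer_number": 2, "price": 275.0}]
    print(find_value_conflicts(a, b, "cust_id", {"customer_number": "cust_id"}))
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

The sketch only reports conflicts; how to resolve them (trusting one source, converting units, averaging) depends on the data and is left open here.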

  7. Data Reduction
     To obtain a reduced representation of the data set that is much smaller in volume.
     • Numerosity reduction
       - Parametric methods, e.g. regression and log-linear models
       - Nonparametric methods, e.g. histograms, clustering, sampling
     • Data compression
       - Lossless
       - Lossy
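As a rough illustration of the reduction techniques listed, the sketch below shows one parametric method (a least-squares line, so that only two parameters need to be stored), two nonparametric methods (an equal-width histogram and simple random sampling without replacement), and lossless compression with Python's `zlib` module. The function names and the fixed seed are assumptions made for the example, and the histogram assumes the values are not all identical.

```python
import random
import zlib

def fit_line(xs, ys):
    """Parametric reduction: summarize the points by the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def equal_width_histogram(values, num_bins):
    """Nonparametric reduction: keep only the count of values in each equal-width bucket."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - lo) / width), num_bins - 1)  # the maximum falls in the last bucket
        counts[index] += 1
    return counts

def sample_without_replacement(values, n, seed=0):
    """Nonparametric reduction: keep a simple random sample of n values."""
    return random.Random(seed).sample(values, n)

if __name__ == "__main__":
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
    print(equal_width_histogram([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
    print(sample_without_replacement(list(range(100)), 5))
    raw = b"preprocessing " * 50
    packed = zlib.compress(raw)  # lossless: decompress() restores the exact bytes
    print(len(raw), len(packed), zlib.decompress(packed) == raw)
```

Lossy compression is not shown; it trades exact reconstruction for a smaller representation, so the round-trip check in the last line would not hold for it.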

  8. Data Transformation and Data Discretization
     Data are transformed or consolidated into forms appropriate for mining.
     • Smoothing
     • Attribute construction (feature construction)
     • Aggregation
     • Normalization
     • Discretization
     • Concept hierarchy generation
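Normalization and discretization from this slide can be sketched as follows. The example assumes two common normalization schemes (min-max scaling and z-scores using the population standard deviation) and discretization by user-supplied cut points; the function names, the age cut points, and the labels are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score normalization: subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def discretize(values, cut_points, labels):
    """Discretization: map each value to the label of the interval it falls in.

    cut_points must be sorted and labels must have len(cut_points) + 1 entries;
    a value equal to a cut point goes into the interval above it.
    """
    return [labels[bisect_right(cut_points, v)] for v in values]

if __name__ == "__main__":
    print(min_max([10, 20, 30]))
    print(z_score([2, 4, 6]))
    print(discretize([15, 35, 70], [18, 65], ["youth", "adult", "senior"]))
```

Mapping ages to labels such as "youth", "adult", and "senior" is also a small example of one level of a concept hierarchy, where raw values are replaced by higher-level concepts.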

  9. Conclusion
     Although numerous methods of data preprocessing have been developed, it remains an active area of research because of the huge amount of inconsistent or dirty data and the complexity of the problem.

  10. Thank You
      For further information please contact Prof. Sandeep Patil, Department of Computer Engineering, Hope Foundation's International Institute of Information Technology, I²IT, Hinjawadi, Pune 411 057 | Phone +91 20 22933441 | www.isquareit.edu.in | sandeepp@isquareit.edu.in
