Understanding Data Flow in Machine Learning Systems
Explore the intricate data flow within machine learning systems through the stages of data design, model building, data cleaning, and evaluation. Learn about the importance of data types, training data, and data normalization in creating effective machine learning models.
Data Flow in an ML System
d.ML, Winter 2018-19
Overview diagram: Data design, Model design, and Output design, spanning the stages Data, Data Cleaning, Model Building, Model Evaluating, Model Output, and Model Interaction.
Data
All machine learning models need data.
Where does your data come from?
What is the type() of each of the variables (columns)?

Animal  Legs  Furry  Sound
Cat     4     Yes    Meow
Dog     4     Yes    Woof
Pig     4     No     Oink
Lizard  4     No     N/A
...     ...   ...    ...

Key vocab:
Training data - the data you use to train your ML model
Data type - the type/format of your data (e.g., string or integer)
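A minimal sketch of the table above as plain Python rows, using the built-in type() to inspect each column. Storing each row as a dictionary and writing the lizard's missing Sound as None are assumptions made for illustration; the course's own workbook is not reproduced here.

```python
# The toy animal table from the slide, one dict per row.
# ASSUMPTION: the missing Sound value (N/A) is represented as None.
animals = [
    {"Animal": "Cat", "Legs": 4, "Furry": "Yes", "Sound": "Meow"},
    {"Animal": "Dog", "Legs": 4, "Furry": "Yes", "Sound": "Woof"},
    {"Animal": "Pig", "Legs": 4, "Furry": "No", "Sound": "Oink"},
    {"Animal": "Lizard", "Legs": 4, "Furry": "No", "Sound": None},
]


def column_types(rows):
    """Return the set of Python type names seen in each column."""
    types = {}
    for row in rows:
        for column, value in row.items():
            types.setdefault(column, set()).add(type(value).__name__)
    return types


for column, names in column_types(animals).items():
    print(column, sorted(names))
```

A column that mixes types, like Sound here (str and NoneType), is an early hint that it needs cleaning before any model sees it.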
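Data Cleaning
Format the data in a way that the computer can read it.
You might choose to exclude missing values.
Explore your data: look for trends that might inform you.
Remember: how was your data collected? How is it going to be used?

Before cleaning:
Animal  Legs  Furry  Sound
Cat     4     Yes    Meow
Dog     4     Yes    Woof
Pig     4     No     Oink
Lizard  4     No     N/A
...     ...   ...    ...

After cleaning:
Animal  Legs  Furry
Cat     4     1
Dog     4     1
Pig     4     0
Lizard  4     0
...     ...   ...

Key vocab:
Normalizing - putting values into a consistent, machine-readable form (converting Furry from Yes/No to 1/0, as here, is more precisely called encoding; normalizing usually refers to rescaling numeric values)
Remove NA - handling missing values (here the Sound column, which contains N/A, is dropped)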
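One way to reproduce the cleaning step on the slide, sketched in plain Python: Furry is encoded as Yes to 1 and No to 0, and any column containing a missing value is dropped. Dropping the Sound column rather than the Lizard row is read off the slide's second table; excluding incomplete rows instead is an equally valid choice. The helper names are hypothetical.

```python
# Raw rows from the slide; None marks the missing Sound value (an assumption).
raw = [
    {"Animal": "Cat", "Legs": 4, "Furry": "Yes", "Sound": "Meow"},
    {"Animal": "Dog", "Legs": 4, "Furry": "Yes", "Sound": "Woof"},
    {"Animal": "Pig", "Legs": 4, "Furry": "No", "Sound": "Oink"},
    {"Animal": "Lizard", "Legs": 4, "Furry": "No", "Sound": None},
]

# Encoding for the Yes/No column.
YES_NO = {"Yes": 1, "No": 0}


def columns_with_missing(rows):
    """Names of columns that contain at least one None."""
    return {col for row in rows for col, value in row.items() if value is None}


def clean(rows):
    """Drop columns with missing values and encode Furry as 1/0."""
    drop = columns_with_missing(rows)
    cleaned = []
    for row in rows:
        new_row = {col: value for col, value in row.items() if col not in drop}
        new_row["Furry"] = YES_NO[new_row["Furry"]]
        cleaned.append(new_row)
    return cleaned


for row in clean(raw):
    print(row)
```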
Model Building
Ask yourself: what type of problem are you trying to solve?
Data + Algorithm = Model
Algorithms: clustering, regression, decision trees, etc.

Key vocab: Supervised vs. unsupervised learning
Supervised - you know what the output should be: the training data is labeled, as when categorizing examples
Unsupervised - the algorithm finds patterns in unlabeled data for you
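A minimal sketch of "Data + Algorithm = Model" on the cleaned table, with Legs and Furry as features. The two algorithms are hand-rolled stand-ins chosen for illustration, not what the course notebook uses: a 1-nearest-neighbour classifier for the supervised case (the Animal column is the label) and a tiny k-means for the unsupervised case (labels are ignored).

```python
import math

# Cleaned training data from the slide: (Legs, Furry) features plus the Animal label.
training = [
    ((4, 1), "Cat"),
    ((4, 1), "Dog"),
    ((4, 0), "Pig"),
    ((4, 0), "Lizard"),
]


def nearest_neighbour(labelled):
    """Supervised algorithm: learns from (features, label) pairs, returns a model."""
    def model(features):
        # Predict the label of the closest training example (ties: first seen wins).
        _, label = min(labelled, key=lambda pair: math.dist(pair[0], features))
        return label
    return model


def k_means(points, k, iterations=10):
    """Unsupervised algorithm: assign each unlabelled point to one of k clusters."""
    # Start from the first k distinct points (a simple, deterministic choice).
    centroids = list(dict.fromkeys(points))[:k]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                updated.append(old)
        centroids = updated
    return assignment


# Data + Algorithm = Model
classifier = nearest_neighbour(training)
print(classifier((4, 0)))  # a four-legged, non-furry animal

groups = k_means([features for features, _ in training], k=2)
print(groups)  # cluster index for Cat, Dog, Pig, Lizard
```

The classifier needs the Animal labels; k-means never looks at them and still separates the furry pair from the non-furry pair.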
What can machine learning do?
Classify new data (dog or cat?)
Find trends and patterns (regression, clustering)

What can't ML do?
Clean your data for you!
Identify patterns that ARE NOT in the data
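The "dog or cat?" question runs straight into the limit named above. After cleaning, Cat and Dog have identical features (4 legs, furry), so the information needed to separate them is not in the data. A small sketch that flags such collisions, assuming the cleaned rows are stored as (label, features) pairs; the function name is hypothetical.

```python
from collections import defaultdict

# Cleaned rows from the slide: Animal label with (Legs, Furry) features.
rows = [
    ("Cat", (4, 1)),
    ("Dog", (4, 1)),
    ("Pig", (4, 0)),
    ("Lizard", (4, 0)),
]


def indistinguishable(rows):
    """Group labels that share exactly the same feature vector.

    No model that sees only these features can reliably tell
    the labels within one group apart.
    """
    by_features = defaultdict(list)
    for label, features in rows:
        by_features[features].append(label)
    return {f: labels for f, labels in by_features.items() if len(labels) > 1}


print(indistinguishable(rows))
# {(4, 1): ['Cat', 'Dog'], (4, 0): ['Pig', 'Lizard']}
```

Adding a feature that actually differs between the animals (the dropped Sound column, for example) is the kind of fix this check points to; a better algorithm alone cannot help.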
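Model Evaluating
How well can your model predict unseen data?

Training data:
Animal  Legs  Furry
Cat     4     1
Dog     4     1
Pig     4     0
Lizard  4     0
...     ...   ...

Test data:
Animal  Legs  Furry
Cat     4     1
Pig     4     0
Parrot  2     0
...     ...   ...

Key vocab:
Test data - data held back from training and used to measure performance on unseen examples
Precision - of the examples the model labels positive, the fraction that are actually positive
Recall - of the actually positive examples, the fraction the model labels positive
Confidence interval - a range around an estimate (such as accuracy) that expresses its uncertainty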
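A minimal sketch of precision and recall on the three test rows, treating Furry = 1 as the positive class. The true values come from the test table; the predictions are HYPOTHETICAL, standing in for whatever model you trained.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention used here: report 0.0 when a ratio is undefined (0 / 0).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# True Furry values for the test rows Cat, Pig, Parrot (from the slide)...
y_true = [1, 0, 0]
# ...and HYPOTHETICAL predictions: the model wrongly calls the pig furry.
y_pred = [1, 1, 0]

print(precision_recall(y_true, y_pred))  # (0.5, 1.0)
```

The model found every furry animal (recall 1.0), but half of its "furry" calls were wrong (precision 0.5), which a single accuracy number would blur together.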
Model Output
What will the output of your model look like?
Key vocab: Confidence interval, Bayesian
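One way a model's output can carry its uncertainty is an interval around an estimate rather than a bare number. A minimal sketch, assuming the quantity reported is accuracy on a test set: the Wilson score interval is one standard choice among several (a Bayesian credible interval is another), applied here to a HYPOTHETICAL 8 correct out of 10.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one test example")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# HYPOTHETICAL result: the model got 8 of 10 test examples right.
low, high = wilson_interval(8, 10)
print(f"accuracy 0.80, 95% interval ({low:.2f}, {high:.2f})")
```

With only ten test rows the interval is wide, roughly 0.49 to 0.94, which is exactly the uncertainty a bare "accuracy 0.80" would hide.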
Summary: Data → Data Cleaning → Model Building → Model Evaluating
Data: the raw animal table (Animal, Legs, Furry, Sound)
Data Cleaning: the cleaned table (Animal, Legs, Furry, with Furry as 1/0)
Model Building: select an algorithm based on the problem you're trying to solve (supervised vs. unsupervised)
Model Evaluating: does this predict well on the test data (Cat, Pig, Parrot)?
Data - try for yourself!
Open Python (premade workbook: just run the code).
Data Cleaning - try for yourself!
Use the premade Python notebook to turn the raw animal table into the cleaned one.