Model Bias and Optimization in Machine Learning

General Guidance
Hung-yi Lee 
李宏毅
Framework of ML
Training data: {(x¹, ŷ¹), (x², ŷ²), …, (xᴺ, ŷᴺ)}
Testing data: {xᴺ⁺¹, xᴺ⁺², …, xᴺ⁺ᴹ}
The same framework covers many tasks, for example:
- Speech Recognition (x: an audio clip, y: the phonemes / text)
- Image Recognition (x: an image, y: its label)
- Speaker Recognition (x: an audio clip, y: the speaker, e.g. "John")
- Machine Translation (x: a sentence, e.g. the Japanese "痛みを知れ", y: its translation, e.g. the Chinese "了解痛苦吧", "know pain")
Framework of ML
Training:
Step 1: write down a function with unknown parameters, y = f_θ(x)
Step 2: define the loss L(θ) from the training data
Step 3: optimization, θ* = arg min_θ L(θ)
Testing: use y = f_θ*(x) to label the testing data {xᴺ⁺¹, …, xᴺ⁺ᴹ}, then upload the predictions to Kaggle
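These three steps map directly onto a few lines of code. Below is a minimal sketch, assuming PyTorch; the linear model, the 56-dimensional inputs, and the random data are placeholders for illustration, not the course's actual setup.

```python
import torch
import torch.nn as nn

# Step 1: a function with unknown parameters theta
model = nn.Linear(in_features=56, out_features=1)

# Step 2: define the loss from the training data
criterion = nn.MSELoss()

# Step 3: optimization, theta* = argmin_theta L(theta)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x_train, y_train = torch.randn(100, 56), torch.randn(100, 1)   # placeholder training data
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)   # L(theta) on the training data
    loss.backward()
    optimizer.step()

# Testing: label the testing data with f_{theta*} and upload the predictions
x_test = torch.randn(20, 56)                    # placeholder testing data
with torch.no_grad():
    y_pred = model(x_test)
```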
General Guide
Check the loss on the training data first.
- If the training loss is large, the cause is either model bias (make your model more complex) or an optimization issue (next lecture).
- If the training loss is small, check the loss on the testing data.
  - If the testing loss is also small, you are done.
  - If the testing loss is large, the cause is either overfitting (remedies: more training data (not in HWs), data augmentation, make your model simpler) or mismatch (not in HWs, except HW 11).
There is a trade-off between model bias and model complexity: split your training data into a training set and a validation set for model selection.
Model Bias
The model is too simple: no function in the candidate set achieves a small loss, because the set of candidate functions is too small. It is like trying to find a needle in a haystack, but there is no needle.
Solution: redesign your model to make it more flexible, e.g.
- more features: y = b + Σⱼ wⱼ xⱼ (say, 56 features) instead of y = b + w x₁
- deep learning (more neurons, more layers): y = b + Σᵢ cᵢ sigmoid(bᵢ + Σⱼ wᵢⱼ xⱼ)
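As a concrete illustration of "make your model more flexible", here is a minimal sketch (assuming PyTorch; the widths and depth are arbitrary) that replaces a single-feature linear model with a wider input and a small deep network:

```python
import torch.nn as nn

# Too simple: a linear model with a single input feature, y = b + w * x1
simple_model = nn.Linear(1, 1)

# More flexible: more input features and a deeper network (more neurons, layers)
flexible_model = nn.Sequential(
    nn.Linear(56, 64),   # more features as input
    nn.Sigmoid(),        # the slides write the model with sigmoid units
    nn.Linear(64, 64),
    nn.Sigmoid(),
    nn.Linear(64, 1),
)
```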
Optimization Issue
A large loss does not always imply model bias. There is another possibility: a function with small loss does exist in the candidate set, but the optimization (gradient descent) fails to find it. A needle is in the haystack, we just cannot find it.
Model bias or optimization issue, which one is it?
- Model bias: the candidate set is too small; find a needle in a haystack, but there is no needle.
- Optimization issue: the needle is in the haystack; we just cannot find it.
Model Bias vs. Optimization Issue
Gaining insight from comparison. In the figure from the reference, a deeper network has larger error than a shallower one on the testing data, which might look like overfitting; but it also has larger error on the training data, so the cause is an optimization issue.
Ref: http://arxiv.org/abs/1512.03385
Optimization Issue
Gaining insight from comparison:
- Start from shallower networks (or other models), which are easier to optimize.
- If deeper networks do not obtain smaller loss on the training data, then there is an optimization issue.
Example (loss on the 2017-2020 training data): 1 layer: 0.28k, 2 layers: 0.18k, 3 layers: 0.14k, 4 layers: 0.10k, 5 layers: 0.34k. The 5-layer network contains the 4-layer one as a special case, yet its training loss is larger, so the problem is optimization rather than model bias. (A code sketch of this diagnostic follows below.)
Solution: more powerful optimization technology (next lecture).
Ref: http://arxiv.org/abs/1512.03385
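A minimal sketch of this diagnostic, assuming PyTorch; the data, widths, and training schedule are placeholders rather than the course's setup. Train models of increasing depth on the same training data and compare their final training losses; if a deeper model ends with a larger training loss than a shallower one, suspect an optimization issue rather than model bias.

```python
import torch
import torch.nn as nn

def make_mlp(depth, width=64, in_dim=56):
    """Build an MLP with `depth` hidden layers."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

def final_training_loss(model, x, y, epochs=200, lr=1e-3):
    """Train on (x, y) and return the training loss after the last epoch."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

x, y = torch.randn(500, 56), torch.randn(500, 1)   # placeholder training data
for depth in [1, 2, 3, 4, 5]:
    print(depth, "hidden layers -> training loss:",
          final_training_loss(make_mlp(depth), x, y))
# If a deeper network's training loss is larger than a shallower one's,
# the deeper model is not limited by bias; the optimizer is failing to exploit it.
```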
Overfitting
Small loss on the training data, large loss on the testing data. Why?
An extreme example. Training data: {(x¹, ŷ¹), (x², ŷ²), …, (xᴺ, ŷᴺ)}. Define
f(x) = ŷⁱ if x equals some training input xⁱ, and a random value otherwise.
This function obtains zero training loss, but large testing loss. It is less than useless.
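To make the extreme example concrete, here is a minimal sketch in plain Python/NumPy (the synthetic data and the underlying x² relation are purely illustrative) of a "model" that memorizes the training pairs and answers randomly everywhere else:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: y roughly equals x^2 (purely illustrative)
x_train = rng.uniform(-1, 1, size=20)
y_train = x_train ** 2 + 0.05 * rng.normal(size=20)

memory = {float(x): float(y) for x, y in zip(x_train, y_train)}

def f(x):
    """Return the memorized label if x was seen in training, otherwise a random guess."""
    return memory.get(float(x), float(rng.uniform(-1, 1)))

train_loss = np.mean([(f(x) - y) ** 2 for x, y in zip(x_train, y_train)])
x_test = rng.uniform(-1, 1, size=20)
test_loss = np.mean([(f(x) - x ** 2) ** 2 for x in x_test])
print("training loss:", train_loss)   # exactly 0: every training point is memorized
print("testing loss:", test_loss)     # large: the answers on unseen x are random
```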
Overfitting
A very flexible model has "freestyle" freedom in the regions where there is no training data: it can pass through every training point exactly while behaving arbitrarily in between. The real data distribution is not observable; we only see the training and testing samples, and on the testing data such a model can produce a large loss.
Overfitting
Two ways to tame a flexible model (a sketch of data augmentation follows this list):
- More training data (cannot do it in HWs).
- Data augmentation: create extra training examples from the ones you have, e.g. by label-preserving transformations (you can do that in HWs).
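A minimal sketch of data augmentation for images, assuming torchvision; the particular transforms and their parameters are common choices, not the course's prescribed recipe:

```python
from torchvision import transforms

# Each training image is randomly perturbed every time it is loaded,
# effectively enlarging the training set with label-preserving variants.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # mirror the image half the time
    transforms.RandomRotation(degrees=15),    # small random rotation
    transforms.ColorJitter(brightness=0.2),   # mild brightness change
    transforms.ToTensor(),
])

# Typically passed to a dataset, e.g.:
# train_set = torchvision.datasets.CIFAR10(root="data", train=True, transform=augment)
```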
Overfitting
Another remedy: constrain the model, e.g. restrict it to a quadratic function y = b + w·x + c·x². A constrained model has much less freedom between the training points, so even with few training examples the learned function stays close to the real data distribution (which is not observable; we only see the training and testing data).
Overfitting
Ways to constrain a model (a code sketch of several of them follows below):
- fewer parameters, or sharing parameters (e.g. a CNN is a constrained version of a fully-connected network)
- fewer features
- early stopping
- regularization
- dropout
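A minimal sketch of three of these constraints in PyTorch (dropout, L2 regularization via weight decay, and early stopping); the model, data, and hyperparameters are placeholders, not the course's settings:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(56, 64), nn.ReLU(),
    nn.Dropout(p=0.3),              # dropout: randomly zero activations during training
    nn.Linear(64, 1),
)
# weight_decay adds an L2 penalty on the parameters (regularization)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

x_tr, y_tr = torch.randn(400, 56), torch.randn(400, 1)   # placeholder training data
x_va, y_va = torch.randn(100, 56), torch.randn(100, 1)   # placeholder validation data

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_va), y_va).item()
    # early stopping: stop once the validation loss has not improved for `patience` epochs
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```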
Overfitting
But do not constrain the model too much, e.g. forcing a straight line y = b + w·x when the data is not linear: an over-constrained model cannot fit even the training data, and we are back to model bias.
Bias-Complexity Trade-off
As the model becomes more complex (e.g. more features, more parameters), the training loss keeps decreasing, while the testing loss first decreases and then rises again. Select the model whose testing loss is the smallest. (A sketch of such a complexity sweep follows below.)
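A minimal sketch of this trade-off using polynomial regression on synthetic data (NumPy only; the data, degrees, and split are illustrative): error on the training split falls monotonically with the polynomial degree, while error on held-out data eventually rises, and we pick the degree with the smallest held-out error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.2 * rng.normal(size=60)      # synthetic relation plus noise
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

results = []
for degree in range(1, 12):                        # model complexity = polynomial degree
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)    # fit on the training split
    tr_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    va_err = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    results.append((degree, tr_err, va_err))
    print(f"degree {degree:2d}  train mse {tr_err:.3f}  held-out mse {va_err:.3f}")

best_degree = min(results, key=lambda r: r[2])[0]  # smallest held-out error
print("selected degree:", best_degree)
```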
Homework
You train on the Training Set, see your score on the public Testing Set, and are finally judged on the private Testing Set.
Suppose three models reach mse = 0.9, 0.7, and 0.5 on the public testing set. You pick Model 3 (mse = 0.5), but on the private testing set its mse can be > 0.5; it may be poor.
The extreme example again: among an enormous number of "models" that answer essentially at random, it is possible that one of them happens to get a good score on the public testing set. If you select by the public score, you may pick exactly that one, and its performance on the private testing set is essentially random. What will happen? (See the small simulation sketched below.)
http://www.chioka.in/how-to-select-your-final-models-in-a-kaggle-competitio/
This also explains why machines usually beat humans on benchmark corpora: the models we hear about were selected by their scores on those very benchmarks, so the reported numbers are optimistic.
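A small simulation of this effect (NumPy; the sizes and the binary task are made up for illustration): generate many purely random classifiers, pick the one with the best accuracy on a small "public" split, and then check it on a "private" split. The selected model looks clearly better than chance on the public split but drops back to roughly 50% on the private one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_public, n_private, n_models = 100, 100, 10_000

# Ground-truth binary labels for the two test splits
y_public = rng.integers(0, 2, n_public)
y_private = rng.integers(0, 2, n_private)

# Each "model" just guesses randomly on every example
guesses_public = rng.integers(0, 2, (n_models, n_public))
guesses_private = rng.integers(0, 2, (n_models, n_private))

public_acc = (guesses_public == y_public).mean(axis=1)
best = int(np.argmax(public_acc))                  # model selection on the public split

print("best public accuracy :", public_acc[best])                              # well above 0.5
print("its private accuracy :", (guesses_private[best] == y_private).mean())   # close to 0.5
```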
Cross Validation
Using the results of the public testing data to select your model is not recommended: you are only making the public set look better than the private set.
Instead, split your training set into a training set and a validation set. Train the candidate models on the training set and compare them by their validation mse, e.g. Model 1: 0.9, Model 2: 0.7, Model 3: 0.5, so pick Model 3; its mse on the public and private testing sets may still be > 0.5, but the selection itself never touched them.
How to split? One answer is N-fold cross validation, next.
N-fold Cross Validation
Split the training set into N folds (here N = 3). For each candidate model, train on N - 1 folds and validate on the remaining fold, rotating which fold is held out, then average the validation mse over the N splits:
Model 1: mse 0.4, 0.5, 0.3 on the three splits, average 0.4
Model 2: mse 0.4, 0.5, 0.6 on the three splits, average 0.5
Model 3: mse 0.2, 0.4, 0.3 on the three splits, average 0.3
Model 3 has the smallest average mse, so select it and then evaluate it on the testing sets (public and private). A code sketch follows below.
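A minimal sketch of N-fold cross validation; scikit-learn's KFold is assumed for the splitting, and the candidate models and data are placeholders rather than anything from the course:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                       # placeholder training inputs
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=300)

candidates = {                                      # hypothetical candidate models
    "Model 1": LinearRegression(),
    "Model 2": Ridge(alpha=1.0),
    "Model 3": Ridge(alpha=10.0),
}

kfold = KFold(n_splits=3, shuffle=True, random_state=0)
for name, model in candidates.items():
    fold_mse = []
    for train_idx, val_idx in kfold.split(X):
        model.fit(X[train_idx], y[train_idx])       # train on N-1 folds
        pred = model.predict(X[val_idx])            # validate on the held-out fold
        fold_mse.append(np.mean((pred - y[val_idx]) ** 2))
    print(name, "average validation mse:", round(float(np.mean(fold_mse)), 4))
# Select the candidate with the smallest average mse, then evaluate it on the testing set.
```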
Let's predict the number of views on 2/26!
(Plot: red = real views, blue = predicted.) On the 2021 data, the 1- to 4-layer models reach errors of about 0.43k, 0.39k, 0.38k, and 0.44k, but on 2/26 the prediction error jumps to e = 2.58k.
Mismatch
Your training and testing data have different distributions. Simply increasing the training data will not help. Most HWs do not have this problem, except HW11. Be aware of how your data is generated.
Slide Note

TBD:

Cite the code, etc.

Cannot use extra data.

=====

How are the results on 2/26?

======

Outline:

1. Model bias

Diagnosis:

Solution:

2. Optimization error

Diagnosis:

Solution:

3. Variance?

Diagnosis: model big, data little

Solution: collect data, augmentation

Model selection!

Constrain your model

====

Variance and bias trade-off: select your model carefully

====

4. Mismatch

Diagnosis: (it does not work even with more data)

Solution:

====================

-- Mention the issue of variance

-- Bias and variance trade-off: lots of different things may influence them

-- You can't have your cake and eat it (too)?

