Introduction to Machine Learning: Model Selection and Error Decomposition
This lecture covers model selection, error decomposition, the bias-variance tradeoff, and classification with Naive Bayes. Students are required to implement linear regression, Naive Bayes, and logistic regression for homework. Administrative information about deadlines, the mid-term exam, and catch-up lectures is provided. The lecture also recaps regression: linear regression models, the least-squares objective, and its connection to maximum likelihood. The focus is on understanding overfitting, underfitting, the bias-variance trade-off, and the importance of model selection in machine learning.
Presentation Transcript
ECE 5984: Introduction to Machine Learning. Topics: (finish) model selection, error decomposition, bias-variance tradeoff; classification: Naïve Bayes. Readings: Barber 17.1, 17.2, 10.1-10.3. Dhruv Batra, Virginia Tech.
Administrivia: HW2 due Friday 03/06, 11:55pm. Implement linear regression, Naïve Bayes, and logistic regression. Need a couple of catch-up lectures; how about 4-6pm?
Administrivia: Mid-term. When: March 18, class timing. Where: in class. Format: pen-and-paper; open-book, open-notes, closed-internet; no sharing. What to expect: a mix of multiple-choice/true-false questions, "prove this statement", and "what would happen for this dataset?". Material: everything from the beginning of class up to and including SVMs.
Recap of last time
Regression
[Figure slides omitted. Slide credit: Greg Shakhnarovich]
What you need to know: the linear regression model; the least-squares objective; connections to maximum likelihood with a Gaussian conditional; robust regression with a Laplacian likelihood; ridge regression with priors; polynomial and general additive regression.
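The least-squares objective and its ridge-regularized variant can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not course-provided code: the synthetic data, variable names, and the choice of lam are my assumptions.

```python
import numpy as np

# Minimal least-squares sketch (illustrative, not course code).
# Model: y = X w + noise; minimizing ||y - Xw||^2 corresponds to
# maximum likelihood under Gaussian noise on y.
rng = np.random.default_rng(0)
n, d = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])  # bias column + features
w_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Normal equations w_hat = (X^T X)^{-1} X^T y, solved via lstsq for stability
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("least-squares weights:", np.round(w_hat, 3))

# Ridge regression (Gaussian prior on w) adds lam * I to X^T X
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)
print("ridge weights:        ", np.round(w_ridge, 3))
```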
Plan for today: (finish) model selection; overfitting vs. underfitting; the bias-variance trade-off, aka the modeling-error vs. estimation-error tradeoff; Naïve Bayes.
New Topic: Model Selection and Error Decomposition
Example for Regression. Demo: http://www.princeton.edu/~rkatzwer/PolynomialRegression/ How do we pick the hypothesis class?
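The linked demo fits polynomials of different degrees; a rough Python sketch of the same experiment shows how the hypothesis class is typically picked with a held-out validation set. The data-generating function, noise level, split sizes, and degree list are assumptions made for illustration.

```python
import numpy as np

# Sketch: pick a polynomial degree (hypothesis class) using a validation set.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + 0.2 * rng.normal(size=x.size)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

for degree in [1, 3, 5, 9]:
    c = np.polyfit(x_tr, y_tr, degree)          # least-squares polynomial fit
    print(f"degree {degree}  train MSE {mse(c, x_tr, y_tr):.4f}  "
          f"val MSE {mse(c, x_va, y_va):.4f}")
# Typical pattern: train MSE keeps falling with degree, while validation MSE
# falls and then rises once the model starts to overfit.
```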
Model Selection: how do we pick the right model class? Similar questions: how do I pick magic hyper-parameters? How do I do feature selection?
Errors: expected loss/error, training loss/error, validation loss/error, test loss/error. Reporting training error (instead of test error) is CHEATING. Optimizing parameters on test error is CHEATING.
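A k-fold cross-validation sketch makes the "never tune on test error" point concrete: hyper-parameters are chosen using training/validation folds only, and the test set is touched exactly once at the end. The fold count, toy ridge model, and lambda grid below are assumptions for illustration.

```python
import numpy as np

# Sketch of k-fold cross-validation for choosing a ridge penalty.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=120)
X_trval, y_trval, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, k=5):
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for val_idx in folds:
        tr_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        w = ridge_fit(X[tr_idx], y[tr_idx], lam)
        errs.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return np.mean(errs)

lams = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: cv_error(X_trval, y_trval, lam))
w = ridge_fit(X_trval, y_trval, best_lam)
print("chosen lambda:", best_lam)
print("test MSE (reported once, never optimized):",
      np.mean((X_test @ w - y_test) ** 2))
```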
[Figure slides omitted. Slide credit: Greg Shakhnarovich]
Typical Behavior [figure omitted]
Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w' such that error_train(w) < error_train(w') but error_true(w') < error_true(w). (Slide credit: Carlos Guestrin)
Error Decomposition: Reality [figure slides omitted; one slide also references higher-order potentials]
Error Decomposition. Approximation/modeling error: you approximated reality with a model. Estimation error: you tried to learn the model with finite data. Optimization error: you were lazy and couldn't/didn't optimize to completion. (Next time) Bayes error: reality just sucks.
Bias-Variance Tradeoff. Bias: the difference between what you expect to learn and the truth; measures how well you expect to represent the true solution; decreases with a more complex model. Variance: the difference between what you expect to learn and what you learn from a particular dataset; measures how sensitive the learner is to the specific dataset; increases with a more complex model. (Slide credit: Carlos Guestrin)
Bias-Variance Tradeoff: Matlab demo.
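The slide points to a Matlab demo; a rough Python analogue of the same idea estimates bias² and variance empirically by refitting one model class on many resampled training sets. The "true" function, noise level, dataset counts, and degrees are all invented for illustration.

```python
import numpy as np

# Sketch: Monte Carlo estimate of bias^2 and variance at a grid of test points,
# for polynomial models of different complexity.
rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x)                 # assumed "truth"
x_test = np.linspace(-1, 1, 50)

def experiment(degree, n_datasets=200, n_points=30, noise=0.3):
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        y = f(x) + noise * rng.normal(size=n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f(x_test)) ** 2)   # (avg prediction - truth)^2
    variance = np.mean(preds.var(axis=0))             # spread across datasets
    return bias_sq, variance

for degree in [1, 3, 9]:
    b, v = experiment(degree)
    print(f"degree {degree}:  bias^2 ~ {b:.4f}   variance ~ {v:.4f}")
# Typically: low-degree models show high bias / low variance,
# high-degree models show low bias / high variance.
```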
Bias-Variance Tradeoff. The choice of hypothesis class introduces learning bias: a more complex class gives less bias, but a more complex class also gives more variance. (Slide credit: Carlos Guestrin)
[Figure slide omitted. Slide credit: Greg Shakhnarovich]
Learning Curves: error vs. size of the dataset (on board); high-bias curves, high-variance curves.
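The curves drawn on the board can also be generated numerically. The sketch below is a guess at such an experiment: a fixed (assumed) model class, with training-set size swept and train/validation error printed at each size.

```python
import numpy as np

# Sketch: learning curves (train/validation error vs. training-set size).
# Diagnostic reading: high bias -> both errors plateau high and close together;
# high variance -> large gap between train and validation error.
rng = np.random.default_rng(4)
f = lambda x: np.sin(3 * x)
x_val = rng.uniform(-1, 1, 200)
y_val = f(x_val) + 0.2 * rng.normal(size=200)

degree = 9  # assumed (fairly flexible) model class
for n in [10, 20, 40, 80, 160]:
    x_tr = rng.uniform(-1, 1, n)
    y_tr = f(x_tr) + 0.2 * rng.normal(size=n)
    c = np.polyfit(x_tr, y_tr, degree)
    tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(c, x_val) - y_val) ** 2)
    print(f"n={n:4d}  train MSE {tr:.4f}  val MSE {va:.4f}")
```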
Debugging Machine Learning: my algorithm doesn't work (high test error); what should I do? Options: more training data; a smaller set of features; a larger set of features; lower regularization; higher regularization.
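Two of those options, lower vs. higher regularization, can be explored with a small sweep. This sketch is my own illustration (the data, feature count, and lambda grid are assumptions): too little regularization shows the overfitting signature (low train error, high validation error), too much shows underfitting (both errors high).

```python
import numpy as np

# Sketch: effect of regularization strength on train vs. validation error.
rng = np.random.default_rng(5)
n, d = 40, 30                               # few examples, many features
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]               # only 3 genuinely useful features
y = X @ w_true + 0.5 * rng.normal(size=n)
X_val = rng.normal(size=(200, d))
y_val = X_val @ w_true + 0.5 * rng.normal(size=200)

for lam in [1e-4, 1e-2, 1.0, 100.0]:
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # ridge solution
    tr = np.mean((X @ w - y) ** 2)
    va = np.mean((X_val @ w - y_val) ** 2)
    print(f"lambda={lam:8.4f}  train MSE {tr:.3f}  val MSE {va:.3f}")
```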
What you need to know: generalization error decomposition (approximation, estimation, optimization, and Bayes error; for squared losses, the bias-variance tradeoff); errors (the differences between train, test, and expected error); cross-validation (and cross-validation error); NEVER EVER learn on test data; overfitting vs. underfitting.
New Topic: Naïve Bayes (your first probabilistic classifier). Classification: map an input x to an output y, where y is discrete.
Classification. Learn h: X -> Y, where X are the features and Y the target classes. Suppose you know P(Y|X) exactly; how should you classify? Bayes classifier: h_Bayes(x) = argmax_y P(Y = y | X = x). Why? (Slide credit: Carlos Guestrin)
Optimal classification. Theorem: the Bayes classifier h_Bayes is optimal! That is, error_true(h_Bayes) <= error_true(h) for every classifier h. Proof: [derivation not captured in the transcript]. (Slide credit: Carlos Guestrin)
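A tiny numerical check of the theorem can be run on a fully known joint distribution: predicting argmax_y P(Y=y | X=x) at every x leaves the least probability mass on errors, so any competitor does at least as badly. The joint-probability table and the competitor below are made up for illustration.

```python
import numpy as np

# Sketch: Bayes classifier on a known toy joint distribution P(X, Y).
# X takes 3 values (rows), Y takes 2 values (columns); the table is invented.
P_xy = np.array([[0.20, 0.05],
                 [0.10, 0.25],
                 [0.15, 0.25]])

P_x = P_xy.sum(axis=1)
P_y_given_x = P_xy / P_x[:, None]

bayes_pred = P_y_given_x.argmax(axis=1)          # h_Bayes(x) = argmax_y P(y | x)
bayes_error = 1 - sum(P_xy[x, bayes_pred[x]] for x in range(3))

other_pred = np.array([1, 0, 0])                 # an arbitrary competing classifier
other_error = 1 - sum(P_xy[x, other_pred[x]] for x in range(3))

print("Bayes classifier error:", round(bayes_error, 3))   # 0.30
print("Competitor error:      ", round(other_error, 3))   # 0.70
```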
Generative vs. Discriminative. Generative approach: estimate p(x|y) and p(y), then use Bayes rule to predict y. Discriminative approach: estimate p(y|x) directly, OR learn a discriminant function h(x).
Generative vs. Discriminative. Generative approach: assume some functional form for P(X|Y) and P(Y); estimate P(X|Y) and P(Y); use Bayes rule to calculate P(Y | X = x). This is an indirect computation of P(Y|X) through Bayes rule, but you can generate a sample of the data, since P(X) = sum_y P(y) P(X|y). Discriminative approach: estimate p(y|x) directly, OR learn a discriminant function h(x). This is direct, but you cannot obtain a sample of the data, because P(X) is not available.
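The "can generate a sample" point can be shown directly: a generative model gives P(Y) and P(X|Y), so you draw y ~ P(Y) and then x ~ P(X | Y = y); a discriminative model only gives P(Y|X) and offers no such recipe. The prior and likelihood tables below are invented, and P(X|Y) is written with conditionally independent features purely for simplicity (the Naïve Bayes form introduced next).

```python
import numpy as np

# Sketch: ancestral sampling from a generative model with 3 binary features.
rng = np.random.default_rng(6)
p_y = np.array([0.6, 0.4])                      # class prior over y in {0, 1}
p_x_given_y = np.array([[0.9, 0.1, 0.3],        # P(X_i = 1 | Y = 0)
                        [0.2, 0.8, 0.7]])       # P(X_i = 1 | Y = 1)

def sample():
    y = rng.choice(2, p=p_y)                            # y ~ P(Y)
    x = (rng.random(3) < p_x_given_y[y]).astype(int)    # x ~ P(X | Y = y)
    return x, y

for _ in range(3):
    print(sample())
# P(X = x) is recovered by marginalizing: sum over y of P(y) * P(x | y).
```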
Generative vs. Discriminative. Generative, today: Naïve Bayes. Discriminative, next: Logistic Regression. NB and LR are related to each other.
How hard is it to learn the optimal classifier? Categorical data: how do we represent these distributions, and how many parameters do we need? Class prior P(Y): suppose Y is composed of k classes, so k - 1 parameters. Likelihood P(X|Y): suppose X is composed of d binary features, so 2^d - 1 parameters per class, i.e. k(2^d - 1) in total. A complex model: high variance with limited data!!! (Slide credit: Carlos Guestrin)
Independence to the rescue. (Slide credit: Sam Roweis)
The Naïve Bayes assumption. Naïve Bayes assumption: features are independent given the class, P(X1, X2 | Y) = P(X1 | Y) P(X2 | Y). More generally: P(X1, ..., Xd | Y) = prod_i P(Xi | Y). How many parameters now? Suppose X is composed of d binary features: dk likelihood parameters (plus k - 1 for the prior), linear in d rather than exponential. (Slide credit: Carlos Guestrin)
The Naïve Bayes Classifier. Given: a class prior P(Y); d conditionally independent features X given the class Y; for each Xi, a likelihood P(Xi|Y). Decision rule: y* = h_NB(x) = argmax_y P(y) prod_i P(x_i | y). If the assumption holds, NB is the optimal classifier! (Slide credit: Carlos Guestrin)
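A minimal Bernoulli Naïve Bayes for d binary features can be written directly from the decision rule above. This is a sketch, not the course's HW2 solution; the +1/+2 Laplace-style smoothing and the toy data are my additions to keep the probabilities away from zero and make the example runnable.

```python
import numpy as np

# Minimal Bernoulli Naive Bayes sketch for binary features.
class NaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.log_prior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        # theta[c, i] = P(X_i = 1 | Y = c), estimated with add-one smoothing
        self.theta = np.array([(X[y == c].sum(axis=0) + 1.0) /
                               (np.sum(y == c) + 2.0) for c in self.classes])
        return self

    def predict(self, X):
        # log P(y) + sum_i log P(x_i | y), maximized over classes
        log_lik = X @ np.log(self.theta).T + (1 - X) @ np.log(1 - self.theta).T
        return self.classes[np.argmax(log_lik + self.log_prior, axis=1)]

# Toy usage with made-up data:
rng = np.random.default_rng(7)
y = rng.integers(0, 2, size=200)
true_theta = np.array([[0.8, 0.2, 0.5], [0.3, 0.9, 0.5]])
X = (rng.random((200, 3)) < true_theta[y]).astype(float)
model = NaiveBayes().fit(X, y)
print("training accuracy:", np.mean(model.predict(X) == y))
```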