Online Learning in Machine Learning

 
Online Learning: An Introduction
 
Travis Mandel
University of Washington
Overview of Machine Learning
 
Supervised Learning
Decision Trees, Naïve Bayes,
Linear Regression, K-NN, etc.
Unsupervised Learning
K-means, Association Rule
Learning, etc.
 
What else is there????
Overview of Machine Learning

Supervised Learning
(Diagram) Labeled training data → ML algorithm → classifier / regression function

Unsupervised Learning
(Diagram) Unlabeled training data → ML algorithm → clusters / rules

Is this the best way to learn?
Overview of Machine Learning
 
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Active Learning
Interactive Learning
Online Learning
Reinforcement Learning
Online Classification
(Diagram) Observe one training instance → ML algorithm issues a prediction → observes label, receives loss/reward
 
More generally… Online Learning!
(Diagram) Observe observations → ML algorithm selects output → observe loss/reward function → receive loss/reward
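To make the loop concrete, here is a minimal Python sketch of this protocol; the function names and callback interface are illustrative assumptions, not part of any particular library.

```python
# A minimal sketch of the generic online-learning loop (hypothetical interface).
def run_online(predict, update, get_observation, get_reward, num_rounds):
    """predict/update belong to the learner; get_observation/get_reward to the environment."""
    total_reward = 0.0
    for t in range(num_rounds):
        obs = get_observation(t)        # observe observations
        out = predict(obs)              # ML algorithm selects an output
        r = get_reward(t, out)          # loss/reward function is revealed
        update(obs, out, r)             # learner receives the loss/reward and adapts
        total_reward += r
    return total_reward
```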
 
Typical Assumption: Statistical Feedback
 
Instances and labels drawn from a fixed distribution D
 
Recall: Distribution

drug?  nigeria?  Spam?  Prob
 N       N        N     0.5
 N       N        Y     0.1
 N       Y        N     0.05
 N       Y        Y     0.1
 Y       N        N     0.05
 Y       N        Y     0.1
 Y       Y        N     0.0
 Y       Y        Y     0.1
Alternate Assumption: Adversarial
Feedback
 
Instances and labels drawn adversarially
 
Spam detection, anomaly detection,
 
etc.
 
Change over time a HUGE Issue!
 
 
 
 
Change over time example
 
 
 
How can we measure
performance?
 
What we want to maximize: Sum of
rewards
 
Why could this be a problem?
Alternative: Discounted Sum of Rewards

Undiscounted: r_0 + r_1 + r_2 + r_3 + …
Discounted:   r_0 + γ·r_1 + γ²·r_2 + γ³·r_3 + …

Guaranteed to be a finite sum if γ < 1 and rewards are bounded (r < c)!
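As a quick illustration, a small Python sketch (with made-up rewards and discount factor) comparing the two objectives:

```python
# Plain sum vs. discounted sum of rewards (illustrative values).
rewards = [1.0, 0.0, 1.0, 1.0]   # hypothetical per-step rewards
gamma = 0.9                      # discount factor, gamma < 1

plain_sum = sum(rewards)
discounted_sum = sum(gamma**t * r for t, r in enumerate(rewards))
print(plain_sum, discounted_sum)   # 3.0  vs.  1.0 + 0.81 + 0.729 = 2.539
```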
 
One problem…
 
If, a small percentage of the time, you never find the best option (incomplete learning), it doesn't matter much for the objective!
 
But intuitively it should matter a LOT!
Regret
 
Sum of rewards compared to the BEST choice in hindsight!

Regret_T = max_i Σ_{t=1..T} r_{t,i} - Σ_{t=1..T} r_t   (r_{t,i} = reward of choice i at step t; r_t = reward the algorithm actually received)
If training a linear model, the comparator is the optimal weights in hindsight (beware overfitting)!
See other examples in a second
 
We can get good regret even if we perform very poorly in absolute terms
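To make the definition concrete, here is a small Python sketch (reward numbers and choices are made up) that computes regret against the best single choice in hindsight:

```python
import numpy as np

# rewards[t, i] = reward choice i would have earned at step t (illustrative values)
rewards = np.array([[1.0, 0.2],
                    [0.8, 0.4],
                    [0.9, 0.1]])
chosen = [1, 0, 1]                      # indices the algorithm actually picked

algo_total = sum(rewards[t, i] for t, i in enumerate(chosen))   # 0.2 + 0.8 + 0.1 = 1.1
best_in_hindsight = rewards.sum(axis=0).max()                   # choice 0: 2.7
regret = best_in_hindsight - algo_total                         # 1.6
```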
We really like theoretical
guarantees!
 
Anyone know why?
 
Running online → uncertain future
Hard to tell if you've reached the best possible performance just by looking at the data
 
Recall…
(Diagram) Observe observations → ML algorithm selects output → observe loss/reward function → receive loss/reward
Experts Setting

N experts
Assume rewards in [0,1]

(Diagram) Observe expert recommendations → ML algorithm selects an expert → observe loss/reward for each expert → receive loss/reward
Regret in the Experts Setting
 
Do as well as the best expert in
hindsight!
 
Hopefully you have (at least some)
good experts!
“Choose an Expert?”
 
Each expert could make a
prediction (different ML models,
classifiers, hypotheses, etc.)
 
Each expert could tell us to perform
an action (such as buy a stock)
 
Experts can learn & change over
time!
Simple strategies
 
Pick Expert that’s done the best in
the past?
 
Pick Expert that’s done the best
recently?
No! Adversarial!
 
For ANY deterministic strategy…
 
The adversary knows which expert we will choose, and can give it low reward/high loss while giving all the others high reward/low loss!

Every step we get further away from the best expert → linear regret
Solution: Randomness
 
What if the algorithm has access to
some random bits?
 
We can do better!
 
EG: Exponentiated Gradient

Initialize w_{1,i} = 1 for every expert i
Multiplicative update: w_{t,i} = w_{t-1,i} · exp(η · r_{t-1,i}), with learning rate η (e.g. η = √(2·log(N)/T))

Probabilities are the normalized weights: p_{t,i} = w_{t,i} / Σ_j w_{t,j}
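A minimal Python sketch of this exponential-weights scheme for the experts setting; the function name, the sampling step, and the variable names are illustrative assumptions:

```python
import math
import random

def exponentiated_gradient(reward_rows, eta):
    """reward_rows: list of per-round reward vectors, one entry per expert (values in [0, 1])."""
    n = len(reward_rows[0])
    weights = [1.0] * n                      # w_1 = (1, ..., 1)
    total_reward = 0.0
    for rewards in reward_rows:
        z = sum(weights)
        probs = [w / z for w in weights]     # probabilities = normalized weights
        i = random.choices(range(n), weights=probs)[0]   # randomly pick an expert
        total_reward += rewards[i]
        # Full-information feedback: every expert's reward is revealed.
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    return total_reward
```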
 
EG: Regret Bound

Regret = O(√(2·T·log N))
Problem: What if the feedback is
a little different?
 
In the real world, we usually don’t
get to see what “would have”
happened
Shift from predicting future to
taking actions in an unknown
world
Simplest Reinforcement Learning
problem
Multi-Armed Bandit (MAB) Problem

N arms
Assume rewards in [0,1]
Assume statistical feedback

(Diagram) ML algorithm pulls an arm → observes loss/reward for the chosen arm only → suffers that loss/reward
Regret in the MAB Setting
 
Since the feedback is stochastic, each arm has a true mean

Do as well as if we had always pulled the arm with the highest mean
MABs have a huge number of
applications!
Internet Advertising
Website Layout
Clinical Trials
Robotics
Education
Videogames
Recommender Systems
 
Unlike most of Machine Learning, this is about how to ACT, not just how to predict!
 
A surprisingly tough problem…
 
“The problem is a classic one; it was
formulated during the war, and efforts to
solve it so sapped the energies and minds of
Allied analysts that the suggestion was made
that the problem be dropped over Germany,
as the ultimate instrument of intellectual
sabotage.”

Prof. Peter Whittle, Journal of the Royal Statistical Society, 1979
MAB algorithm #1: ε-first
 
Phase 1: Explore arms uniformly at
random
Phase 2: Exploit best arm so far
Only 1 parameter: how long to
wait!
 
Problem?
Not very efficient
Makes statistics easy to run
MAB algorithm #2: ε-greedy

Flip a coin:
With probability ε, explore uniformly
With probability 1-ε, exploit

Only 1 parameter: ε
 
Problem?
Often used in robotics and
reinforcement learning
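A minimal Python sketch of ε-greedy on simulated Bernoulli arms; the helper names and the simulated arm probabilities are illustrative assumptions, not from the slides:

```python
import random

def epsilon_greedy(pull_arm, n_arms, epsilon, num_rounds):
    """pull_arm(i) returns the observed reward for arm i."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(num_rounds):
        if random.random() < epsilon:
            i = random.randrange(n_arms)                     # explore uniformly
        else:
            i = max(range(n_arms), key=lambda a: means[a])   # exploit best estimate
        r = pull_arm(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]               # incremental mean update
    return means

# Example: three simulated Bernoulli arms with (made-up) success probabilities.
true_p = [0.3, 0.5, 0.7]
estimates = epsilon_greedy(lambda i: float(random.random() < true_p[i]),
                           n_arms=3, epsilon=0.1, num_rounds=10_000)
```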
Idea
 
The problem is uncertainty…
How do we quantify uncertainty? Error bars!

If an arm has been sampled n times, then with probability at least 1 - δ:
|μ̂ - μ| < √( log(2/δ) / (2n) )
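As a small sketch (the function name and example numbers are assumptions of mine), the half-width of this error bar can be computed directly:

```python
import math

def hoeffding_radius(n, delta):
    """Half-width of the confidence interval after n samples, with failure probability delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

print(hoeffding_radius(n=100, delta=0.05))   # about 0.136
```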
 
Given Error bars, how do we act?
Given Error bars, how do we act?
 
Optimism under uncertainty!
Why?  If bad, we will soon find out!
One last wrinkle

How do we set the confidence δ? Decrease it over time: with δ = 2/t⁴, the error bar becomes √( 2·log(t) / n )
Upper Confidence Bound (UCB)

1. Play each arm once

2. Play the arm i that maximizes:
   μ̂_i + √( 2·log(t) / n_i )

3. Repeat step 2 forever
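A minimal Python sketch of UCB following the three steps above; the `pull_arm` callback and the simulation at the end are illustrative assumptions:

```python
import math
import random

def ucb(pull_arm, n_arms, num_rounds):
    """pull_arm(i) returns a reward in [0, 1] for arm i."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    # 1. Play each arm once.
    for i in range(n_arms):
        counts[i] = 1
        means[i] = pull_arm(i)
    # 2./3. Repeatedly play the arm with the highest upper confidence bound.
    for t in range(n_arms + 1, num_rounds + 1):
        i = max(range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull_arm(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    return means

# Example with simulated Bernoulli arms (probabilities made up for illustration).
true_p = [0.3, 0.5, 0.7]
print(ucb(lambda i: float(random.random() < true_p[i]), n_arms=3, num_rounds=10_000))
```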
 
UCB Regret Bound

Problem-independent: O(√(N·T·log T))

Problem-dependent: O( N·log(T) / Δ_min ), where Δ_min is the smallest gap between the best arm's mean and any other arm's mean

Can't do much better!
A little history…
 
William R. Thompson (1933): the first to examine the MAB problem; proposed a method for solving it

1940s-50s: the MAB problem is studied intensively during WWII; Thompson ignored

1970s-1980s: an "optimal" solution (the Gittins index) is found, but it is intractable and incomplete. Thompson ignored.

2001: UCB proposed; gains widespread use due to its simplicity and "optimal" bounds. Thompson still ignored.

2011: Empirical results show Thompson's 1933 method beats UCB, but there is little interest since it has no guarantees.

2013: Optimal bounds finally shown for Thompson Sampling
 
Thompson’s method was
fundamentally different!
Bayesian vs Frequentist
Bayesians: You have a prior,
probabilities interpreted as beliefs,
prefer probabilistic decisions
Frequentists: No prior, probabilities
interpreted as facts about the world,
prefer hard decisions (p<0.05)
 
UCB is a frequentist technique! What if we are Bayesian?
Bayesian review: Bayes' Rule

P(θ | data) = P(data | θ) · P(θ) / P(data)

posterior ∝ likelihood × prior
Bernoulli Case
 
What if rewards are in the set {0,1} instead of the range [0,1]?

Then we flip a coin with probability p → a Bernoulli Distribution!

To estimate p, we count up the numbers of ones and zeros

Given the observed ones and zeros, how do we calculate the distribution of possible values of p?
 
Beta-Bernoulli Case

Beta(a,b) → given a 0's and b 1's, what is the distribution over means?

Prior → pseudocounts

Likelihood → observed counts

Posterior → pseudocounts + observed counts
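A small Python sketch of the Beta-Bernoulli update (using the common convention that the first Beta parameter counts 1's and the second counts 0's; the data are made up):

```python
import numpy as np

# Prior pseudocounts: Beta(1, 1) is the uniform prior over p.
alpha, beta = 1.0, 1.0

# Observed coin flips (made-up data): 7 ones and 3 zeros.
observations = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
alpha += sum(observations)                     # pseudocounts + observed 1's
beta += len(observations) - sum(observations)  # pseudocounts + observed 0's

posterior_mean = alpha / (alpha + beta)        # (1+7)/(2+10) = 0.667
samples = np.random.beta(alpha, beta, size=5)  # draws from the posterior over p
```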
 
How does this help us?
 
Thompson Sampling:
 
1. Specify prior (in Beta case often Beta(1,1))
 
2. Sample from each posterior distribution to get
 
estimated mean for each arm.
 
3. Pull arm with highest mean.
 
4. Repeat steps 2 & 3 forever
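A minimal Python sketch of Thompson Sampling for Bernoulli arms, following the four steps above; the simulation setup and variable names are illustrative assumptions:

```python
import numpy as np

def thompson_sampling(pull_arm, n_arms, num_rounds):
    """pull_arm(i) returns a 0/1 reward for arm i."""
    # 1. Specify the prior: Beta(1, 1) for every arm.
    alpha = np.ones(n_arms)   # pseudocounts of 1's
    beta = np.ones(n_arms)    # pseudocounts of 0's
    for t in range(num_rounds):
        # 2. Sample an estimated mean for each arm from its posterior.
        sampled_means = np.random.beta(alpha, beta)
        # 3. Pull the arm with the highest sampled mean.
        i = int(np.argmax(sampled_means))
        r = pull_arm(i)
        # Update that arm's posterior counts.
        alpha[i] += r
        beta[i] += 1 - r
    return alpha / (alpha + beta)   # posterior mean estimate per arm

# Example with simulated Bernoulli arms (probabilities made up for illustration).
true_p = [0.3, 0.5, 0.7]
print(thompson_sampling(lambda i: int(np.random.random() < true_p[i]),
                        n_arms=3, num_rounds=10_000))
```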
 
Thompson Empirical Results
 
It has also been shown to have optimal regret bounds, just like (and in some cases a little better than) UCB!
Problems with standard MAB formulation

What happens if adversarial? (studied elsewhere)
What if arms are factored and share values?
What happens if there is delay?
What if we know a heuristic that we want to incorporate without harming guarantees?
What if we have a prior dataset we want to use?
What if we want to evaluate algorithms/parameters on previously collected data?

A look toward more complexity:
Reinforcement Learning
 
A sequence of actions, instead of taking one action and resetting
Rich space of observations which
should influence our choice of
action
 
Much more powerful, but a much
harder problem!
 
Summary
 
Online Learning
Adversarial vs Statistical
Feedback
Regret
Experts Problem
EG algorithm
Multi-Armed Bandits
ε-greedy, ε-first
UCB
Thompson Sampling
 
Questions?
