Practical Implementation of Machine Learning by Geoff Hulten

Implementing with Machine Learning

Geoff Hulten

Example of an Implementation: Laugh Finder

[Figure: mock browser window showing Whatever.com/whatever.html, filled with placeholder lorem ipsum text]

Indicates if the web page is funny

Browser Plugin

    funnyWords = [ 'enim', …, 'fugiat', … ]

    every time a page loads:
        page = GetPageContext()
        isFunny = false
        for word in page.words:
            if( word in funnyWords ):
                isFunny = true
        UpdateUserExperience( isFunny )

Basic Machine Learning

• Construct feature vector in correct order by testing for presence of selected words…
• Call into inference engine to apply the model to the feature vector…
• Apply tuned threshold to achieve desired (stable) operating point…
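The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: `featurize`, `TinyLogisticModel`, the weights, and the threshold value are all hypothetical, and a logistic model is assumed only as a simple stand-in for the inference engine.

```python
import math

def featurize(selected_words, page_words):
    """Return a 0/1 vector with one entry per selected word, in the model's order."""
    present = set(page_words)
    return [1 if word in present else 0 for word in selected_words]

class TinyLogisticModel:
    """Hypothetical stand-in for the inference engine (logistic regression)."""

    def __init__(self, words, weights, bias, threshold):
        self.words = words          # selected words, in feature order
        self.weights = weights      # one weight per selected word
        self.bias = bias
        self.threshold = threshold  # tuned operating point

    def predict(self, xs):
        """Return one score in (0, 1) per feature vector."""
        scores = []
        for x in xs:
            z = self.bias + sum(w * v for w, v in zip(self.weights, x))
            scores.append(1.0 / (1.0 + math.exp(-z)))
        return scores

def is_funny(model, page_words):
    x = featurize(model.words, page_words)  # step 1: feature vector
    y = model.predict([x])                  # step 2: apply the model
    return y[0] > model.threshold           # step 3: apply the threshold

# Illustrative values only.
model = TinyLogisticModel(words=["enim", "fugiat"], weights=[2.0, 1.5],
                          bias=-2.5, threshold=0.5)
```

Keeping the word list, weights, and threshold together on one object is a design choice in this sketch; it makes it harder for the three to drift apart.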

‘Run Time’ vs ‘Creation Time’

Intelligence Creation Environment

Model in Run Time

Must be in sync!

Must be in sync!

Param sweep

More param sweeps

Validation Set Stuff

Intelligence Management

Intelligence Management

Intelligence Creation Environment
• Periodically build new model

New Model

Intelligence Management
• Verify new intelligence
• Decide when to deploy
• Manage Deployment

Intelligence Runtime
• Compile and push app
• Host Model in update service

    every time app starts:
        UpdateModel( <server>, <dataFile> )
        model.load( <dataFile> )

• Check for new model
• Download and save
• Hide latency (?)
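One way the "check for new model, download and save" step might look is sketched below. Everything here is an assumption made for illustration: a local directory stands in for the update service, the model is a JSON file with a `version` field, and `update_model` and `load_model` are hypothetical names rather than the course's API.

```python
import json
import os
import shutil
import tempfile

def update_model(server_dir, data_file):
    """Replace data_file with the hosted model if the hosted version is newer."""
    hosted = os.path.join(server_dir, "model.json")
    if not os.path.exists(hosted):
        return False
    with open(hosted) as f:
        hosted_version = json.load(f)["version"]
    local_version = -1  # assume no local model until one is found
    if os.path.exists(data_file):
        with open(data_file) as f:
            local_version = json.load(f)["version"]
    if hosted_version <= local_version:
        return False
    # "Download" to a temporary name first, then swap it in, so a failed
    # transfer never leaves a half-written model on disk.
    tmp = data_file + ".tmp"
    shutil.copyfile(hosted, tmp)
    os.replace(tmp, data_file)
    return True

def load_model(data_file):
    with open(data_file) as f:
        return json.load(f)

# Demo: temporary directories stand in for the server and the device.
server_dir = tempfile.mkdtemp()
device_dir = tempfile.mkdtemp()
with open(os.path.join(server_dir, "model.json"), "w") as f:
    json.dump({"version": 2, "words": ["enim", "fugiat"]}, f)
data_file = os.path.join(device_dir, "model.json")
updated = update_model(server_dir, data_file)
model = load_model(data_file)
```

In a real app the check would likely run in the background so that startup does not wait on the network, which is one reading of the "Hide latency (?)" callout.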

Monitoring Models

Is it working?

Verifying Model is Running as Expected

    every time a page loads:
        page = GetAppContext()
        x = Featurize(model.words, page.words)
        y = model.predict( [ x ] )
        if random.random() < veryLowSamplingRate:
            LogToServer(page, x, y, model.version)
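The sampling step can be illustrated with a small runnable sketch. The function name `log_sample`, the record fields, and the in-memory list standing in for the server are assumptions made for this example.

```python
import random

def log_sample(log, page_url, x, y, model_version, sampling_rate, rng=random.random):
    """Append one telemetry record with probability sampling_rate."""
    if rng() < sampling_rate:
        log.append({
            "page": page_url,
            "x": x,                          # features the runtime computed
            "y": y,                          # score the runtime produced
            "model_version": model_version,  # ties the record to one model
        })
        return True
    return False

# Simulate 1000 page loads at a 1% sampling rate.
log = []
for _ in range(1000):
    log_sample(log, "Whatever.com/whatever.html", [0, 1], [0.73], "v1",
               sampling_rate=0.01)
```

Logging the feature vector and model version alongside the score lets the creation environment recompute the prediction offline and confirm that the runtime matches it. Passing `rng` as a parameter is only there to make the function easy to test.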

Verifying Model Quality is Good

Hand Label

User Reports

    every time user clicks a report:
        page = GetAppContext()
        x = Featurize(model.words, page.words)
        y = model.predict( [ x ] )
        LogToServer(page, x, y, model.version, userReport)

Verify in Creation Environment

Add to Training Set (?)

Privacy!!

Design Patterns for ML

• Where in the system do you use Machine Learning?
• All the choices you have to make along the way…
• We’ll get to more as the course goes on, but for now…
• How do you get the data to do the learning?

Corpus Centric
• Data Collection and Labeling Process: Data Collection, Data Labeling
• Training Corpus
• Intelligence Creation Environment
• Intelligence Management
• Intelligence Runtime
Can be quite expensive…

Closed Loop
• Explicit and Implicit User Feedback
• Training Corpus
• Intelligence Creation Environment
• Intelligence Management
• Intelligence Runtime
Can be tricky to get right…

• Explicit
    • Reporting UX
    • Survey
• Implicit
    • Achieve the desired result
    • Forward to a friend
    • Laugh out loud (audio detection ?)

More discussion of this later…

We’ll get to these as course progresses

Orchestration

Bad False Positive

    every time a page loads:
        page = GetAppContext()
        if any of page.words in blockedWords:
            isFunny = false  # no matter what model says

Bad False Negative

    every time a page loads:
        page = GetAppContext()
        if page.domain in whitelist:
            isFunny = true  # no matter what model says
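The two overrides above can be combined into a single decision function. The sketch below is illustrative only: the function name, its parameters, and the precedence (blocked words win over the domain allowlist, which wins over the model) are assumptions, since the slides do not say which rule applies when both match.

```python
def orchestrated_is_funny(page_words, page_domain, model_says_funny,
                          blocked_words, allowed_domains):
    """Apply hand-written overrides before falling back to the model's answer."""
    # Bad false positive guard: never call a page funny if it has a blocked word.
    if any(word in blocked_words for word in page_words):
        return False
    # Bad false negative guard: always call known comedy domains funny.
    if page_domain in allowed_domains:
        return True
    # Otherwise trust the model.
    return model_says_funny

# Illustrative values only.
blocked = {"obituary"}
allowed = {"comedy.example"}
```

Keeping the overrides in one function, with an explicit order, is one way to keep heuristics from spreading through the codebase.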

Super non-funny text (bad FP)

Comedy Site that model always calls not funny (bad FN)

Lots of fake reports (abuse)

More Reasons for Orchestration:
• Driving Quality
• Concept Changes
• Costs Change
• Users Change
• Bugs and Issues
• Etc.

Could try hard to tune models for specific bad mistakes, but sometimes heuristics are cheaper and more robust.

( and sometimes heuristics turn your system into brittle-spaghetti gibberish… )

Summary: Components of an Implementation

Intelligence Runtime
• Program State -> Context
• Execute Feature Code
• Execute Model
• Interpret Model Output
• Control User Experience
• Update:
    • Models
    • Feature Code

Intelligence Management
• Verify new intelligence
• Control rollouts:
    • Keep in sync
    • Clients/services
• Support online evaluation

Intelligence Creation Environment
• Telemetry -> Context
• Feature code in sync
• Computation & Data
• All the training stuff…

Telemetry
• Verifying outcomes
• Training data
• Selecting what to observe
• Sampling
• Summarizing

Intelligence Orchestration
• Monitoring success
• Inspect Interactions
• Adapt as things change
• Deal with mistakes
• Updating thresholds
• Drive the racecar

In this comprehensive guide, Geoff Hulten introduces implementing machine learning with practical examples like the Laugh Finder application that detects humorous web pages. The text also delves into a basic machine learning example file format and discusses the crucial aspects of run time versus creation time in creating intelligent models. Furthermore, insights are shared on intelligence management methods, emphasizing the need for effective deployment strategies and model updates for optimal performance.

Uploaded by nilt889 on Sep 24, 2024

Presentation Transcript

Basic Machine Learning

Example File Format: <?>, <?>, <????0>, <?0>, <????1>, <?1>, …, <?????> <??>

    model.load(<dataFile>)

    every time a page loads:
        page = GetAppContext()

        # Construct feature vector in correct order by testing for
        # presence of selected words
        x = Featurize(model.words, page.words)

        # Call into inference engine to apply the model to the feature vector
        y = model.predict( [ x ] )

        # Apply tuned threshold to achieve desired (stable) operating point
        isFunny = y[0] > model.threshold

        UpdateUserExperience( isFunny )

Run Time vs Creation Time

Intelligence Creation Environment

    for every pageID in trainingSet:
        ( page, label ) = LoadPageFromLog( pageID )
        pages.append( page ); Y.append( label )

    ( xTrainRaw, yTrain, xValidateRaw ) = SplitData( pages, Y )

    # Param sweep
    selectedFeatures = FeatureSelect( xTrainRaw, yTrain, <params> )

    for xRaw in xTrainRaw:
        x = Featurize( selectedFeatures, xRaw.words )  # Must be in sync!
        xTrain.append( x )

    # More param sweeps
    model = train( <more params>, xTrain, yTrain )

    # Validation Set Stuff
    threshold = FindThreshold( thresholdSet )  # Must be in sync!

    model.save( selectedFeatures, threshold, model )  # Deploy it (via Intelligence Management)

Model in Run Time

    model.load(<dataFile>)

    every time a page loads:
        page = GetAppContext()
        x = Featurize(model.words, page.words)  # Must be in sync!
        y = model.predict( [ x ] )
        isFunny = y[0] > threshold  # Must be in sync!
        UpdateUserExperience( isFunny )
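The slides do not define `FindThreshold`. One plausible reading, sketched here as an assumption, is that it picks the lowest threshold whose false positive rate on a held-out set stays within a target. The name `find_threshold`, its parameters, and the sample scores are hypothetical.

```python
def find_threshold(scores, labels, max_false_positive_rate):
    """Return the lowest threshold whose false positive rate is within the target.

    A page is predicted funny when its score is strictly above the threshold,
    matching the run-time test `y[0] > threshold`. Labels are 1 (funny) or 0.
    """
    if not scores:
        raise ValueError("need at least one validation example")
    negatives = sum(1 for label in labels if label == 0)
    candidates = sorted(set(scores))
    for t in candidates:
        false_positives = sum(
            1 for s, label in zip(scores, labels) if s > t and label == 0
        )
        rate = false_positives / negatives if negatives else 0.0
        if rate <= max_false_positive_rate:
            return t  # lowest qualifying threshold keeps recall as high as possible
    return candidates[-1]  # unreachable in practice: the top score yields rate 0

# Illustrative validation scores and labels.
val_scores = [0.1, 0.4, 0.35, 0.8]
val_labels = [0, 0, 1, 1]
strict = find_threshold(val_scores, val_labels, 0.0)
loose = find_threshold(val_scores, val_labels, 0.5)
```

Whatever rule is used, the chosen value has to ship with the model it was tuned for, which is why the threshold is saved alongside the selected features.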

ML Design Patterns

Corpus Centric
• Examples: Computer Vision, Xbox Kinect, Speech
• Best When: Hard But Stable Problem; Can’t use data from customer interaction; Bootstrapping a new system
• Key Challenges: Collecting & Labeling Data; Sophisticated Modeling

Closed Loop
• Examples: Self Driving Car, Recommender Systems
• Best When: Open-ended/Time-changing problems; Users and Intelligence interact at Scale
• Key Challenges: Shaping User/ML Interactions; Orchestrating Evolving System

Scoring/Ranking
• Examples: Ad Targeting, Search, Voice Assistants, Designer
• Best When: Diverse at scale content creation; Users and Intelligence Interact at Scale
• Key Challenges: Find right Situation for Each Content; Exploring and Avoiding Feedback Loops

Adversarial
• Examples: Spam Filtering, Malware detection, Account compromise, Anti-phishing
• Best When: Hyper Time-changing (adversarial)
• Key Challenges: Buffer ML From Adversary; Change Economics of Broader System

Reinforcement
• Examples: Games (e.g. Alpha Go), Robotics
• Best When: Correct Labeling Hard for Human; High Scale Simulator to Learn From
• Key Challenges: Difficult to Converge; Expensive (AlphaGo ~40 days)

We’ll get to these as course progresses
