Practical Implementation of Machine Learning by Geoff Hulten

 
Implementing with Machine Learning
 
Geoff Hulten
Example of an Implementation: Laugh Finder
[Mock browser window: a sample page at Whatever.com/whatever.html filled with placeholder (lorem ipsum) text, including words such as "enim" and "fugiat".]
 
Indicates if the web page is funny
 
Browser Plugin
funnyWords = [ 'enim', …, 'fugiat', … ]
every time a page loads:
    page = GetPageContext()
    isFunny = false
    for word in page.words:
        if( word in funnyWords ):
            isFunny = true
    UpdateUserExperience( isFunny )
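For readers who want to run it, here is the plugin's keyword heuristic as plain Python. It is a minimal sketch, assuming the page's words arrive as a list of strings; `FUNNY_WORDS` is a hypothetical stand-in for the elided `funnyWords` list, and `is_funny` is an illustrative name rather than part of the original plugin.

```python
# Hypothetical stand-in for the slide's elided funnyWords list.
FUNNY_WORDS = {"enim", "fugiat"}

def is_funny(page_words, funny_words=FUNNY_WORDS):
    """Return True if any word on the page appears in funny_words.

    Same logic as the slide's loop (start False, flip to True on a match);
    any() simply stops at the first match.
    """
    return any(word in funny_words for word in page_words)

print(is_funny(["Duis", "aute", "irure"]))     # False
print(is_funny(["esse", "cillum", "fugiat"]))  # True
```

Storing the words in a set keeps each membership test constant-time on average, which matters if the list grows long.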
Basic Machine Learning
[Mock browser window: the same sample page at Whatever.com/whatever.html with placeholder (lorem ipsum) text.]
 
Construct feature vector in correct order by testing for presence of selected words…

Call into inference engine to apply the model to the feature vector…

Apply tuned threshold to achieve desired (stable) operating point…
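The three steps above (featurize, predict, threshold) can be sketched in Python. This is an illustration under assumptions, not the deck's model: `TinyLogisticModel` is a hypothetical logistic-regression stand-in for whatever inference engine loads `<dataFile>`, and its words, weights, bias, and threshold are made-up values.

```python
import math

def featurize(model_words, page_words):
    """Binary feature vector: 1 if model_words[i] appears on the page, else 0.

    The vector follows the order of model_words, i.e. the order the model
    was trained with, regardless of where words appear on the page.
    """
    present = set(page_words)
    return [1 if word in present else 0 for word in model_words]

class TinyLogisticModel:
    """Hypothetical stand-in for a trained model loaded from <dataFile>."""

    def __init__(self, words, weights, bias, threshold):
        self.words = words          # selected feature words, in training order
        self.weights = weights      # one weight per word (made-up values)
        self.bias = bias
        self.threshold = threshold  # tuned on validation data at creation time

    def predict(self, xs):
        """Return a score in (0, 1) for each feature vector in xs."""
        scores = []
        for x in xs:
            z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
            scores.append(1.0 / (1.0 + math.exp(-z)))
        return scores

model = TinyLogisticModel(["enim", "fugiat", "dolor"], [1.5, 2.0, -0.5], -1.0, 0.6)
x = featurize(model.words, ["sed", "libero", "enim"])
y = model.predict([x])
is_funny = y[0] > model.threshold
print(x, round(y[0], 3), is_funny)  # [1, 0, 0] 0.622 True
```

Because `featurize` walks `model_words` rather than the page, the vector always has the length and order the model expects.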
‘Run Time’ vs ‘Creation Time’
Intelligence Creation Environment
Model in Run Time
 
Must be in sync!
 
Must be in sync!
 
Param sweep
 
More param sweeps

Validation Set Stuff

Intelligence Management
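On the creation side, the "Validation Set Stuff" and the save step can be sketched as follows. This assumes the desired operating point is a maximum false-positive rate on validation pages and that the model is stored as JSON; `find_threshold`, `save_model`, `load_model`, and the file layout are hypothetical. Saving the selected words and the threshold in the same file as the weights is one way to keep creation time and run time in sync.

```python
import json
import os
import tempfile

def find_threshold(scores, labels, max_fpr):
    """Return the lowest threshold whose false-positive rate is <= max_fpr.

    scores are model outputs on held-out validation pages; labels are
    1 (funny) or 0 (not funny). A page counts as funny when score > threshold,
    matching `isFunny = y[0] > threshold` at run time.
    """
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not negatives:
        raise ValueError("validation data needs at least one non-funny page")
    # The false-positive rate can only fall as the threshold rises, so the
    # first sorted candidate that meets the target is the lowest that does.
    for t in sorted(set(scores)):
        fpr = sum(1 for s in negatives if s > t) / len(negatives)
        if fpr <= max_fpr:
            return t
    # Only reached if max_fpr < 0, which no threshold can satisfy.
    raise ValueError("max_fpr must be >= 0")

def save_model(path, words, weights, bias, threshold):
    """Store feature words, weights, and threshold together in one file."""
    if len(words) != len(weights):
        raise ValueError("need exactly one weight per selected word")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"words": words, "weights": weights, "bias": bias,
                   "threshold": threshold}, f)

def load_model(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Example: tune on a tiny, made-up validation set, then save and reload.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
threshold = find_threshold(scores, labels, max_fpr=0.0)
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(path, ["enim", "fugiat"], [1.5, 2.0], -1.0, threshold)
print(load_model(path)["threshold"])  # 0.4
```

A target false-positive rate is only one possible operating point; a target precision or recall would change the test inside the loop but not the overall shape.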
Intelligence Management
Intelligence Creation Environment
 
Verify new intelligence
Decide when to deploy
Manage Deployment
 
Intelligence Management
New Model

Intelligence Runtime
Periodically build new model
Compile and push app
Host Model in update service
every time app starts:
    UpdateModel( <server>, <dataFile> )
    model.load( <dataFile> )
 
Check for new model
 
Download and save
 
Hide latency (?)
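A check-download-save step like `UpdateModel` might look like the sketch below. The slide does not specify a protocol, so the server is represented by two hypothetical callables, `fetch_latest_version` and `fetch_model_bytes`, and the installed version is assumed to be recorded in a small side file.

```python
import os
import tempfile

def update_model(fetch_latest_version, fetch_model_bytes, data_file, version_file):
    """Install a newer model into data_file if the update service has one.

    fetch_latest_version() -> str and fetch_model_bytes(version) -> bytes are
    hypothetical stand-ins for calls to <server>. Returns True if a new model
    was installed, False if the current one was kept.
    """
    installed = None
    if os.path.exists(version_file):
        with open(version_file, encoding="utf-8") as f:
            installed = f.read().strip()
    try:
        latest = fetch_latest_version()
        if latest == installed:
            return False  # already up to date
        payload = fetch_model_bytes(latest)
    except OSError:
        # Network errors (urllib's URLError is an OSError subclass): keep
        # running with whatever model is already on disk.
        return False
    # Write to a temp file in the same directory, then atomically swap it in,
    # so a crash mid-write never leaves a half-written model behind.
    directory = os.path.dirname(os.path.abspath(data_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.replace(tmp_path, data_file)
    with open(version_file, "w", encoding="utf-8") as f:
        f.write(latest)
    return True
```

One way to hide latency would be to load the cached model immediately and run this update in the background, so a new model takes effect on the next start; that is a design option, not something the slide settles.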
Monitoring Models
[Mock browser window: the sample page at Whatever.com/whatever.html with placeholder (lorem ipsum) text and a "Report as: Not funny / Funny" control.]
 
Is it working?
 
Verifying Model is Running as Expected
every time a page loads:
    page = GetAppContext()
    x = Featurize(model.words, page.words)
    y = model.predict( [ x ] )
    if random.random() < veryLowSamplingRate:
        LogToServer(page, x, y, model.version)
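The sampled logging above can be made concrete as follows. It is a sketch, assuming a logging callable is passed in (a hypothetical stand-in for `LogToServer`) and that `VERY_LOW_SAMPLING_RATE` is an illustrative value, not one from the deck; the random source is injectable so the behavior can be tested. What gets logged about the page is a privacy decision, as the slide's "Privacy!!" warns.

```python
import random

# Hypothetical value: log roughly 1 in 1,000 page loads.
VERY_LOW_SAMPLING_RATE = 0.001

def maybe_log(page, x, y, model_version, log,
              rate=VERY_LOW_SAMPLING_RATE, rng=random.random):
    """With probability `rate`, pass one prediction record to `log`.

    `log` is any callable standing in for LogToServer. Returns True if the
    record was logged.
    """
    if rng() < rate:
        log({"page": page, "x": x, "y": y, "model_version": model_version})
        return True
    return False

# Seeded generator so the demo is repeatable; expect roughly 100 of 100,000.
sampled = []
rng = random.Random(0).random
for i in range(100_000):
    maybe_log(f"page-{i}", [0, 1], [0.3], "v2", sampled.append, rng=rng)
print(len(sampled))
```

Logging `model_version` alongside `x` and `y` lets the creation environment re-run the same model on the same features and check that run time produced the same output.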
 
Verifying Model Quality is Good
 
Hand Label
 
User Reports
every time user clicks a report:
    page = GetAppContext()
    x = Featurize(model.words, page.words)
    y = model.predict( [ x ] )
    LogToServer(page, x, y, model.version, userReport)
 
Verify in Creation Environment

Add to Training Set (?)
 
Privacy!!
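One way to use the logged reports when verifying quality is to track, per model version, how often users disagree with the model. This sketch assumes each report is a dict carrying the model's scores `y`, the `model_version`, and a `user_report` of `"funny"` or `"not funny"`; the record shape and `disagreement_by_version` are hypothetical. Users tend to report mostly when they disagree, so the rate is best read as a relative signal between versions rather than an error rate, with hand labeling as the check on what the reports say.

```python
from collections import defaultdict

def disagreement_by_version(reports, threshold):
    """Fraction of user reports that disagree with the model, per model version."""
    totals = defaultdict(int)
    disagree = defaultdict(int)
    for r in reports:
        model_says_funny = r["y"][0] > threshold
        user_says_funny = r["user_report"] == "funny"
        totals[r["model_version"]] += 1
        if model_says_funny != user_says_funny:
            disagree[r["model_version"]] += 1
    return {v: disagree[v] / totals[v] for v in totals}

reports = [
    {"y": [0.9], "model_version": "v1", "user_report": "not funny"},
    {"y": [0.2], "model_version": "v1", "user_report": "not funny"},
    {"y": [0.8], "model_version": "v2", "user_report": "funny"},
]
print(disagreement_by_version(reports, threshold=0.5))  # {'v1': 0.5, 'v2': 0.0}
```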
Design Patterns for ML
 
Where in the system do you use Machine Learning?
All the choices you have to make along the way…
We’ll get to more as the course goes on, but for now…
How do you get the data to do the learning?
 
Corpus Centric
[Diagram: a Data Collection and Labeling Process (Data Collection, Data Labeling) produces a Training Corpus, which is used by the Intelligence Creation Environment, Intelligence Management, and Intelligence Runtime.]
Can be quite expensive…

Closed Loop
[Diagram: Explicit and Implicit User Feedback from the Intelligence Runtime builds the Training Corpus, which is used by the Intelligence Creation Environment and Intelligence Management to update the Intelligence Runtime.]
Can be tricky to get right…
 
Explicit
    Reporting UX
    Survey
Implicit
    Achieve the desired result
    Forward to a friend
    Laugh out loud (audio detection?)
More discussion of this later…
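Turning feedback into training labels is where the "tricky to get right" part shows up. The sketch below encodes one hypothetical policy: an explicit report wins, an implicit positive signal counts as a weak "funny" label, and a page with no feedback stays unlabeled instead of being treated as not funny. The event names and `label_from_feedback` are made up for illustration.

```python
EXPLICIT = {"report_funny": 1, "report_not_funny": 0}
IMPLICIT_POSITIVE = {"forwarded_to_friend", "laughed_out_loud"}

def label_from_feedback(events):
    """Turn one page's feedback events into a label (1 funny, 0 not) or None."""
    explicit = [e for e in events if e in EXPLICIT]
    if explicit:
        return EXPLICIT[explicit[-1]]  # the most recent explicit report wins
    if any(e in IMPLICIT_POSITIVE for e in events):
        return 1  # weak label inferred from behavior
    return None  # no feedback is not evidence of "not funny"

print(label_from_feedback(["forwarded_to_friend", "report_not_funny"]))  # 0
print(label_from_feedback(["forwarded_to_friend"]))                      # 1
print(label_from_feedback([]))                                           # None
```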
 
ML Design Patterns: we’ll get to these as the course progresses.
Orchestration
[Mock browser window: the sample page at Whatever.com/whatever.html with placeholder (lorem ipsum) text and a "Report as: Not funny / Funny" control.]
 
Bad False Positive
every time a page loads:
    page = GetAppContext()
    if any of page.words in blockedWords:
        isFunny = false   # no matter what model says
 
Bad False Negative
every time a page loads:
    page = GetAppContext()
    if page.domain in whitelist:
        isFunny = true   # no matter what model says
 
Super non-funny text (bad FP)

Comedy site that model always calls not funny (bad FN)
 
Lots of fake reports (abuse)
 
More Reasons for Orchestration:
    Driving Quality
    Concept Changes
    Costs Change
    Users Change
    Bugs and Issues
    Etc.
 
You could try hard to tune models to avoid specific bad mistakes, but sometimes heuristics are cheaper and more robust.

(And sometimes heuristics turn your system into brittle-spaghetti gibberish…)
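The two overrides on this slide can be combined into a single orchestration step. This is a minimal sketch; `orchestrate` is a hypothetical name, and checking `blockedWords` before `whitelist` is an assumed precedence that the slides do not specify.

```python
def orchestrate(model_says_funny, page_words, page_domain, blocked_words, whitelist):
    """Apply hand-maintained overrides on top of the model's decision.

    blocked_words guards against bad false positives (super non-funny text);
    whitelist guards against bad false negatives (a comedy site the model
    always calls not funny). Checking blocked words first is an assumed
    precedence: a blocked word wins even on a whitelisted site.
    """
    if any(word in blocked_words for word in page_words):
        return False  # no matter what the model says
    if page_domain in whitelist:
        return True  # no matter what the model says
    return model_says_funny

print(orchestrate(False, ["stand-up", "set"], "comedy.example",
                  {"obituary"}, {"comedy.example"}))  # True
```

Each override is one more rule to maintain outside the model, which is how the brittle-spaghetti outcome can creep in.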
Summary: Components of an Implementation
Intelligence Runtime
    Program State -> Context
    Execute Feature Code
    Execute Model
    Interpret Model Output
    Control User Experience
    Update:
        Models
        Feature Code

Intelligence Management
    Verify new intelligence
    Control rollouts:
        Keep in sync
        Clients/services
    Support online evaluation

Intelligence Creation Environment
    Telemetry -> Context
    Feature code in sync
    Computation & Data
    All the training stuff…

Telemetry
    Verifying outcomes
    Training data
    Selecting what to observe
    Sampling
    Summarizing

Intelligence Orchestration
    Monitoring success
    Inspect Interactions
    Adapt as things change
    Deal with mistakes
    Updating thresholds
    Drive the racecar
Uploaded on Sep 24, 2024



Presentation Transcript


  3. Basic Machine Learning
     [Mock browser window: sample page at Whatever.com/whatever.html with placeholder (lorem ipsum) text.]
     Example File Format: <?>, <?>, <????0>, <?0>, <????1>, <?1>, …, <?????> <??>
     model.load(<dataFile>)
     every time a page loads:
         page = GetAppContext()
         # Construct feature vector in correct order by testing for presence of selected words…
         x = Featurize(model.words, page.words)
         # Call into inference engine to apply the model to the feature vector…
         y = model.predict( [ x ] )
         # Apply tuned threshold to achieve desired (stable) operating point…
         isFunny = y[0] > model.?
         UpdateUserExperience( isFunny )

  4. Run Time vs Creation Time
     Intelligence Creation Environment:
         for every pageID in trainingSet:
             ( page, label ) = LoadPageFromLog( pageID )
             pages.append( page ); Y.append( label )
         (xTrainRaw, yTrain, xValidateRaw ) = SplitData(pages,Y)
         selectedFeatures = FeatureSelect( xTrainRaw, yTrain, <params> )   # Param sweep
         for xRaw in xTrainRaw:
             x = Featurize( selectedFeatures, xRaw.words )
             xTrain.append( x )
         model = train( <more params>, xTrain , yTrain )   # More param sweeps
         ? = FindThreshold( thresholdSet )   # Validation Set Stuff
         model.save(selectedFeatures, ?, model)   # Deploy it (via Intelligence Management)
     Model in Run Time:
         model.load(<dataFile>)
         every time a page loads:
             page = GetAppContext()   # Must be in sync! (with LoadPageFromLog)
             x = Featurize(model.words, page.words)   # Must be in sync! (with creation-time Featurize)
             y = model.predict( [ x ] )
             isFunny = y[0] > ?
             UpdateUserExperience( isFunny )

  8. ML Design Patterns
     Corpus Centric
         Examples: Computer Vision, Xbox Kinect, Speech
         Best When: Hard But Stable Problem; Can't use data from customer interaction; Bootstrapping a new system
         Key Challenges: Collecting & Labeling Data; Sophisticated Modeling
     Closed Loop
         Examples: Self Driving Car, Recommender Systems
         Best When: Open-ended/Time-changing problems; Users and Intelligence interact at Scale
         Key Challenges: Shaping User/ML Interactions; Orchestrating Evolving System
     Scoring/Ranking
         Examples: Ad Targeting, Search, Voice Assistants, Designer
         Best When: Diverse at-scale content creation; Users and Intelligence Interact at Scale
         Key Challenges: Find right Situation for Each Content; Exploring and Avoiding Feedback Loops
     Adversarial
         Examples: Spam Filtering, Malware detection, Account compromise, Anti-phishing
         Best When: Hyper Time-changing (adversarial)
         Key Challenges: Buffer ML From Adversary; Change Economics of Broader System
     Reinforcement
         Examples: Games (e.g. AlphaGo), Robotics
         Best When: Correct Labeling Hard for Human; High Scale Simulator to Learn From
         Key Challenges: Difficult to Converge; Expensive (AlphaGo ~40 days¹)
     We’ll get to these as the course progresses.
