Practical Implementation of Machine Learning by Geoff Hulten
In this presentation, Geoff Hulten walks through implementing a system that uses machine learning, built around a running example: Laugh Finder, a browser plugin that indicates whether a web page is funny. The slides cover a basic model file format, the split between run time and creation time (and why the two must stay in sync), intelligence management (verifying, deploying, and updating models), monitoring, design patterns for ML, and orchestration.
Presentation Transcript
Implementing with Machine Learning Geoff Hulten
Example of an Implementation: Laugh Finder
A browser plugin that indicates if the web page is funny (example page: Whatever.com/whatever.html, filled with placeholder text).

    funnyWords = [ "enim", ..., "fugiat", ... ]

    every time a page loads:
        page = GetPageContext()
        isFunny = false
        for word in page.words:
            if( word in funnyWords ):
                isFunny = true
        UpdateUserExperience( isFunny )
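The word-list check above can be sketched in runnable Python. This is only an illustration under assumptions: the two entries in FUNNY_WORDS are the ones visible on the slide (the rest are elided there), and tokenize and is_funny are hypothetical helper names, not part of the course material.

```python
import re

# Only the two words visible on the slide; the real list is elided there.
FUNNY_WORDS = {"enim", "fugiat"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def is_funny(page_text, funny_words=FUNNY_WORDS):
    """Return True if any word on the page is in the funny-word list."""
    return any(word in funny_words for word in tokenize(page_text))
```

A hand-written list like this is the baseline that the learned model on the following slides replaces.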
Basic Machine Learning Example
File format: <?>, <?>, <????0>, <?0>, <????1>, < ?1>, , < ?????> < ??>

    model.load(<dataFile>)

    every time a page loads:
        page = GetAppContext()
        # Construct feature vector in correct order by testing for presence of selected words
        x = Featurize(model.words, page.words)
        # Call into inference engine to apply the model to the feature vector
        y = model.predict( [ x ] )
        # Apply tuned threshold to achieve desired (stable) operating point
        isFunny = y[0] > model.threshold
        UpdateUserExperience( isFunny )
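A minimal Python sketch of these runtime steps, under stated assumptions: the model is stored as JSON with hypothetical fields words, weights, bias, and threshold (the slide's actual file format is not recoverable from this transcript), and the model is a simple logistic scorer. The names featurize, LinearModel, and is_funny are illustrative, not the course's code.

```python
import json
import math


def featurize(model_words, page_words):
    """Build a 0/1 vector with one entry per model word, in the model's order."""
    present = set(page_words)
    return [1.0 if word in present else 0.0 for word in model_words]


class LinearModel:
    """Tiny logistic scorer; the JSON field names are assumptions of this sketch."""

    def __init__(self, words, weights, bias, threshold):
        self.words = words
        self.weights = weights
        self.bias = bias
        self.threshold = threshold

    @classmethod
    def loads(cls, text):
        """Parse a model from JSON text."""
        data = json.loads(text)
        return cls(data["words"], data["weights"], data["bias"], data["threshold"])

    def predict(self, rows):
        """Return one score in (0, 1) per feature vector."""
        scores = []
        for row in rows:
            z = self.bias + sum(w * x for w, x in zip(self.weights, row))
            scores.append(1.0 / (1.0 + math.exp(-z)))
        return scores


def is_funny(model, page_words):
    """Featurize, score, and apply the tuned threshold stored with the model."""
    x = featurize(model.words, page_words)
    y = model.predict([x])
    return y[0] > model.threshold
```

Keeping the word list and the threshold inside the model file means the runtime never hard-codes either one, which matters once models are updated independently of the app.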
Run Time vs Creation Time

Intelligence Creation Environment

    for every pageID in trainingSet:
        ( page, label ) = LoadPageFromLog( pageID )
        pages.append( page ); Y.append( label )

    ( xTrainRaw, yTrain, xValidateRaw ) = SplitData( pages, Y )

    # Param sweep
    selectedFeatures = FeatureSelect( xTrainRaw, yTrain, <params> )

    for xRaw in xTrainRaw:
        x = Featurize( selectedFeatures, xRaw.words )    # Must be in sync!
        xTrain.append( x )

    # More param sweeps
    model = train( <more params>, xTrain, yTrain )

    # Validation set stuff
    threshold = FindThreshold( thresholdSet )

    model.save( selectedFeatures, threshold, model )    # Deploy it

Intelligence Management carries the saved model from the creation environment to the runtime.

Model in Run Time

    model.load(<dataFile>)

    every time a page loads:
        page = GetAppContext()
        x = Featurize(model.words, page.words)    # Must be in sync!
        y = model.predict( [ x ] )
        isFunny = y[0] > threshold                # Must be in sync!
        UpdateUserExperience( isFunny )
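The creation-time side can be sketched end to end in plain Python. Everything here is a stand-in chosen for brevity, not the slide's method: select_features keeps the words that occur in the most training pages, train is a small stochastic-gradient logistic regression, and find_threshold picks the lowest threshold that meets an assumed false-positive-rate target on held-out data. All function names and the JSON layout are hypothetical.

```python
import json
import math
from collections import Counter


def featurize(words, page_words):
    """Same featurization the runtime must use: 0/1 presence, in word order."""
    present = set(page_words)
    return [1.0 if w in present else 0.0 for w in words]


def select_features(pages, k):
    """Keep the k words found in the most pages (ties broken alphabetically)."""
    counts = Counter(w for page in pages for w in set(page))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def score(weights, bias, x):
    """Logistic score in (0, 1) for one feature vector."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, epochs=200, lr=0.5):
    """Logistic regression fitted with per-example gradient steps."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = score(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def find_threshold(scores, labels, max_fpr):
    """Lowest threshold whose held-out false positive rate is <= max_fpr."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    for t in sorted(set(scores)):
        false_pos = sum(1 for s in negatives if s > t)
        if not negatives or false_pos / len(negatives) <= max_fpr:
            return t
    return 1.0  # no scores at all: never predict funny


def build_model(train_pages, train_labels, val_pages, val_labels, k, max_fpr):
    """Run the pipeline and return the model as JSON text, ready to deploy."""
    words = select_features(train_pages, k)
    xs = [featurize(words, p) for p in train_pages]
    weights, bias = train(xs, train_labels)
    val_scores = [score(weights, bias, featurize(words, p)) for p in val_pages]
    threshold = find_threshold(val_scores, val_labels, max_fpr)
    return json.dumps({"words": words, "weights": weights,
                       "bias": bias, "threshold": threshold})
```

Saving the selected words and the threshold in the same artifact as the weights is one way to honor the "must be in sync" notes: the runtime reads all three from one file instead of keeping its own copies.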
Intelligence Management
Intelligence Management sits between the Intelligence Creation Environment and the Intelligence Runtime.

Intelligence Creation Environment:
    Periodically build new model
Intelligence Management (manage deployment of the new model):
    Verify new intelligence
    Decide when to deploy
    Compile and push app
    Host model in update service
Intelligence Runtime:
    Check for new model
    Download and save
    Hide latency (?)

    every time app starts:
        UpdateModel( <server>, <dataFile> )
        model.load( <dataFile> )
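One way the "check for new model, download and save" step might look in Python. This is a sketch under assumptions: the model file is JSON with a hypothetical version field, and fetch_latest is a placeholder callable standing in for whatever request the update service actually requires. The sketch keeps the cached model whenever the download fails, so app start does not depend on the network.

```python
import json
import os
import tempfile


def read_version(path):
    """Return the 'version' field of a model file, or None if unreadable."""
    try:
        with open(path) as f:
            return json.load(f)["version"]
    except (OSError, ValueError, KeyError, TypeError):
        return None


def update_model(fetch_latest, data_file):
    """Refresh data_file from the update service; keep the old file on failure.

    fetch_latest is a hypothetical callable that returns the newest model as
    JSON text with a 'version' field. Returns True if a new model was saved.
    """
    try:
        text = fetch_latest()
        new_version = json.loads(text)["version"]
    except Exception:
        return False  # offline or bad payload: keep running the cached model
    if new_version == read_version(data_file):
        return False  # already up to date
    # Write to a temporary file, then rename, so a crash mid-download
    # never leaves a half-written model where model.load() will find it.
    directory = os.path.dirname(os.path.abspath(data_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp_path, data_file)
    return True
```

Running this check on a background thread, and loading whatever file is already on disk in the meantime, is one possible reading of the slide's "hide latency" note.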
Monitoring Models (Privacy!!)

Verifying Model is Running as Expected
Is it working? Log a very small sample of runtime results and verify them in the creation environment.

    every time a page loads:
        page = GetAppContext()
        x = Featurize(model.words, page.words)
        y = model.predict( [ x ] )
        if random.random() < veryLowSamplingRate:
            LogToServer(page, x, y, model.version)

Verifying Model Quality is Good
Labels come from user reports (the page offers "Report as: Not funny / Funny") and from hand labeling. Reported pages can be added to the training set (?).

    every time user clicks a report:
        page = GetAppContext()
        x = Featurize(model.words, page.words)
        y = model.predict( [ x ] )
        LogToServer(page, x, y, model.version, userReport)
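The sampled logging step can be written as a small helper. A sketch only: maybe_log and its parameters are hypothetical names, and log_to_server stands in for the slide's LogToServer. Passing the random source as an argument is a choice made here so the sampling can be tested with a seeded generator.

```python
import random


def maybe_log(record, sampling_rate, log_to_server, rng=random):
    """Send the record with probability sampling_rate; return True if sent."""
    if rng.random() < sampling_rate:
        log_to_server(record)
        return True
    return False
```

Including the model version in every record, as the slide does, lets the creation environment tell apart results from different deployed models.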
Design Patterns for ML
Where in the system do you use machine learning? How do you get the data to do the learning? These are among the choices you have to make along the way. We'll get to more as the course goes on, but for now:

Corpus Centric
    Components: Training Corpus, Intelligence Creation Environment, Intelligence Management, Intelligence Runtime
    Data for the Training Corpus comes from a Data Collection and Data Labeling process
    Can be quite expensive

Closed Loop
    Components: Intelligence Creation Environment, Intelligence Management, Intelligence Runtime, Training Corpus
    Data for the Training Corpus comes from explicit and implicit user feedback
    Explicit: reporting UX, survey
    Implicit: achieve the desired result, forward to a friend, laugh out loud (audio detection ?)
    Can be tricky to get right
    More discussion of this later
ML Design Patterns

Corpus Centric
    Examples: Computer Vision, Xbox Kinect, Speech
    Best when: Hard but stable problem; can't use data from customer interaction; bootstrapping a new system
    Key challenges: Collecting and labeling data; sophisticated modeling

Closed Loop
    Examples: Self Driving Car, Recommender Systems
    Best when: Open-ended/time-changing problems; users and intelligence interact at scale
    Key challenges: Shaping user/ML interactions; orchestrating an evolving system

Scoring/Ranking
    Examples: Ad Targeting, Search, Voice Assistants, Designer
    Best when: Diverse, at-scale content creation; users and intelligence interact at scale
    Key challenges: Finding the right situation for each content; exploring and avoiding feedback loops

We'll get to these as the course progresses:

Adversarial
    Examples: Spam Filtering, Malware Detection, Account Compromise, Anti-phishing
    Best when: Hyper time-changing (adversarial)
    Key challenges: Buffer ML from adversary; change economics of broader system

Reinforcement
    Examples: Games (e.g. AlphaGo), Robotics
    Best when: Correct labeling hard for human; high-scale simulator to learn from
    Key challenges: Difficult to converge; expensive (AlphaGo ~40 days)
Orchestration

Bad False Positive: super non-funny text that the model calls funny (bad FP).

    every time a page loads:
        page = GetAppContext()
        if any of page.words in blockedWords:
            isFunny = false    # no matter what model says

Bad False Negative: comedy site that the model always calls not funny (bad FN).

    every time a page loads:
        page = GetAppContext()
        if page.domain in whitelist:
            isFunny = true    # no matter what model says

Could try hard to tune models for specific bad mistakes, but sometimes heuristics are cheaper and more robust (and sometimes heuristics turn your system into brittle-spaghetti gibberish).

More Reasons for Orchestration: driving quality, concept changes, costs change, users change, bugs and issues, lots of fake reports (abuse), etc.
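The two override rules can be combined into one decision function. This is a sketch with assumptions flagged: decide_is_funny and its parameter names are hypothetical, and the precedence (blocked words win over allowed domains) is a choice made here, since the slide shows the rules separately and does not say which one wins.

```python
def decide_is_funny(domain, words, score, threshold,
                    blocked_words=frozenset(), allowed_domains=frozenset()):
    """Apply heuristic overrides first, then fall back to the model."""
    # Assumed precedence: a blocked word forces "not funny" even on an
    # allowed domain. The slide does not specify an order.
    if any(word in blocked_words for word in words):
        return False
    if domain in allowed_domains:
        return True
    return score > threshold
```

Keeping the overrides in one place, as data passed in rather than scattered conditionals, is one way to limit the "brittle spaghetti" risk the slide warns about.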
Summary: Components of an Implementation

Intelligence Runtime
    Program state -> context
    Execute feature code
    Execute model
    Interpret model output
    Control user experience
    Update: models, feature code

Intelligence Creation Environment
    Telemetry -> context
    Feature code in sync
    Computation & data
    All the training stuff

Intelligence Management
    Verify new intelligence
    Control rollouts: keep in sync, clients/services
    Support online evaluation

Intelligence Orchestration
    Monitoring success
    Inspect interactions
    Adapt as things change
    Deal with mistakes
    Updating thresholds
    Drive the racecar

Telemetry
    Verifying outcomes
    Training data
    Selecting what to observe
    Sampling
    Summarizing