Understanding Success in ML Systems through Goals and Metrics

Slide Note
Embed
Share

Explore the importance of setting good goals in ML systems, the connection between model properties and user outcomes, and the impact of achieving desired outcomes in SMS spam detection. Learn about defining success, types of goals, and ensuring measurable and achievable objectives for ML implementation.


Uploaded on Oct 08, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Defining Success with ML Systems Geoff Hutlen

  2. Goal for SMS spam Train a model like this and you ve got victory! Operating Point: 1% FPR Block > 95% of the spam Wrong operating points? - 0.5% FPR better? - 1.0% FNR better? False Positive Rate What Could Go Wrong? 0% 1% 2% 3% 4% 5% 0% 5% Maybe these aren t the right targets Users hate it Spammers ignore it Wrong metrics? - Same FNR, more spam - Same FPR, less value - PR better? 10% 15% False Negative Rate Delta: 20% Maybe the problem changes Spammers send 20x more spam Spammers send 20x less spam 25% 30% 35% 40% Maybe the FPs are distributed poorly Totally shut down all bank statements Totally shut down all small senders Wrong test data? - Missing critical sub-populations? - Some errors more costly than others? 45% 50% Model 1 Model 2 Maybe no one knows what you're talking about Management has no idea what a FN is Move FN metric, but don t move business metric Wrong vocabulary / concepts? - Technical terms don t map across participants - ML goals not tied to actual goals

  3. Properties of a Good Goal Communicate desired outcome What is success and why is it important In the language of all participants New mysterious ML vocabulary added to discussions Achievable Participants can see a path toward success Even if it s difficult ML requires new intuitions about what is easy and what is hard ML just another tool, not a magic problem solving box Measurable Quantify benefit of progress and cost of failure Evaluation is creation Setting the right goals for ML is just about the hardest, most frustrating and most important part of using ML in practice

  4. More Detail on SMS Spam 100 Non-Spam Per Week 1000 Spam Per Week 1% FPR 5% FNR 1 FP per week 50 inbox spam per week Friends who message you Spammers who know about you What the Model Does What the User Sees What the World Does Inbox ~33% Spam

  5. Types of goals Model Properties User Outcomes User Churn Profit Models + UX Create Leading Indicators Organizational Objectives Cost of managing spam < 1% of service cost Avg User reports < 1 spam a month Happy, Engaged Users Lead to

  6. Connecting Model Properties to User Outcomes Will change over time 1000 Spam vs 100 Good 50% 45% 40% Current FN Rate ~4-5 points of % Inbox is Spam per point of FNR 35% % Inbox is Spam 30% 25% 20% User Outcomes 15% 10% Complex Relationship over Time Short Term Gradient 5% 0% Model Properties 0% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% FN Rate

  7. Connecting Outcomes to Leading Indicators Diminishing Effect of Bad User Outcomes on sentiment Bad User Outcomes with large effect on sentiment 5 5 4.5 4.5 4 4 Star Rating Star Rating Leading Indicators 3.5 3.5 User Outcomes 3 3 2.5 2.5 0% 0% 5% 5% 10% 10% 15% 15% 20% 20% 25% 25% 30% 30% 35% 35% 40% 40% 45% 45% 50% 50% Model Properties % Inbox is Spam % Inbox is Spam Affected by many things and changes slowly Don t expect model launch to change it overnight

  8. Connecting Outcomes to Leading Indicators 1000 Spam vs 100 Good Bad User Outcomes with large effect on sentiment 50% 40% % Inbox is Spam 30% 5 20% 10% 4.5 Leading Indicators 0% 0% 2% 4% 6% 8% 10% 4 Star Rating FN Rate 3.5 User Outcomes 3 2.5 Diminishing Effect of Bad User Outcomes on sentiment Work on other problems: - FPR - Latency - Cost to run 0% 10% 20% 30% 40% 50% Model Properties % Inbox is Spam

  9. Putting it Together Setting a goal for your ML 2% of User Churn 10k Monthly Saving Organizational Objectives .5 of a star rating Leading Indicators 4-5 points of % of Inbox is Spam User Outcomes Model Properties 1% of FNR

  10. Why not Optimize Actual Goal Directly? Indirect Relationship Effect of single actions small Long delay for feedback Affected by many things User Experience Competitors Marketing Interactions with other parts of the system Changing concepts Relatively few observations ML on Concrete Sub-problem Large (local) effect of actions Quick feedback Context contains most of the relevant info Can get quick implicit feedback Can inspect context and hand-label Many, many observations

  11. Summary Setting goals for ML systems is frustrating, difficult, and critical Hierarchy of Goals Model properties user outcomes leading indicators organizational success Good goals: Communicate success Inspire participants Are Measurable Global relationships complex, local approximations help Need to invest to keep goals healthy over time

Related