Strategies for Extreme Classification: Improving Quality Without Sacrifices

Can Facebook leverage its data to tackle extreme classification efficiently? By quickly identifying a small set of plausible labels and invoking the expensive classifier only on that set, quality can be improved rather than sacrificed. These slides explore how small plausible-label sets can speed up the classification process.



Presentation Transcript


  1. One Simple Thing To Immediately Make Extreme Classification Easy. Find Out What Rachel McAdams and Harrison Ford have to say about it. Joint work with Nikos Karampatziakis.

  2. Facebook arguably has the data to solve this, but how to do it? There are billions of possible labels.

  3. Can we quickly identify plausible labels? Without sacrificing quality? Or even while improving quality?

  4. Strategy: given an example: 1. Compute a small set of plausible labels. 2. Invoke the expensive classifier over the plausible labels only.

  5. 10⁹ labels

  6. 10⁴ labels

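A minimal sketch of this two-stage strategy in Python; the `plausible_labels` filter and the per-label `score` function are hypothetical placeholders standing in for the components built later in the talk:

```python
def predict(x, plausible_labels, score, k=5):
    """Two-stage prediction: filter the label space down to a small
    plausible set, then run the expensive classifier only on that set."""
    candidates = plausible_labels(x)   # e.g. ~10^4 candidates, not 10^9
    scores = {c: score(x, c) for c in candidates}
    # Return the top-k plausible labels by classifier score.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```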

  7. Pretend we're doing multiclass for a minute. Build a tree. At each node, try to send each class's examples exclusively left or right, while sending roughly the same number of examples left or right in aggregate.

  8. Example: training points (x₁, y₁ = 1), (x₂, y₂ = 1), (x₃, y₃ = 2), (x₄, y₄ = 2) arrive at a node with routing function f(x; w) = wᵀx. Examples with wᵀx < 0 are sent left; examples with wᵀx ≥ 0 are sent right.
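A small sketch of the deterministic routing rule the diagram depicts, assuming a NumPy feature matrix (illustrative, not the talk's code):

```python
import numpy as np

def route(X, w):
    """Send each example left or right by the sign of its projection:
    w.x < 0 goes left, w.x >= 0 goes right."""
    proj = X @ w
    return X[proj < 0], X[proj >= 0]
```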

  9. Achieve this via an eigenvalue problem: "push all class-conditional means away from zero" "while having average value of zero". [Figure: along the projection wᵀx, the class-conditional means μ(1) and μ(3) land on one side of zero, μ(2) and μ(4) on the other.]

  10. maximize over w:  E_y[(wᵀμ_y)²]  s.t.  ‖w‖ ≤ 1,  E_x[wᵀx] = 0

  11. Empirically: maximize over w:  wᵀXᵀY(YᵀY)⁻¹YᵀXw  s.t.  ‖w‖ ≤ 1,  1ᵀXw = 0. Works for multilabel!
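A hedged sketch of the per-node optimization, written with dense linear algebra for clarity (the talk emphasizes a super-scalable eigenvalue strategy; the regularization and exact constraint handling below are my assumptions):

```python
import numpy as np

def node_direction(X, Y, reg=1e-6):
    """Per-node eigenvalue problem (sketch): maximize
    w' X'Y (Y'Y)^(-1) Y'X w  subject to  ||w|| <= 1  and  1'Xw = 0.
    X: (n, d) features; Y: (n, c) 0/1 label-indicator matrix."""
    d = X.shape[1]
    B = X.T @ Y                                 # (d, c)
    G = Y.T @ Y + reg * np.eye(Y.shape[1])      # regularized label Gram matrix
    M = B @ np.linalg.solve(G, B.T)             # X'Y (Y'Y)^(-1) Y'X
    # Enforce the zero-average constraint 1'Xw = 0 by projecting out the
    # mean feature direction, then take the top eigenvector.
    m = X.mean(axis=0)
    Q = np.eye(d) - np.outer(m, m) / max(m @ m, 1e-12)
    _, vecs = np.linalg.eigh(Q @ M @ Q)
    return vecs[:, -1]                          # unit-norm routing direction
```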

  12. Problem: in high dimensions, most vectors are nearly orthogonal, so routing margins tend to be small. So we use randomized routing during training.
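The slide doesn't spell out the randomized routing scheme; one plausible reading is margin-dependent stochastic routing during training, sketched here (the sigmoid form and temperature are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_route(x, w, temperature=1.0):
    """Training-time routing: go right with probability sigmoid(w.x / T),
    so small-margin examples get explored on both sides. At test time one
    would route deterministically by the sign of w.x."""
    p_right = 1.0 / (1.0 + np.exp(-(w @ x) / temperature))
    return "right" if rng.random() < p_right else "left"
```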


  13. Training the "plausible label filter": Build a tree. At each internal node, solve the eigenvalue problem. Route examples and recurse to the desired depth. At leaf nodes, the most frequent classes are "plausible".
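Putting the pieces together, a sketch of the filter-training recursion. It reuses `node_direction` from the eigenvalue sketch above; `indicator_matrix`, the stopping rule, and the parameter defaults are illustrative assumptions:

```python
from collections import Counter
import numpy as np

def indicator_matrix(labels, classes):
    """0/1 matrix Y: Y[i, j] = 1 iff class j is in example i's label set."""
    index = {c: j for j, c in enumerate(classes)}
    Y = np.zeros((len(labels), len(classes)))
    for i, ls in enumerate(labels):
        for l in ls:
            Y[i, index[l]] = 1.0
    return Y

def build_filter(X, labels, classes, depth, n_plausible=100):
    """Recursively grow the plausible-label tree to a fixed depth.
    X: (n, d) features; labels: one set of labels per row (multilabel)."""
    if depth == 0 or len(X) < 2:
        # Leaf: the most frequent classes reaching this node are "plausible".
        counts = Counter(l for ls in labels for l in ls)
        return {"plausible": [c for c, _ in counts.most_common(n_plausible)]}
    w = node_direction(X, indicator_matrix(labels, classes))  # sketched above
    mask = X @ w >= 0                      # deterministic routing shown here
    lab = np.array(labels, dtype=object)
    return {"w": w,
            "left":  build_filter(X[~mask], list(lab[~mask]), classes,
                                  depth - 1, n_plausible),
            "right": build_filter(X[mask], list(lab[mask]), classes,
                                  depth - 1, n_plausible)}
```

At inference, an example is routed down by the sign of wᵀx at each node, and the plausible set is read off the leaf it reaches.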

  14. Reminder: once we have the plausible label filter, we train an underlying classifier (logistic regression).
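The slide only names logistic regression; here is a minimal one-vs-rest sketch with scikit-learn (illustrative, and it assumes every class has both positive and negative training examples):

```python
from sklearn.linear_model import LogisticRegression

def train_underlying(X, labels, classes):
    """One-vs-rest logistic regression: one binary model per label."""
    models = {}
    for c in classes:
        y = [1 if c in ls else 0 for ls in labels]
        models[c] = LogisticRegression(max_iter=1000).fit(X, y)
    return models

def score_plausible(models, x, plausible):
    """Evaluate only the plausible labels' models, never all of them."""
    probs = {c: models[c].predict_proba(x.reshape(1, -1))[0, 1]
             for c in plausible if c in models}
    return sorted(probs, key=probs.get, reverse=True)
```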

  15. Twitter: predict hashtags from tweets. Labels = hashtags. Features = words (unigrams + bigrams). Build tree only.
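The stated features (word unigrams + bigrams) can be reproduced with scikit-learn's CountVectorizer; a toy illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Word unigram + bigram counts, as described for the tweet -> hashtag task.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform([
    "now playing my favorite song",          # toy tweets, not the real data
    "hiring a java developer for a project",
])
print(X.shape)  # (2, number of distinct unigrams and bigrams)
```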

  16. #jobs #it #nowplaying #manager #dev #engineering #ff #java #marketing #php #job #net #project #developer #hiring #programmer #engineer #consultant #customer #flash

  17. #ascendant #mediumcoeli #nowplaying #leo #cancer #sagittarius #scorpio #virgo #libra #gemini #ff #capricorn #jobs #taurus #aquarius #aries #pisces #fb #news #tweetmyjobs

  18. #nowplaying #ff #jobs #retweetthisif #bieberbemine #happybirthdayjustin #babyonitunes #biebcrbemine #justinbiebcr #fb #tweetmyjobs #damnhowtrue #followfriday #biebcrgasm #1 #grindmebieber #quote #news #retweetthis #followmejp

  19. Twitter: leaf nodes look promising, but popular tags show up everywhere.

  20. LSHTC: Kaggle competition. Predict Wikipedia tags from documents (token counts).

  21. LSHTC [results figure]

  22. Overall: statistical performance is good; computational performance is good.

  23. Limitations: only works when a linear classifier is good. Linear routing nodes, using a linear predictor of y given x. Not deep!

  24. Next steps: online learning version; statistical questions; deep routing nodes.

  25. Summary: a wrapper approach for accelerating extreme learning. Leverages a (super-scalable) eigenvalue strategy. Good for text.
