Understanding Bayes Theorem in NLP: Examples and Applications

Introduction to Bayes theorem in Natural Language Processing (NLP) with detailed examples and applications. Explains how Bayes theorem is used to calculate probabilities in diagnostic tests and to analyze scenarios such as disease prediction and classification from observed features. Covers prior and posterior probabilities, with a step-by-step breakdown of the calculations, and includes a practical example from Ray Mooney computing the probabilities of different health conditions.





Presentation Transcript


  1. NLP

  2. Introduction to NLP: Bayes Theorem

  3. Bayes Theorem
     Formula for joint probability:
     p(A,B) = p(B|A) p(A)
     p(A,B) = p(A|B) p(B)
     Therefore p(B|A) = p(A|B) p(B) / p(A)
     Bayes theorem is used to calculate p(A|B) given p(B|A), and vice versa.
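
To make the relationship concrete, here is a minimal Python sketch of the derivation above; the probability values are made up purely for illustration.

```python
# Minimal numeric check of Bayes theorem; the probability values are
# illustrative assumptions, not numbers from the slides.
p_A = 0.3            # p(A)
p_B = 0.4            # p(B)
p_B_given_A = 0.6    # p(B|A)

# Joint probability: p(A,B) = p(B|A) p(A)
p_joint = p_B_given_A * p_A

# Bayes theorem: p(A|B) = p(A,B) / p(B) = p(B|A) p(A) / p(B)
p_A_given_B = p_joint / p_B
print(round(p_A_given_B, 2))   # 0.45
```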

  4. Example: Diagnostic test
     Test accuracy:
     p(positive | ¬disease) = 0.05 (false positive rate)
     p(negative | disease) = 0.05 (false negative rate)
     So: p(positive | disease) = 1 - 0.05 = 0.95, and likewise p(negative | ¬disease) = 0.95
     In general the rates of false positives and false negatives are different.

  5. Example: Diagnostic test with errors
     P(A|B), where A = TEST and B = DISEASE:

                     B = DISEASE
     A = TEST        Yes      No
     Positive        0.95     0.05
     Negative        0.05     0.95

  6. Example
     What is p(disease | positive)?
     P(disease|positive)  = P(positive|disease)  × P(disease)  / P(positive)
     P(¬disease|positive) = P(positive|¬disease) × P(¬disease) / P(positive)
     P(disease|positive) / P(¬disease|positive) = ?
     We don't really care about P(positive): as long as it is not zero, we can divide both equations by it, and it cancels in the ratio.

  7. Example
     P(disease|positive) / P(¬disease|positive)
       = (P(positive|disease) × P(disease)) / (P(positive|¬disease) × P(¬disease))
     Suppose P(disease) = 0.001, so P(¬disease) = 0.999
     P(disease|positive) / P(¬disease|positive) = (0.95 × 0.001) / (0.05 × 0.999) ≈ 0.019
     Since P(disease|positive) + P(¬disease|positive) = 1, P(disease|positive) ≈ 0.02
     Notes:
     P(disease) is called the prior probability.
     P(disease|positive) is called the posterior probability.
     In this example the posterior is about 20 times larger than the prior.
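
The same calculation can be done without cancelling P(positive), by computing it with the law of total probability. A minimal Python sketch, reusing the slide's numbers:

```python
# Sketch of the diagnostic-test calculation, computing P(positive) explicitly
# by the law of total probability instead of cancelling it.
p_disease = 0.001                       # prior P(disease)
p_pos_given_disease = 0.95              # P(positive | disease)
p_pos_given_no_disease = 0.05           # P(positive | ¬disease)

# Total probability of a positive test result
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

# Bayes theorem: posterior probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))    # 0.019, i.e. roughly 0.02
```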

  8. Example
     p(well) = 0.9, p(cold) = 0.05, p(allergy) = 0.05
     p(sneeze|well) = 0.1     p(sneeze|cold) = 0.9    p(sneeze|allergy) = 0.9
     p(cough|well)  = 0.1     p(cough|cold)  = 0.8    p(cough|allergy)  = 0.7
     p(fever|well)  = 0.01    p(fever|cold)  = 0.7    p(fever|allergy)  = 0.4
     Example from Ray Mooney

  9. Example (contd)
     Features (evidence e): sneeze, cough, no fever
     P(well|e)    = (0.9)(0.1)(0.1)(0.99) / P(e)  = 0.0089 / P(e)
     P(cold|e)    = (0.05)(0.9)(0.8)(0.3) / P(e)  = 0.01 / P(e)
     P(allergy|e) = (0.05)(0.9)(0.7)(0.6) / P(e)  = 0.019 / P(e)
     (The last factor in each product is 1 − p(fever|·), since no fever was observed.)
     P(e) = 0.0089 + 0.01 + 0.019 = 0.0379
     P(well|e) ≈ 0.23,  P(cold|e) ≈ 0.26,  P(allergy|e) ≈ 0.50
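
A short Python sketch of this naive Bayes computation, restating the probabilities from Ray Mooney's example; because the slide rounds the intermediate products, the exact posteriors come out near 0.23, 0.28, and 0.49 rather than 0.23, 0.26, and 0.50.

```python
# Sketch of the naive Bayes calculation above; the dictionaries restate the
# probabilities from Ray Mooney's example.
priors = {"well": 0.9, "cold": 0.05, "allergy": 0.05}
cond = {  # P(symptom | condition)
    "well":    {"sneeze": 0.1, "cough": 0.1, "fever": 0.01},
    "cold":    {"sneeze": 0.9, "cough": 0.8, "fever": 0.7},
    "allergy": {"sneeze": 0.9, "cough": 0.7, "fever": 0.4},
}

# Evidence e: sneeze, cough, no fever (so the fever factor is 1 - P(fever|c))
scores = {c: priors[c] * cond[c]["sneeze"] * cond[c]["cough"]
             * (1 - cond[c]["fever"])
          for c in priors}

p_e = sum(scores.values())                          # about 0.039
posteriors = {c: s / p_e for c, s in scores.items()}
print(posteriors)   # roughly well 0.23, cold 0.28, allergy 0.49
```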

  10. Bayes Theorem
     Hypothesis space: H = {H1, ..., Hn};  Evidence: E
     P(Hi|E) = P(E|Hi) P(Hi) / P(E)
     P(Hi|E) is the posterior probability of Hi, P(Hi) is the prior probability of Hi, and P(E|Hi) is the likelihood of the data/evidence if Hi is true.
     In text classification: H is the class space; E is the data (features).
     If we want to pick the most likely hypothesis H*, we can drop P(E):
     P(Hi|E) ∝ P(E|Hi) P(Hi)
     [slide from Qiaozhu Mei]
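
Dropping P(E) means classification only needs the products P(E|Hi) P(Hi). A minimal sketch, with hypothetical hypothesis names and made-up probabilities:

```python
# Minimal sketch of picking the most likely hypothesis H* without computing
# P(E); the hypothesis names and probabilities below are made up.
def most_likely_hypothesis(priors, likelihoods):
    """Return argmax over Hi of P(E|Hi) * P(Hi); P(E) is a common constant."""
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

priors = {"H1": 0.6, "H2": 0.3, "H3": 0.1}          # P(Hi)
likelihoods = {"H1": 0.05, "H2": 0.4, "H3": 0.2}    # P(E|Hi)
print(most_likely_hypothesis(priors, likelihoods))  # H2
```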

  11. Example: An Unfair Die
     It's more likely to get a 6 and less likely to get a 1: p(6) > p(1). But how much more likely?
     What if you toss the die 1000 times and observe a 6 in 501 tosses and a 1 in 108 tosses?
     p(6) = 501/1000 = 0.501
     p(1) = 108/1000 = 0.108
     As simple as counting, but principled: this is the maximum likelihood estimate.
     [slide from Qiaozhu Mei]
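
The estimate really is just counting. A small Python sketch with a synthetic sequence of 1000 tosses whose counts of 6s and 1s match the slide (the split of the remaining tosses is assumed):

```python
# The maximum likelihood estimate for the die is just relative frequency.
from collections import Counter

# Synthetic data: 501 sixes, 108 ones, with the remaining 391 tosses spread
# over the other faces (that split is assumed; the slide only gives two counts).
rolls = [6] * 501 + [1] * 108 + [2] * 100 + [3] * 100 + [4] * 100 + [5] * 91

counts = Counter(rolls)
mle = {face: counts[face] / len(rolls) for face in counts}
print(mle[6], mle[1])   # 0.501 0.108
```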

  12. What if the Die has More Faces?
     Suitable to represent documents: every face corresponds to a word in the vocabulary.
     The author tosses a die to write a word.
     Apparently, an unfair die.
     [slide from Qiaozhu Mei]
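
Viewed this way, a unigram language model is a many-faced word die estimated by the same counting; a minimal sketch over a made-up toy sentence:

```python
# A unigram "word die": each face is a vocabulary word, and the MLE of each
# face's probability is its relative frequency in the text (toy corpus below).
from collections import Counter

text = "the cat sat on the mat the cat slept".split()
counts = Counter(text)
unigram = {word: c / len(text) for word, c in counts.items()}
print(round(unigram["the"], 2))   # 0.33 -- "the" is the most loaded face
```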

  13. NLP
