
Understanding Tail Bounds and Markov's Inequality
Tail bounds and Markov's Inequality are crucial concepts in probability theory and statistics. Tail bounds help us approximate probabilities in the tails of distributions, while Markov's Inequality provides a bound on the probability that a non-negative random variable exceeds a certain value. Learn more about these concepts and their applications in this comprehensive guide.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Tail Bounds CSE 312 Summer 21 Lecture 19
Announcements Point values for Question 4 in Problem Set 6 has been updated. The Distinct Elements question has been updated to direct you to the correct page on the textbook. Look for the CDF on page 334.
Whats a Tail Bound? When we were finding our margin of error, we didn t need an exact calculation of the probability. We needed an inequality: the probability of being outside the margin of error was at most at most 5% (the example discussed mentioned that most of the data lied within the margin of error at least 95% of the time). A tail bound (or concentration inequality) is a statement that bounds the probability in the tails of the distribution (says there s very little probability far from the center) or (equivalently) says that the probability is concentrated near the expectation.
Our First bound Two statements are equivalent. Left form is often easier to use. Right form is more intuitive. Markov s Inequality Markov s Inequality Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ??[?] ? ? ? To apply this bound you only need to know: 1. it s non-negative 2. Its expectation.
Proof ? ? = ? ? (? = ?) = ? (? = ?) + ? (? = ?) ?:? ? ?:?<? ? (? = ?) + 0 ? 0 whenever ? = ? > 0 ?:? ? Markov s Inequality ? ? = ? ?:? ? Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] = ? ? = ? ?:? ? = ? (? ?) ?
Example with geometric RV Suppose you roll a fair (6-sided) die until you see a 6. Let ? be the number of rolls. Bound the probability that ? 12 Markov s Inequality Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] ?
Example with geometric RV Suppose you roll a fair (6-sided) die until you see a 6. Let ? be the number of rolls. Bound the probability that ? 12 ? 12 ? ? 6 12=1 2. 12= Markov s Inequality Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] Exact probability? 1 ? < 12 1 0.865 = 0.135 ?
A Second Example Suppose the average number of ads you see on a website is 25. Give an upper bound on the probability of seeing a website with 75 ads. 75 or more Markov s Inequality Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] Fill out the poll everywhere so Kushal knows how long to explain Go to pollev.com/cse312su21 ?
A Second Example Suppose the average number of ads you see on a website is 25. Give an upper bound on the probability of seeing a website with 75 ads. 75 or more ? 75 ? ? 75=25 75=1 3 Markov s Inequality Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] ?
Useless Example Suppose the average number of ads you see on a website is 25. Give an upper bound on the probability of seeing a website with 20 ads. 20 or more Markov s Inequality Let ? be a random variable supported (only) on non-negative numbers. For any ? > ? ? ? ?[?] Fill out the poll everywhere so Kushal knows how long to explain Go to pollev.com/cse312su21 ?
Useless Example Suppose the average number of ads you see on a website is 25. Give an upper bound on the probability of seeing a website with 20 ads. 20 or more ? 20 ? ? 20=25 20= 1.25 Well, that s true. Technically. But without more information we couldn t hope to do much better. What if every page gives exactly 25 ads? Then the probability really is 1.
Sowhat do we do? A better inequality! We re trying to bound the tails of the distribution. What parameter of a random variable describes the tails? The variance!
Chebyshevs Inequality Two statements are equivalent. Left form is often easier to use. Right form is more intuitive. Chebyshev s Inequality Chebyshev s Inequality Let ? be a random variable. For any ? > ? ? ???(?) Let ? be a random variable. For any ? > ? ? ?? ? ? ? ? ??? ? ? ? ? ??
Proof of Chebyshev Chebyshev s Inequality Let ? be a random variable. For any ? > ? ? ???(?) ? ? ? ?? Let ? = ? ? ? Markov s Inequality ? ? = 0 ? ?2 ?2 ? ?2 ? ? 2 =Var ? =Var(?) |?| ? = ?2 ?2 = ?2 ?2 ?2 Inequalities are equivalent (square each side). ? is just ? shifted. Variance is unchanged.
Example with geometric RV (again) Suppose you roll a fair (6-sided) die until you see a 6. Let ? be the number of rolls. Bound the probability that ? 12 Chebyshev s Inequality Let ? be a random variable. For any ? > ? ? ???(?) ? ? ? ??
Example with geometric RV (again) Suppose you roll a fair (6-sided) die until you see a 6. Let ? be the number of rolls. Bound the probability that ? 12 5/6 1/36 62=5 ? 12 ? 6 6 6 Chebyshev s Inequality Let ? be a random variable. For any ? > ? ? ???(?) Not any better than Markov ? ? ? ??
Example with geometric RV (diff bound) Let ? be a geometric rv with parameter ? Bound the probability that ? 2 ? 1 ? ?2 1 2 ? ? 1 1 ? ?2= 1 ? ? ? Markov gives: Chebyshev s Inequality 2 ? ? ? 1 ? ? 2=1 2. ? = Let ? be a random variable. For any ? > ? ? ???(?) 2 ? For large ?, Chebyshev is better. ? ? ? ??
Better Example Suppose the average number of ads you see on a website is 25. And the variance of the number of ads is 16. Give an upper bound on the probability of seeing a website with 30 30 or more ads. Chebyshev s Inequality Let ? be a random variable. For any ? > ? ? ???(?) Fill out the poll everywhere so Kushal knows how long to explain Go to pollev.com/cse312su21 ? ? ? ??
Better Example Suppose the average number of ads you see on a website is 25. And the variance of the number of ads is 16. Give an upper bound on the probability of seeing a website with 30 30 or more ads. ? 30 = (? 25 5) ? 25 5 16 25
Near the mean Suppose you run a poll of 1000 people where in the true population 60% of the population supports you. What is the probability that the poll is not within 10-percentage-points of the true value? Chebyshev s Inequality Let ? be a random variable. For any ? > ? ? ???(?) ? ? ? ??
Near the mean Suppose you run a poll of 1000 people where in the true population 60% of the population supports you. What is the probability that the poll is not within 10-percentage-points of the true value? ?? 1000 ? = Chebyshev s Inequality 0.6 1000=3 ? ? = 1000 5 Let ? be a random variable. For any ? > ? ? ???(?) Var ? = 1000 0.6 0.4 3 10002= 12500 ? ? ? ??
Near the mean Suppose you run a poll of 1000 people where in the true population 60% of the population supports you. What is the probability that the poll is not within 10-percentage-points of the true value? ?? 1000 ? = Chebyshev s Inequality 0.6 1000=3 ? ? = 1000 5 Let ? be a random variable. For any ? > ? ? ???(?) Var ? = 1000 0.6 0.4 3 10002= 0.1 3/12500 12500 0.12= 0.024 ? ? ? ? ? ? ??
Chebyshevs Repeated Experiments How many coin flips (each head with probability ?) are needed until you get ? heads? Let ? be the number necessary. What is probability ? 2? ?? Markov Chebyshev
Chebyshevs Repeated Experiments How many coin flips (each head with probability ?) are needed until you get ? heads? Let ? be the number necessary. What is probability ? 2? ?? ? 2? ?/? 2?/?=1 Markov ? 2 ?2/?2=?(1 ?)/?2 ? 2? ? ? ? ? Var(?) =1 ? Chebyshev ?2/?2 ? ? ?
Takeaway Chebyshev gets more powerful as the variance shrinks. Repeated experiments are a great way to cause that to happen.