
Families of Probability Distributions and Examples
Explore common families of probability distributions such as Bernoulli and Binomial distributions through examples like Shaquille O'Neal's free throw shooting. Understand the concepts, calculations, and applications in probability theory.
Presentation Transcript
Common Families of Probability Distributions
Bernoulli Distribution
An experiment consists of one trial. It can result in one of 2 outcomes: Success or Failure (or a characteristic being Present or Absent).
Probability of Success is $\pi$ ($0 < \pi < 1$).
$Y = 1$ if Success (Characteristic Present), $0$ if not.
$p(y) = \pi$ if $y = 1$; $p(y) = 1 - \pi$ if $y = 0$
$E(Y) = \sum_{y=0}^{1} y\,p(y) = 0(1-\pi) + 1(\pi) = \pi$
$E(Y^2) = 0^2(1-\pi) + 1^2(\pi) = \pi$
$V(Y) = E(Y^2) - [E(Y)]^2 = \pi - \pi^2 = \pi(1-\pi)$
$\sigma = \sqrt{\pi(1-\pi)}$
Example: Shaquille O'Neal NBA Free Throw Shooting
Shaquille O'Neal played in 1207 career NBA games over 19 seasons.
Famous for being a very poor free throw shooter.
Of 11252 career free throw attempts, he made 5935: $\pi = 5935/11252 = .5275$
$E\{Y\} = \pi = .5275$
$V\{Y\} = .5275(1 - .5275) = .2492$
$\sigma = \sqrt{.2492} = .4992$
[Figure: Shaquille O'Neal - Free Throws. Bar chart of the proportions of Success and Failure.]
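The arithmetic on this slide can be checked in a few lines of R. This is a minimal sketch; the variable names (`made`, `attempts`, `pi_hat`, `v_y`, `sd_y`) are illustrative and do not come from the slides.

```r
# Career free throw totals quoted on the slide
made     <- 5935
attempts <- 11252

pi_hat <- made / attempts          # estimated probability of Success
v_y    <- pi_hat * (1 - pi_hat)    # Bernoulli variance pi(1 - pi)
sd_y   <- sqrt(v_y)                # Bernoulli standard deviation

print(round(c(mean = pi_hat, variance = v_y, sd = sd_y), 4))
```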
Binomial Distribution
Consider outcomes of an experiment with 3 Trials:
SSS: $y = 3$, $P(SSS) = \pi^3$, so $P(Y=3) = p(3) = \pi^3$
SSF, SFS, FSS: $y = 2$, $P(SSF) = P(SFS) = P(FSS) = \pi^2(1-\pi)$, so $P(Y=2) = p(2) = 3\pi^2(1-\pi)$
SFF, FSF, FFS: $y = 1$, $P(SFF) = P(FSF) = P(FFS) = \pi(1-\pi)^2$, so $P(Y=1) = p(1) = 3\pi(1-\pi)^2$
FFF: $y = 0$, $P(FFF) = (1-\pi)^3$, so $P(Y=0) = p(0) = (1-\pi)^3$
In General:
1) # of ways of arranging $y$ S's (and $(n-y)$ F's) in a sequence of $n$ positions: $\binom{n}{y} = \frac{n!}{y!(n-y)!}$
2) Probability of each arrangement of $y$ S's (and $(n-y)$ F's): $\pi^y(1-\pi)^{n-y}$
3) $P(Y=y) = p(y) = \binom{n}{y}\pi^y(1-\pi)^{n-y}$, $y = 0,1,\ldots,n$
R Functions: $p(y)$ = dbinom(y, n, $\pi$), $F(y) = P(Y \le y)$ = pbinom(y, n, $\pi$)
Binomial Experiment
Experiment consists of a series of n identical trials.
Each trial can end in one of 2 outcomes: Success or Failure.
Trials are independent (outcome of one has no bearing on outcomes of others).
Probability of Success, $\pi$, is constant for all trials.
Random Variable Y, the number of Successes in the n trials, is said to follow a Binomial Distribution with parameters n and $\pi$.
Y can take on the values $y = 0,1,\ldots,n$.
Notation: $Y \sim Bin(n, \pi)$
$E\{Y\} = n\pi$, $V\{Y\} = n\pi(1-\pi)$, $\sigma = \sqrt{n\pi(1-\pi)}$
Example: Shaquille O'Neal Free Throws
Suppose we observe Shaq take n = 15 Free Throws (or sample 15 from his population of attempts).
Let Y be the number of Successful attempts in the 15 shots.
Compute: $E\{Y\}$, $V\{Y\}$, $\sigma$, $P(Y \le 7)$, $P(Y \ge 10)$, and the Probability Distribution.
$E\{Y\} = n\pi = 15(.5275) = 7.9125$
$V\{Y\} = n\pi(1-\pi) = 15(.5275)(1-.5275) = 3.7387$, $\sigma = \sqrt{3.7387} = 1.9336$
$P(Y \le 7) = \binom{15}{0}.5275^0(1-.5275)^{15} + \cdots + \binom{15}{7}.5275^7(1-.5275)^{8} = .4142$
$P(Y \ge 10) = 1 - P(Y \le 9) = 1 - .7932 = .2068$
Probability Distribution:
y     p(y)
0     0.0000
1     0.0002
2     0.0017
3     0.0083
4     0.0277
5     0.0680
6     0.1266
7     0.1817
8     0.2028
9     0.1761
10    0.1180
11    0.0599
12    0.0223
13    0.0057
14    0.0009
15    0.0001
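The quantities on this slide can be reproduced with the dbinom and pbinom functions named on the Binomial Distribution slide. The following is a sketch only; the variable names are illustrative.

```r
n    <- 15        # number of free throws observed
pi_s <- 0.5275    # success probability used on the slide

mean_y <- n * pi_s                    # E{Y} = n * pi
var_y  <- n * pi_s * (1 - pi_s)       # V{Y} = n * pi * (1 - pi)
sd_y   <- sqrt(var_y)

p_le7  <- pbinom(7, n, pi_s)          # P(Y <= 7)
p_ge10 <- 1 - pbinom(9, n, pi_s)      # P(Y >= 10) = 1 - P(Y <= 9)

# Full probability distribution, rounded as in the slide's table
pmf <- data.frame(y = 0:n, p = round(dbinom(0:n, n, pi_s), 4))
print(pmf)
```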
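Estimating $\pi$ From Sample Data
$Y \sim Bin(n, \pi)$, $p(y) = \frac{n!}{y!(n-y)!}\pi^y(1-\pi)^{n-y}$, $y = 0,1,\ldots,n$
Likelihood: $L(\pi) = \frac{n!}{y!(n-y)!}\pi^y(1-\pi)^{n-y}$, $0 < \pi < 1$
Goal: Choose $\hat{\pi}$ to maximize $L$ (or equivalently $l = \ln L$):
$l = \ln L = \ln(n!) - \ln(y!) - \ln((n-y)!) + y\ln(\pi) + (n-y)\ln(1-\pi)$
$\frac{\partial l}{\partial \pi} = \frac{y}{\pi} - \frac{n-y}{1-\pi} \overset{set}{=} 0 \Rightarrow \hat{\pi} = \frac{y}{n}$
Alternative Representation (logit): $\alpha = \ln\left(\frac{\pi}{1-\pi}\right)$, $\pi = \frac{e^{\alpha}}{1+e^{\alpha}}$
Shaq Data: $y = 5935$, $n = 11252$, $\hat{\pi} = \frac{5935}{11252} = .5275$, $\hat{\alpha} = \ln\left(\frac{.5275}{1-.5275}\right) = .1100$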
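As a numerical cross-check of the closed-form estimate y/n, the binomial log-likelihood can be maximized directly. This sketch uses base R's optimize; the search interval and the names `loglik`, `opt`, `pi_hat`, and `alpha_hat` are choices made for this illustration, not part of the slides.

```r
y <- 5935     # successes (Shaq data from the slide)
n <- 11252    # trials

# Binomial log-likelihood as a function of the success probability
loglik <- function(p) dbinom(y, n, p, log = TRUE)

# One-dimensional search for the maximizer on (0, 1)
opt <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)

pi_hat    <- opt$maximum                   # numerical MLE
alpha_hat <- log(pi_hat / (1 - pi_hat))    # logit of the MLE

print(c(numerical = pi_hat, closed_form = y / n, logit = alpha_hat))
```

The numerical maximizer should agree with y/n up to the tolerance of the search.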
Estimating $\pi$ From Sample Data: R Program
$Y \sim Bin(n, \pi)$, $L(\pi) = \frac{n!}{y!(n-y)!}\pi^y(1-\pi)^{n-y}$, $y = 0,1,\ldots,n$, $0 < \pi < 1$
Estimating $\pi$ with R using the glm function: glm(y ~ 1, binomial("logit")), where either
1) y is a variable made up of 1s and 0s across the experimental units: glm(y ~ 1, binomial("logit"))
2) y is a "row" vector with two numbers (y, n-y), the numbers of successes and failures (#S, #F): glm(y ~ 1, binomial("logit"))

Program (form 2, successes and failures):
    shaq.n <- 11252; shaq.y <- 5935
    shaq.ft <- cbind(shaq.y, shaq.n - shaq.y)
    shaq.mod1 <- glm(shaq.ft ~ 1, binomial("logit"))
    summary(shaq.mod1)
    predict(shaq.mod1)                     # logit scale (a)
    predict(shaq.mod1, type="response")    # response scale (p)

Output:
    > summary(shaq.mod1)
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.10996    0.01888   5.823 5.78e-09
    > predict(shaq.mod1)
            1
    0.1099578
    > predict(shaq.mod1, type="response")
            1
    0.5274618

Program (form 1, vector of 1s and 0s):
    shaq.data <- c(rep(1,shaq.y), rep(0,shaq.n-shaq.y))
    shaq.mod2 <- glm(shaq.data ~ 1, binomial("logit"))
    summary(shaq.mod2)
    predict(shaq.mod2)[1]                     # print out just first case
    predict(shaq.mod2, type="response")[1]
Poisson Distribution
Distribution often used to model the number of incidences of some characteristic in time or space:
Arrivals of customers in a queue
Numbers of flaws in a roll of fabric
Number of typos per page of text
Distribution obtained as follows:
Break down the area into many small pieces (n pieces)
Each piece can have only 0 or 1 occurrences ($\pi = P(1)$)
Let $\lambda = n\pi$ = Average number of occurrences over the area
Y = # occurrences in the area is the sum of 0s & 1s over the pieces
$Y \sim Bin(n, \pi)$ with $\pi = \lambda/n$
Take the limit of the Binomial Distribution as $n \to \infty$ with $\pi = \lambda/n$:
$p(y) = \frac{e^{-\lambda}\lambda^y}{y!}$, $y = 0,1,2,\ldots$; $\lambda > 0$
$E\{Y\} = \lambda$, $V\{Y\} = \lambda$
In R: ppois(y, $\lambda$) gives $F(y) = P(Y \le y)$; dpois(y, $\lambda$) gives $p(y) = P(Y = y)$
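The limiting argument can be illustrated numerically by comparing Binomial(n, λ/n) probabilities with Poisson(λ) probabilities as n grows. This is a sketch under assumed values (λ = 3, y from 0 to 10, and three choices of n picked for illustration).

```r
lambda <- 3
y      <- 0:10
n_vals <- c(10, 100, 10000)

# Largest pointwise gap between the Bin(n, lambda/n) and Poisson(lambda) pmfs
max_diff <- sapply(n_vals, function(n) {
  max(abs(dbinom(y, n, lambda / n) - dpois(y, lambda)))
})

print(data.frame(n = n_vals, max_abs_diff = signif(max_diff, 3)))
```

The gap should shrink as n increases, consistent with the Poisson being the limit of the Binomial.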
Example: German Football League - 2013
Total Goals Per Game (Both Teams): Mean=3.16, Variance=2.94
Comparison with Poisson($\lambda = 3$)
Compute P(Y=0), P(Y<=2), P(Y>=3), and the Probability Distribution (Observed and Theoretical)

        Count              Probability          CDF
y       observed expected  observed Poisson(3)  observed Poisson(3)  Chi_square
0       13       15.23     0.0425   0.0498      0.0425   0.0498      0.3278
1       34       45.70     0.1111   0.1494      0.1536   0.1991      2.9974
2       72       68.56     0.2353   0.2240      0.3889   0.4232      0.1729
3       68       68.56     0.2222   0.2240      0.6111   0.6472      0.0045
4       55       51.42     0.1797   0.1680      0.7908   0.8153      0.2496
5       35       30.85     0.1144   0.1008      0.9052   0.9161      0.5581
6       18       15.43     0.0588   0.0504      0.9641   0.9665      0.4298
7       6        6.61      0.0196   0.0216      0.9837   0.9881      0.0564
8+      5        3.64      0.0163   0.0119      1.0000   1.0000      0.5057
Sum     306      306       1.0000   1.0000                           5.3023

Chi-square degrees of freedom: df = 9-1-1 = 7
Observed: P(Y >= 3) = 1-P(Y <= 2) = 1-0.3889 = 0.6111
Theoretical: P(Y >= 3) = 1-P(Y <= 2) = 1-0.4232 = 0.5768
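The expected counts and the chi-square statistic for the comparison with Poisson(3) can be recomputed from the observed counts. This is a sketch; the variable names are illustrative, the last category pools 8 or more goals, and the 7 degrees of freedom are taken as stated on the slide.

```r
obs <- c(13, 34, 72, 68, 55, 35, 18, 6, 5)    # observed games with 0,1,...,7,8+ goals

# Poisson(3) probabilities for 0..7, with the remainder assigned to "8+"
probs <- c(dpois(0:7, 3), 1 - ppois(7, 3))
expct <- sum(obs) * probs                      # expected counts out of 306 games

chisq   <- sum((obs - expct)^2 / expct)        # Pearson chi-square statistic
p_value <- pchisq(chisq, df = 7, lower.tail = FALSE)

p_ge3_obs  <- 1 - sum(obs[1:3]) / sum(obs)     # observed P(Y >= 3)
p_ge3_theo <- 1 - ppois(2, 3)                  # Poisson(3) P(Y >= 3)

print(round(cbind(observed = obs, expected = expct), 2))
print(c(chisq = chisq, p_value = p_value))
```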
[Figure: German Football League 2013 - Observed and Poisson(3). Bar chart of observed and expected game counts by total goals (0, 1, ..., 7, 8+).]
Estimating $\lambda$ from Sample Data
$Y_1,\ldots,Y_n \sim Poi(\lambda)$ (independent), $p(y) = \frac{e^{-\lambda}\lambda^y}{y!}$, $y = 0,1,\ldots$; $\lambda > 0$
$L(\lambda) = p(y_1,\ldots,y_n) = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{y_i}}{y_i!} = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} y_i}}{\prod_{i=1}^{n} y_i!}$
$l = \ln L = -n\lambda + \left(\sum_{i=1}^{n} y_i\right)\ln(\lambda) - \sum_{i=1}^{n}\ln(y_i!)$
$\frac{\partial l}{\partial \lambda} = -n + \frac{\sum_{i=1}^{n} y_i}{\lambda} \overset{set}{=} 0 \Rightarrow \hat{\lambda} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$
Alternative Representation: $\alpha = \ln(\lambda)$, $\lambda = e^{\alpha}$
Germany Goal Data: $n = 306$, $\sum_{i=1}^{n} y_i = 967$, $\hat{\lambda} = \frac{967}{306} = 3.1601$, $\hat{\alpha} = \ln(3.1601) = 1.1506$

Program:
    gg.freq <- c(13,34,72,68,55,35,18,6,5)
    gg.goals <- rep(0:8, gg.freq)
    gg.mod <- glm(gg.goals ~ 1, poisson("log"))
    summary(gg.mod)
    predict(gg.mod)[1]                     # Print out just first case
    predict(gg.mod, type="response")[1]

Output:
    > summary(gg.mod)
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.15061    0.03216   35.78   <2e-16
    > predict(gg.mod)[1]
           1
    1.150613
    > predict(gg.mod, type="response")[1]
           1
    3.160131
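Negative Binomial Distribution
Used to model the number of trials needed until the r-th Success (extension of the Geometric distribution).
Based on there being r-1 Successes in the first y-1 trials, followed by a Success:
$p(y) = \binom{y-1}{r-1}p^r(1-p)^{y-r} = \frac{(y-1)!}{(r-1)!(y-r)!}p^r(1-p)^{y-r} = \frac{\Gamma(y)}{\Gamma(r)\Gamma(y-r+1)}p^r(1-p)^{y-r}$, $y = r, r+1, \ldots$; $0 < p < 1$
$E(Y) = \frac{r}{p}$, $V(Y) = \frac{r(1-p)}{p^2}$
Obtaining Probabilities in R: R's dnbinom and pnbinom count the number of Failures before the r-th Success (y-r) rather than the number of trials, so p(y) = dnbinom(y-r, r, p) and F(y) = P(Y <= y) = pnbinom(y-r, r, p)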
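Because R's dnbinom is parameterized by the number of failures before the r-th success, a small check against the trials-based formula is useful. This sketch uses assumed values r = 3, p = 0.5, and y = 5 chosen only for illustration.

```r
r <- 3      # number of successes required
p <- 0.5    # success probability on each trial
y <- 5      # total number of trials until the r-th success

# Trials-based formula: r-1 successes in the first y-1 trials, then a success
by_formula <- choose(y - 1, r - 1) * p^r * (1 - p)^(y - r)

# Same probability from R, which counts failures (y - r) instead of trials
by_dnbinom <- dnbinom(y - r, size = r, prob = p)

# P(Y <= y): r-th success occurs on or before trial y
cdf_y <- pnbinom(y - r, size = r, prob = p)

print(c(formula = by_formula, dnbinom = by_dnbinom, cdf = cdf_y))
```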
Negative Binomial Distribution (II)
Generalization to "domain" of $y^* = 0,1,\ldots$:
$p(y^*) = \frac{\Gamma(y^*+k)}{\Gamma(k)\Gamma(y^*+1)}\left(\frac{k}{\mu+k}\right)^{k}\left(\frac{\mu}{\mu+k}\right)^{y^*}$, $y^* = 0,1,\ldots$
where: $k = r$, $y^* = y - r$, $p = \frac{k}{\mu+k}$, $1-p = \frac{\mu}{\mu+k}$
$E(Y^*) = \mu$, $V(Y^*) = \mu + \frac{\mu^2}{k}$
This model is widely used to model count data when the Poisson model does not fit well due to over-dispersion: V(Y) > E(Y).
In this model, k is not assumed to be integer-valued and must be estimated via maximum likelihood (or method of moments).
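R's dnbinom accepts this mean-based form directly through its `size` and `mu` arguments, which allows a numerical check of the stated mean and variance. In this sketch the values mu = 3 and k = 2.5 are arbitrary, and the infinite sums are truncated at 2000, where the remaining tail is negligible.

```r
mu <- 3      # mean of Y*
k  <- 2.5    # dispersion-type parameter; need not be an integer

ystar <- 0:2000
pmf   <- dnbinom(ystar, size = k, mu = mu)

# Direct evaluation of the pmf formula on the log scale for the first few values
y0     <- 0:10
manual <- exp(lgamma(y0 + k) - lgamma(k) - lgamma(y0 + 1) +
              k * log(k / (mu + k)) + y0 * log(mu / (mu + k)))

mean_num <- sum(ystar * pmf)                   # should be close to mu
var_num  <- sum(ystar^2 * pmf) - mean_num^2    # should be close to mu + mu^2 / k

print(c(mean = mean_num, variance = var_num, mu_plus_mu2_over_k = mu + mu^2 / k))
```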
Negative Binomial Distribution: Estimating Parameters
Generalization to "domain" of $y^* = 0,1,\ldots$:
$p(y^*) = \frac{\Gamma(k+y^*)}{\Gamma(k)\Gamma(y^*+1)}\left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{y^*}$, $y^* = 0,1,\ldots$
$E(Y^*) = \mu$, $V(Y^*) = \mu + \frac{\mu^2}{k}$
No closed form solutions, must use iterative methods if solving directly.
R provides estimates of $\ln(\mu)$ and $k$ (Dispersion parameter, reported as Theta) with the glm.nb function in the MASS package.

Program:
    library(MASS)
    gg.mod.nb <- glm.nb(gg.goals ~ 1)
    summary(gg.mod.nb)

Output:
    > summary(gg.mod.nb)
    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  1.15061    0.03216   35.78   <2e-16
    (Dispersion parameter for Negative Binomial(34400.78) family taken to be 1)
    Theta: 34401
    Std. Err.: 409434
Normal (Gaussian) Distribution
Bell-shaped distribution with tendency for individuals to clump around the group median/mean.
Used to model many biological phenomena.
Many estimators have approximate normal sampling distributions (see Central Limit Theorem).
$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}\right\}$, $-\infty < y < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$
Obtaining Density and Probabilities in R: f(y) = dnorm(y, $\mu$, $\sigma$), $F(y) = P(Y \le y)$ = pnorm(y, $\mu$, $\sigma$)
Normal Distribution Density Functions (pdf)
[Figure: Normal Densities. Curves of f(y) against y (0 to 200) for N(100,400), N(100,100), N(100,900), N(75,400), and N(125,400).]
Data Description
Body Mass Index: BMI = 703*Weight(lbs)/(Height(in))^2
WNBA (Females): 139 w/ Mean=23.135, SD=2.105
NBA (Males): 505 w/ Mean=24.741, SD=1.720
Distributions are approximately normal
WNBA and NBA BMI Distributions
[Figure: Normal densities f(y_F) for Females ($\mu_F = 23.135$, $\sigma_F = 2.105$) and f(y_M) for Males ($\mu_M = 24.741$, $\sigma_M = 1.720$). x-axis: Body Mass Index (15 to 30); y-axis: Normal Density.]
Probability and Quantile Calculations
$Y_F \sim N(\mu_F = 23.135, \sigma_F^2 = 2.105^2)$, $Y_M \sim N(\mu_M = 24.741, \sigma_M^2 = 1.720^2)$
$P(Y_F \ge 24.00) = P\left(Z \ge \frac{24.00 - 23.135}{2.105}\right) = P(Z \ge 0.41) = .3409$
$P(Y_M \ge 24.00) = P\left(Z \ge \frac{24.00 - 24.741}{1.720}\right) = P(Z \ge -0.43) = 1 - P(Z \le -0.43) = 1 - P(Z \ge 0.43) = 1 - .3336 = .6664$
95th-Percentile (0.95th-quantile) for Females:
$P(Y_F \le q_{.95}) = .95 \Rightarrow P(Y_F \ge q_{.95}) = .05$; $P(Z \ge 1.645) = .0500$ (From Z-table and interpolation)
$.05 = P(Z \ge 1.645) = P\left(\frac{Y_F - 23.135}{2.105} \ge 1.645\right) = P(Y_F \ge 23.135 + 1.645(2.105)) = P(Y_F \ge 26.60)$
10th-Percentile (0.10th-quantile) for Males:
$P(Y_M \le q_{.10}) = .10$; $P(Z \le -1.282) = P(Z \ge 1.282) = .1000$ (From Z-table and interpolation)
$.10 = P(Z \le -1.282) = P\left(\frac{Y_M - 24.741}{1.720} \le -1.282\right) = P(Y_M \le 24.741 - 1.282(1.720)) = P(Y_M \le 22.54)$
Normal Probabilities:
BMI\Gender   F        M
>24          0.3409   0.6664
<24          0.6591   0.3336
Total        1        1
Note: If we used >24 vs <24 as a classifier between Males and Females, about 2/3 of Males and 2/3 of Females would be classified correctly.
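The same probabilities and quantiles can be obtained without a Z-table using pnorm and qnorm. This is a sketch with illustrative variable names; because the slide rounds z to two decimals, the exact values differ from .3409 and .6664 in the fourth decimal place.

```r
mu_F <- 23.135; sd_F <- 2.105    # WNBA (Females) BMI mean and SD
mu_M <- 24.741; sd_M <- 1.720    # NBA (Males) BMI mean and SD

p_F_gt24 <- pnorm(24, mu_F, sd_F, lower.tail = FALSE)   # P(Y_F >= 24)
p_M_gt24 <- pnorm(24, mu_M, sd_M, lower.tail = FALSE)   # P(Y_M >= 24)

q95_F <- qnorm(0.95, mu_F, sd_F)   # 95th percentile for Females
q10_M <- qnorm(0.10, mu_M, sd_M)   # 10th percentile for Males

print(round(c(P_F_gt24 = p_F_gt24, P_M_gt24 = p_M_gt24,
              q95_F = q95_F, q10_M = q10_M), 4))
```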
Other Choices of Cut-Off Values
In this table: $Z_F = \frac{CO - 23.135}{2.105}$ and $Z_M = \frac{CO - 24.741}{1.720}$

Cut-Off  Z_F      Z_M      P(F<CO)  P(F>CO)  P(M<CO)  P(M>CO)
20       -1.4893  -2.7564  0.0682   0.9318   0.0029   0.9971
21       -1.0143  -2.1750  0.1552   0.8448   0.0148   0.9852
22       -0.5392  -1.5936  0.2949   0.7051   0.0555   0.9445
23       -0.0641  -1.0122  0.4744   0.5256   0.1557   0.8443
24       0.4109   -0.4308  0.6594   0.3406   0.3333   0.6667
25       0.8860   0.1506   0.8122   0.1878   0.5598   0.4402
26       1.3610   0.7320   0.9133   0.0867   0.7679   0.2321
27       1.8361   1.3134   0.9668   0.0332   0.9055   0.0945
28       2.3112   1.8948   0.9896   0.0104   0.9709   0.0291

Cut-Off  CorrectF  FalseM   FalseF   CorrectM
20       0.0682    0.9318   0.0029   0.9971
21       0.1552    0.8448   0.0148   0.9852
22       0.2949    0.7051   0.0555   0.9445
23       0.4744    0.5256   0.1557   0.8443
24       0.6594    0.3406   0.3333   0.6667
25       0.8122    0.1878   0.5598   0.4402
26       0.9133    0.0867   0.7679   0.2321
27       0.9668    0.0332   0.9055   0.0945
28       0.9896    0.0104   0.9709   0.0291

If we make the cut-off very low (say BMI=20), we get a very accurate test for Males (.9971 correct), but a very inaccurate test for Females (.0682 correct).
Similarly, if we make the cut-off very high (say BMI=28), we get a very accurate test for Females (.9896 correct), but a very inaccurate test for Males (.0291 correct).
This situation is very similar to diagnostic tests for patients for a disease.
Prior/Posterior Probabilities, Odds, Likelihood Ratios
In this population of professional basketball players, there are 139 Females and 505 Males (644 Total).
$T^+$ represents having a BMI above the cut-off value, and testing "Positive" as being Male.
Prior Probabilities: $P(F) = \frac{139}{644} = .2158$, $P(M) = \frac{505}{644} = .7842$
Prior Odds: $odds = \frac{p}{1-p}$; $odds(F) = \frac{.2158}{.7842} = .2752$, $odds(M) = \frac{.7842}{.2158} = 3.6339$
Likelihood Ratio of a Positive Test: $LR(T^+) = \frac{P(T^+|M)}{P(T^+|F)}$
Likelihood Ratio of a Negative Test: $LR(T^-) = \frac{P(T^-|F)}{P(T^-|M)}$
Posterior odds given a Positive Test (similar for a negative test):
$odds(M|T^+) = \frac{P(T^+|M)}{P(T^+|F)} \cdot \frac{P(M)}{P(F)} = odds(M)\,LR(T^+)$, $odds(F|T^-) = \frac{P(T^-|F)}{P(T^-|M)} \cdot \frac{P(F)}{P(M)} = odds(F)\,LR(T^-)$
Posterior Probabilities given a Positive Test (similar for a negative test), using $p = \frac{odds}{1+odds}$:
$P(M|T^+) = \frac{odds(M|T^+)}{1 + odds(M|T^+)}$, $P(F|T^-) = \frac{odds(F|T^-)}{1 + odds(F|T^-)}$
Computations
For every cut-off: P(F) = 0.2158, P(M) = 0.7842, odds(F) = 0.2752, odds(M) = 3.6331

Cut-Off  P(T+|F)  P(T+|M)  LR(T+)   odds(M|T+)  P(M|T+)
20       0.9318   0.9971   1.0701   3.8876      0.7954
21       0.8448   0.9852   1.1662   4.2370      0.8091
22       0.7051   0.9445   1.3395   4.8664      0.8295
23       0.5256   0.8443   1.6064   5.8363      0.8537
24       0.3406   0.6667   1.9576   7.1123      0.8767
25       0.1878   0.4402   2.3436   8.5144      0.8949
26       0.0867   0.2321   2.6754   9.7200      0.9067
27       0.0332   0.0945   2.8497   10.3533     0.9119
28       0.0104   0.0291   2.7912   10.1407     0.9102

Alternative Calculation using Law of Total Probability and Bayes' Rule (CO = 24):
$P(F) = .2158$, $P(M) = .7842$, $P(T^+|F) = .3406$, $P(T^+|M) = .6667$
$P(T^+) = P(F)P(T^+|F) + P(M)P(T^+|M) = .2158(.3406) + .7842(.6667) = .5963$
$P(M|T^+) = \frac{P(M)P(T^+|M)}{P(T^+)} = \frac{.7842(.6667)}{.5963} = .8767$
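The posterior computations for all nine cut-offs can be carried out in vectorized form. This is a sketch under the normal models for Female and Male BMI given earlier; the names `sens`, `fpr`, `lr_pos`, `post_odds`, `post_prob`, and `bayes` are illustrative.

```r
co  <- 20:28                       # BMI cut-off values
p_F <- 139 / 644                   # prior probability of Female
p_M <- 505 / 644                   # prior probability of Male

sens <- pnorm(co, 24.741, 1.720, lower.tail = FALSE)   # P(T+ | M)
fpr  <- pnorm(co, 23.135, 2.105, lower.tail = FALSE)   # P(T+ | F)

lr_pos    <- sens / fpr                      # likelihood ratio of a positive test
post_odds <- (p_M / p_F) * lr_pos            # odds(M | T+) = odds(M) * LR(T+)
post_prob <- post_odds / (1 + post_odds)     # P(M | T+)

# Same posterior via the Law of Total Probability and Bayes' Rule
bayes <- p_M * sens / (p_F * fpr + p_M * sens)

print(round(data.frame(co, fpr, sens, lr_pos, post_odds, post_prob), 4))
```

Both routes give the same posterior probabilities, which is a useful consistency check on the odds-based formulas.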
Receiver Operating Characteristic (ROC) Curve - BMI Classify as M/F
[Figure: ROC curve (True+) with 45-degree reference line. x-axis: 1-Specificity = P(False +) = P(T+|F); y-axis: Sensitivity = P(True +) = P(T+|M); both axes run from 0 to 1.]
Performance of BMI as Test for M/F
An excellent test would have a high arc to the Northwest corner of the graph, allowing for a high sensitivity, P(T+|M), along with a low 1-specificity, P(T+|F).
Clearly, this test does not perform particularly well (due to the large overlap in the Male/Female BMI densities).
Commonly reported measure is the Area Under the ROC Curve (AUC): $0.5 \le AUC \le 1$
Rule of Thumb: 0.9-1 = Excellent, 0.8-0.9 = Good, 0.7-0.8 = Fair, 0.6-0.7 = Poor, 0.5-0.6 = Fail
For this Test, AUC = 0.6621 (applying trapezoidal rule):
$\int_a^b f(x)\,dx \approx \frac{b-a}{2n}\left[f(x_0) + 2f(x_1) + \cdots + 2f(x_{n-1}) + f(x_n)\right]$ with $a = 0$, $b = 1$, $n = 197$, $f = P(T^+|M)$
Gamma Distribution
Family of Right-Skewed Distributions.
Random Variable can take on positive values only.
Used to model many biological and economic characteristics.
Can take on many different shapes to match empirical data.
Model 1 (EXCEL): $f(y) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}y^{\alpha-1}e^{-y/\beta}$ for $y > 0$, $\alpha, \beta > 0$; $0$ otherwise
Model 2 (R): $f(y) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}e^{-\beta y}$ for $y > 0$, $\alpha, \beta > 0$; $0$ otherwise
where $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy = (\alpha-1)\Gamma(\alpha-1)$
Obtaining Density and Probabilities in R: f(y) = dgamma(y, $\alpha$, $\beta$), $F(y) = P(Y \le y)$ = pgamma(y, $\alpha$, $\beta$)
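Gamma Distribution: Mean and Variance
Model 1:
$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy$. Let $u = y^{\alpha-1}$, $dv = e^{-y}dy$, so $du = (\alpha-1)y^{\alpha-2}dy$, $v = -e^{-y}$:
$\Gamma(\alpha) = \left.-y^{\alpha-1}e^{-y}\right|_0^{\infty} + (\alpha-1)\int_0^{\infty} y^{\alpha-2}e^{-y}\,dy = 0 + (\alpha-1)\int_0^{\infty} y^{(\alpha-1)-1}e^{-y}\,dy = (\alpha-1)\Gamma(\alpha-1)$
Consider $\int_0^{\infty} y^{\alpha-1}e^{-y/\beta}\,dy$. Let $u = y/\beta$, so $y = \beta u$, $dy = \beta\,du$:
$\int_0^{\infty} y^{\alpha-1}e^{-y/\beta}\,dy = \int_0^{\infty} (\beta u)^{\alpha-1}e^{-u}\beta\,du = \beta^{\alpha}\int_0^{\infty} u^{\alpha-1}e^{-u}\,du = \beta^{\alpha}\Gamma(\alpha) \Rightarrow \int_0^{\infty} f(y)\,dy = 1$
$E(Y) = \int_0^{\infty} y\frac{1}{\Gamma(\alpha)\beta^{\alpha}}y^{\alpha-1}e^{-y/\beta}\,dy = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty} y^{(\alpha+1)-1}e^{-y/\beta}\,dy = \frac{\Gamma(\alpha+1)\beta^{\alpha+1}}{\Gamma(\alpha)\beta^{\alpha}} = \alpha\beta$
$E(Y^2) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^{\infty} y^{(\alpha+2)-1}e^{-y/\beta}\,dy = \frac{\Gamma(\alpha+2)\beta^{\alpha+2}}{\Gamma(\alpha)\beta^{\alpha}} = (\alpha+1)\alpha\beta^2$
$V(Y) = E(Y^2) - [E(Y)]^2 = (\alpha+1)\alpha\beta^2 - (\alpha\beta)^2 = \alpha\beta^2$
Model 2: $E(Y) = \frac{\alpha}{\beta}$, $V(Y) = \frac{\alpha}{\beta^2}$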
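The two sets of moment formulas can be checked by numerical integration of the density. This sketch assumes the values α = 2 and β = 3 purely for illustration; in dgamma, the `scale` argument corresponds to the Model 1 form and the `rate` argument to the Model 2 form.

```r
a <- 2    # shape parameter (alpha)
b <- 3    # beta: a scale in Model 1, a rate in Model 2

# k-th raw moment of a density f, by numerical integration over (0, Inf)
raw_moment <- function(f, k) integrate(function(y) y^k * f(y), 0, Inf)$value

f1 <- function(y) dgamma(y, shape = a, scale = b)   # Model 1 (scale form)
f2 <- function(y) dgamma(y, shape = a, rate = b)    # Model 2 (rate form)

mean1 <- raw_moment(f1, 1); var1 <- raw_moment(f1, 2) - mean1^2
mean2 <- raw_moment(f2, 1); var2 <- raw_moment(f2, 2) - mean2^2

print(rbind(model1 = c(mean = mean1, var = var1, mean_formula = a * b, var_formula = a * b^2),
            model2 = c(mean = mean2, var = var2, mean_formula = a / b, var_formula = a / b^2)))
```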
Gamma/Exponential Densities (pdf)
[Figure: Exponential and Gamma density functions. Curves of f(y) against y (0 to 10) for exp(2.0), exp(5.0), gam(2,2), gam(2,3), and gam(3,2).]
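Lognormal Distribution
$f(y) = \frac{1}{y\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2}\frac{(\ln(y)-\mu)^2}{\sigma^2}\right\}$ for $y > 0$, $-\infty < \mu < \infty$, $\sigma > 0$; $0$ otherwise
Note: $Y^* = \ln(Y) \sim N(\mu, \sigma^2)$
$E(Y) = E\left(e^{Y^*}\right) = M_{Y^*}(t=1) = \exp\left\{\mu + \frac{\sigma^2}{2}\right\}$
$E(Y^2) = E\left(e^{2Y^*}\right) = M_{Y^*}(t=2) = \exp\left\{2\mu + 2\sigma^2\right\}$
$V(Y) = E(Y^2) - [E(Y)]^2 = e^{2\mu+2\sigma^2} - e^{2\mu+\sigma^2} = e^{2\mu+\sigma^2}\left(e^{\sigma^2} - 1\right)$
Obtaining Density and Probabilities in R: f(y) = dlnorm(y, $\mu$, $\sigma$), $F(y) = P(Y \le y)$ = plnorm(y, $\mu$, $\sigma$)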
Lognormal pdfs
[Figure: Lognormal pdf's. Curves of f(y) against y (0 to 8) for LN(0,1), LN(0,4), LN(1,1), and LN(1,4).]
Data Description / Distributions
Miles per Hour for 2499 people completing the marathon (1454 Males, 1045 Females)
Males: Mean=6.337, SD=1.058, Min=4.288, Max=10.289, $P(Y_M \le 7) = .7538$
Females: Mean=5.840, SD=0.831, Min=4.278, Max=8.963, $P(Y_F \le 7) = .8986$
Method of Moments Estimators - Gamma
Obtain the Sample Mean and Variance and use them to obtain estimates of the parameters and of $P(Y \le 7)$.
Gamma Distribution: $E(Y) = \frac{\alpha}{\beta}$, $V(Y) = \frac{\alpha}{\beta^2}$
$\frac{E(Y)}{V(Y)} = \beta$, $\frac{[E(Y)]^2}{V(Y)} = \alpha \Rightarrow \tilde{\beta} = \frac{\bar{Y}}{S_Y^2}$, $\tilde{\alpha} = \frac{\bar{Y}^2}{S_Y^2}$
Males: $\tilde{\beta}_M = \frac{6.337}{1.058^2} = 5.661$, $\tilde{\alpha}_M = \frac{6.337^2}{1.058^2} = 35.896$
R: pgamma(7, 35.896, 5.661) returns 0.7446
Females: $\tilde{\beta}_F = \frac{5.840}{0.831^2} = 8.457$, $\tilde{\alpha}_F = \frac{5.840^2}{0.831^2} = 49.381$
R: pgamma(7, 49.381, 8.457) returns 0.9132
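The method of moments step for the gamma model can be wrapped in a small helper. This is a sketch; `mom_gamma` is a hypothetical helper written for this illustration, and because it starts from the rounded means and SDs shown on the slide, the shape estimates differ slightly from the slide's 35.896 and 49.381.

```r
# Method of moments for the gamma (rate form): alpha = ybar^2 / s^2, beta = ybar / s^2
mom_gamma <- function(ybar, s) {
  c(alpha = ybar^2 / s^2, beta = ybar / s^2)
}

males   <- mom_gamma(6.337, 1.058)
females <- mom_gamma(5.840, 0.831)

# Estimated P(Y <= 7) under the fitted gamma distributions
p_m <- pgamma(7, shape = males[["alpha"]],   rate = males[["beta"]])
p_f <- pgamma(7, shape = females[["alpha"]], rate = females[["beta"]])

print(round(rbind(males = c(males, P_le_7 = p_m),
                  females = c(females, P_le_7 = p_f)), 4))
```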
Method of Moments Estimators - Lognormal
Lognormal Distribution: $E(Y) = e^{\mu+\sigma^2/2}$, $V(Y) = e^{2\mu+\sigma^2}\left(e^{\sigma^2} - 1\right)$
$\frac{V(Y)}{[E(Y)]^2} = e^{\sigma^2} - 1 \Rightarrow \sigma^2 = \ln\left(\frac{V(Y)}{[E(Y)]^2} + 1\right)$, $\mu = \ln(E(Y)) - \frac{1}{2}\sigma^2$
$\tilde{\sigma}^2 = \ln\left(\frac{S_Y^2}{\bar{Y}^2} + 1\right)$, $\tilde{\mu} = \ln(\bar{Y}) - \frac{1}{2}\tilde{\sigma}^2$
Males: $\tilde{\sigma}_M^2 = \ln\left(\frac{1.058^2}{6.337^2} + 1\right) = 0.02748$, $\tilde{\mu}_M = \ln(6.337) - \frac{1}{2}(0.02748) = 1.83266$
R: plnorm(7, 1.83266, sqrt(0.02748)) returns 0.752751
Females: $\tilde{\sigma}_F^2 = \ln\left(\frac{0.831^2}{5.840^2} + 1\right) = 0.02005$, $\tilde{\mu}_F = \ln(5.840) - \frac{1}{2}(0.02005) = 1.75468$
R: plnorm(7, 1.75468, sqrt(0.02005)) returns 0.911575
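The lognormal method of moments estimates follow the same pattern. This is a sketch; `mom_lnorm` is a hypothetical helper for this illustration, and it uses the rounded sample means and SDs from the marathon data description.

```r
# Method of moments for the lognormal:
#   sigma^2 = ln(s^2 / ybar^2 + 1),  mu = ln(ybar) - sigma^2 / 2
mom_lnorm <- function(ybar, s) {
  sigma2 <- log(s^2 / ybar^2 + 1)
  c(mu = log(ybar) - sigma2 / 2, sigma2 = sigma2)
}

males   <- mom_lnorm(6.337, 1.058)
females <- mom_lnorm(5.840, 0.831)

# Estimated P(Y <= 7); plnorm takes the SD (not the variance) of ln(Y)
p_m <- plnorm(7, males[["mu"]],   sqrt(males[["sigma2"]]))
p_f <- plnorm(7, females[["mu"]], sqrt(females[["sigma2"]]))

print(round(rbind(males = c(males, P_le_7 = p_m),
                  females = c(females, P_le_7 = p_f)), 5))
```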
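Beta Distribution
Used to model probabilities (can be generalized to any finite, positive range).
Parameters allow a wide range of shapes to model empirical data.
$f(y) = \frac{1}{B(\alpha,\beta)}y^{\alpha-1}(1-y)^{\beta-1}$ for $0 \le y \le 1$, $\alpha, \beta > 0$; $0$ otherwise
$B(\alpha,\beta) = \int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
$E(Y) = \frac{1}{B(\alpha,\beta)}\int_0^1 y^{(\alpha+1)-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{\alpha}{\alpha+\beta}$
$E(Y^2) = \frac{1}{B(\alpha,\beta)}\int_0^1 y^{(\alpha+2)-1}(1-y)^{\beta-1}\,dy = \frac{\Gamma(\alpha+2)\Gamma(\beta)}{\Gamma(\alpha+\beta+2)} \cdot \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} = \frac{(\alpha+1)\alpha}{(\alpha+\beta+1)(\alpha+\beta)}$
$V(Y) = \frac{(\alpha+1)\alpha}{(\alpha+\beta+1)(\alpha+\beta)} - \left(\frac{\alpha}{\alpha+\beta}\right)^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Obtaining Density and Probabilities in R: f(y) = dbeta(y, $\alpha$, $\beta$), $F(y) = P(Y \le y)$ = pbeta(y, $\alpha$, $\beta$)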
Beta Density Functions (pdf)
[Figure: Beta Density Functions. Curves of f(y) against y (0 to 1) for Beta(1,1), Beta(2,2), Beta(4,1), Beta(1,3), and Beta(5,5).]
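Data Description
All NBA 2016-2017 Regular Season Team/Games: Proportion of 3-Point Shots Made (Attempts range from 7-61 per team/game)
Mean = .3566, SD = .0946, Var = .008958
Let $\mu = E(Y) = \frac{\alpha}{\alpha+\beta}$, so that $1 - \mu = \frac{\beta}{\alpha+\beta}$ and
$V(Y) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{\mu(1-\mu)}{\alpha+\beta+1} \Rightarrow \alpha+\beta = \frac{\mu(1-\mu)}{V(Y)} - 1$
$\tilde{\alpha}+\tilde{\beta} = \frac{\bar{Y}(1-\bar{Y})}{S_Y^2} - 1 = \frac{.3566(1-.3566)}{.008958} - 1 = 24.61$
$\tilde{\alpha} = 24.61(.3566) = 8.78$, $\tilde{\beta} = 24.61(1-.3566) = 15.83$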
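The beta method of moments estimates can be reproduced, and checked by plugging them back into the mean and variance formulas for the Beta distribution. This is a sketch with illustrative variable names.

```r
ybar <- 0.3566      # sample mean of the 3-point proportions
s2   <- 0.008958    # sample variance

ab_sum <- ybar * (1 - ybar) / s2 - 1    # estimate of alpha + beta
alpha  <- ab_sum * ybar                 # estimate of alpha
beta   <- ab_sum * (1 - ybar)           # estimate of beta

# Plug the estimates back into the Beta mean and variance formulas
mean_check <- alpha / (alpha + beta)
var_check  <- alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))

print(round(c(alpha_plus_beta = ab_sum, alpha = alpha, beta = beta,
              mean = mean_check, var = var_check), 4))
```

By construction the fitted distribution recovers the sample mean and variance exactly, which is what the check confirms.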