Modern Likelihood-Frequentist Inference: A Brief Overview


The presentation by Donald A. Pierce and Ruggero Bellio gives an overview of Modern Likelihood-Frequentist Inference, an important advance in statistical theory and methods. They emphasize the shift towards likelihood and sufficiency as a complement to Neyman-Pearson theory. The talk covers the historical context, introduces the Neo-Fisherian theory of inference, and traces the development of modern likelihood asymptotics. Key references in the field are also provided.





Presentation Transcript


  1. Modern Likelihood-Frequentist Inference Donald A. Pierce, Emeritus, OSU Statistics, and Ruggero Bellio, Univ of Udine. Slides, the working paper, and other material are at: http://www.science.oregonstate.edu/~piercedo Slides and paper only are at: https://www.dropbox.com/sh/fd6yqcfb2lfubyf/AAAfHspffPSfur6Qs7WJDTr9a?dl=0

  2. PURPOSE OF THIS TALK To summarize the Pierce & Bellio working paper Modern Likelihood-Frequentist Inference. Its topic is an important advance in statistical theory and methods, due to many workers and largely occurring since 1986. It is a complement to Neyman-Pearson theory, based more on likelihood and sufficiency. The results considerably improve, in practical terms, on the accuracy of the usual first-order likelihood methods, such as the Wald and likelihood ratio chi-squared tests. Our paper provides an exposition of this topic intended for a wide audience of statisticians. It also introduces an R package, likelihoodAsy, which I will describe here.

  3. Shortly before 1980, important developments in frequency theory of inference were in the air. Strictly, this was about new asymptotic methods, but with the capacity to lead to what has been called Neo-Fisherian theory of inference: a complement to the Neyman-Pearson theory, emphasizing likelihood and conditioning for the reduction of data for inference, rather than direct focus on optimality, e.g. UMP tests.

  4. How it all started, largely (there were earlier developments)

  5. A few years after that, this pathbreaking paper led the way to remarkable further development of MODERN LIKELIHOOD ASYMPTOTICS. That paper was difficult, so Dawn Peters and I had some success interpreting/promoting/extending it in an invited RSS discussion paper.

  6. HIGHLY ABRIDGED REFERENCE LIST Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed likelihood ratio. Biometrika 73, 307-322. Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333. Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 65, 457-482. Pierce, D. A. and Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. J. Roy. Statist. Soc. B 54, 701-737. Pierce, D. A. and Bellio, R. (in prep). Modern likelihood-frequentist inference. (Basis for this talk.) Skovgaard, I. M. (1996). An explicit large-deviation approximation to one-parameter tests. Bernoulli 2, 145-165.

  7. SOME MAJOR BOOKS: Inference and Asymptotics (1994), Barndorff-Nielsen & Cox; Principles of Statistical Inference from a Neo-Fisherian Perspective (1997), Pace & Salvan; Likelihood Methods in Statistics (2000), Severini. SOFTWARE ANNOUNCED IN WORKING PAPER: R package likelihoodAsy, available at http://cran.r-project.org/ . It applies quite generally, requiring mainly only user-provided R code for the likelihood function, going well beyond exponential families, and even beyond independent observations.

  8. Salvan (Univ Padua) and Pace & Bellio (Univ Udine) made it possible for me to visit 2-4 months/year from 2000 to 2016 to study likelihood asymptotics. In 2012 they arranged for me a Fellowship at Padua, work under which led to the paper in progress discussed today. This is based on the idea that the future of likelihood asymptotics will depend on: (a) development of generic computational tools and (b) concise and transparent exposition amenable to statistical theory courses.

  9. For a model with parameter $\theta = (\psi, \lambda)$ and scalar interest parameter $\psi$, write $\hat{\theta}$ and $\hat{\theta}_\psi$ for the MLEs without and with the constraint $\psi(\theta) = \psi_0$. The 1st-order LR test is based on a standard normal approximation to the signed root LR statistic

$r = \mathrm{sign}(\hat{\psi} - \psi_0)\sqrt{2\{l(\hat{\theta}; y) - l(\hat{\theta}_\psi; y)\}}$

The aim is to improve on this through a modified LR statistic $r^*$ such that

$\mathrm{pr}\{r^*(Y) \le r^*(y)\} = \Phi\{r^*(y)\}\{1 + O(n^{-1})\}$
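To make the first-order statistic concrete, here is a minimal R sketch (our illustration, not from the talk) computing r for the rate of an exponential sample, where the MLE is simply 1/mean(y):

# Signed root LR statistic for H0: psi = psi0 in an exponential(rate = psi) model.
loglik <- function(psi, y) sum(dexp(y, rate = psi, log = TRUE))
set.seed(1)
y <- rexp(20, rate = 1.5)
psi.hat <- 1 / mean(y)   # MLE of the rate
psi0 <- 1
r <- sign(psi.hat - psi0) * sqrt(2 * (loglik(psi.hat, y) - loglik(psi0, y)))
pnorm(r, lower.tail = FALSE)   # first-order one-sided P-value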

  10. To Fisher, optimality of inference involved sufficiency, more strongly than in the Neyman-Pearson theory. But generally the MLE is not a sufficient statistic. Thus to Fisher, and many others, the resolution of that was conditioning on an ancillary statistic to render the MLE sufficient beyond 1st order. Ancillary statistics carry information about the precision of the inference, but not the value of the parameter, e.g. the ratio of observed to expected Fisher information.

  11. A central concept in what follows involves observed and expected (Fisher) information. The observed information is defined as minus the second derivative of the loglikelihood at its maximum,

$j(\hat{\theta}) = -\,\partial^2 l(\theta; y)/\partial\theta\,\partial\theta^T \,\big|_{\theta = \hat{\theta}}$

The expected information (the more usual Fisher information) is defined as

$i(\theta) = E\{-\,\partial^2 l(\theta; Y)/\partial\theta\,\partial\theta^T\}$

and we will write $\hat{\imath} = i(\hat{\theta})$.
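In practice both quantities are easy to obtain numerically; a minimal sketch (our illustration, not from the talk), continuing the exponential example, where the expected information is n/psi^2 in closed form:

# Observed information via the numerical Hessian of the negative loglikelihood.
nll <- function(psi, y) -sum(dexp(y, rate = psi, log = TRUE))
set.seed(1)
y <- rexp(20, rate = 1.5)
fit <- optim(1, nll, y = y, method = "BFGS", hessian = TRUE)
j.hat <- fit$hessian                # observed information j(psi.hat)
i.hat <- length(y) / fit$par^2      # expected information i(psi) = n/psi^2 at the MLE
# Here j.hat and i.hat agree: a full-rank exponential family (see next slide).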

  12. The MLE is sufficient if and only if $\hat{\jmath} = \hat{\imath}$, and under regularity this occurs only for exponential families without nonlinear restriction on the parameter (full-rank case). Inferentially it is unwise, and not really necessary, to use the average information; it is more useful for planning. With methods indicated here, it is feasible to condition on an ancillary statistic such as $\hat{\jmath}/\hat{\imath}$ (meaning, actually, $\hat{\imath}^{-1}\hat{\jmath}$). This is the key part of what is called Neo-Fisherian Inference.

  13. The starting point is a simple and accurate likelihood ratio approximation to the distribution of the (multidimensional) maximum likelihood estimator. The next step is to transform and marginalize from this to the distribution of the signed LR statistic (the square root of the usual $\chi^2_1$ statistic), requiring only a Jacobian and a Laplace approximation to the integration. This result is expressed as an adjustment to the first-order N(0,1) distribution of the LR: if that approximation is poor but not terrible, this mops up most of the error (Rob Kass). This is not hard to fathom, and is accessible to a graduate-level theory course, if one need not be distracted by arcane details.

  14. Indeed, Skovgaard (1985) confirmed that in general $(\hat{\theta}, a)$ is sufficient to $O_P(1/n)$, and conditioning on $a = \hat{\jmath}/\hat{\imath}$ (among other choices) leads, to that order, to: (a) no loss of information, (b) the MLE being sufficient. The LR approximation to the distribution of the MLE (usually, but less usefully, called the $p^*$ or magic formula) is then

$p^*(\hat{\theta} \mid a; \theta) = (2\pi)^{-p/2}\, |j(\hat{\theta})|^{1/2}\, \frac{\mathrm{pr}(y; \theta)}{\mathrm{pr}(y; \hat{\theta})} = \mathrm{pr}(\hat{\theta} \mid a; \theta)\{1 + O(n^{-1})\}$
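A quick numerical check (our illustration, not from the talk): for the rate psi of an exponential sample, the exact density of the MLE 1/mean(y) is inverse-gamma, and the p* formula reproduces it up to a Stirling-series renormalization, so the ratio below is a constant close to 1:

# p* versus the exact density of the MLE of an exponential rate (p = 1 here).
n <- 10; psi <- 1.5
psi.hat <- seq(0.5, 4, by = 0.5)                    # grid of possible MLE values
loglik <- function(p, ph) n * log(p) - n * p / ph   # l(p) given MLE value ph
pstar <- (2 * pi)^(-1/2) * sqrt(n) / psi.hat *      # (2 pi)^(-1/2) |j|^(1/2) x LR
  exp(loglik(psi, psi.hat) - loglik(psi.hat, psi.hat))
exact <- (n * psi)^n / gamma(n) * psi.hat^(-(n + 1)) * exp(-n * psi / psi.hat)
round(pstar / exact, 4)   # constant in psi.hat, about 1.008 for n = 10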

  15. Though this took some years to emerge, in retrospect it becomes fairly simple:

$p(\hat{\theta} \mid a; \theta) = \frac{p(\hat{\theta} \mid a; \theta)}{p(\hat{\theta} \mid a; \hat{\theta})}\, p(\hat{\theta} \mid a; \hat{\theta}) = \frac{p(y; \theta)}{p(y; \hat{\theta})}\, p(\hat{\theta} \mid a; \hat{\theta})$

since $(\hat{\theta}, a)$ is conditionally sufficient to 2nd order, and with an Edgeworth expansion for the final term,

$p(\hat{\theta} \mid a; \hat{\theta}) = (2\pi)^{-p/2} |j(\hat{\theta})|^{1/2}\{1 + O(n^{-1})\}$

this having relative error $O(1/n)$ for all $\hat{\theta}$. The aim then is to transform this to the distribution of $r^*$.

  16. The Jacobian and marginalization to be applied to $p^*(\hat{\theta})$ involve rather arcane sample space derivatives such as $\partial^2 l(\theta; \hat{\theta}, a)/\partial\theta\,\partial\hat{\theta}$, which enter quantities $C$ and $u$ through determinants of these derivatives and of observed information blocks; approximations to them are taken care of by the software we provide. The result is an adjusted LR statistic

$r^* = r + r^{-1}\log(C) + r^{-1}\log(u/r) = r + \mathrm{NP} + \mathrm{INF}$

such that

$\mathrm{pr}\{r^*(Y) \le r^*(y)\} = \Phi\{r^*(y)\}\{1 + O(n^{-1})\}$
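Once C and u are in hand, the adjustment itself is simple arithmetic; a minimal sketch (the inputs r, C, u here are assumed to come from software such as likelihoodAsy):

# Assemble r* from r and its two adjustment terms.
rstar.from.parts <- function(r, C, u) {
  NP <- log(C) / r        # nuisance parameter adjustment
  INF <- log(u / r) / r   # information adjustment
  r + NP + INF
}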

  17. It was almost prohibitively difficult to differentiate the likelihood with respect to MLEs while holding fixed a (rather notional) ancillary statistic. The approximations referred to came in a breakthrough by Skovgaard, making the theory practical. Skovgaard's approximation uses projections involving covariances of likelihood quantities computed without holding fixed an ancillary. Our software uses simulation for these covariances, NOT involving model fitting in simulation trials.

  18. To use the generic software, the user specifies an R function for computing the likelihood. The package design renders it quite generally applicable. Since higher-order inference depends on more than the likelihood function, one defines the extra-likelihood aspects of the model by providing another R function that generates a dataset. The parametric function of interest is defined by one further R function. We illustrate this with a Weibull example, with interest parameter the survival function at a given time and covariate; the three user functions are sketched below.
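Here is a sketch of what those three user functions might look like for the Weibull example (our illustration: the parametrization, and the time and covariate values t0 and x0, are assumptions, not taken from the talk):

# Loglikelihood: Weibull hazard, log-linear in the covariate x.
# theta = (beta0, beta1, log shape); data is a list with times y and covariate x.
floglik <- function(theta, data) {
  gam <- exp(theta[3])                        # Weibull shape, kept positive
  lam <- exp(theta[1] + theta[2] * data$x)    # rate, log-linear in x
  sum(log(gam * lam) + (gam - 1) * log(data$y) - lam * data$y^gam)
}

# Dataset generator: defines the extra-likelihood aspects of the model.
datagen <- function(theta, data) {
  gam <- exp(theta[3])
  lam <- exp(theta[1] + theta[2] * data$x)
  data$y <- (-log(runif(length(data$y))) / lam)^(1 / gam)  # inverse-CDF draw
  data
}

# Interest parameter: survival probability at hypothetical t0 and x0.
fpsi <- function(theta) {
  t0 <- 26; x0 <- 3.0
  gam <- exp(theta[3])
  exp(-exp(theta[1] + theta[2] * x0) * t0^gam)
}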

  19. Here there are 17 observations on {leukemia survival time, and one covariable log WBC}, with a simple linear regression model for the log hazard function. Inference is on the survival probability at a given time and covariate value. We test the hypothesis that this probability is equal to the 1st-order 0.975 lower confidence limit, against alternatives of smaller values. Results for the 1st- and 2nd-order LR tests and the Wald test are

$r = 1.66\ (P = 0.048), \quad r^* = 2.10\ (P = 0.018), \quad \mathrm{Wald} = 1.95\ (P = 0.025)$
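A call producing this kind of output might look as follows (a hedged sketch: argument names follow the likelihoodAsy documentation as we recall it, and psival is a hypothetical tested value of the survival probability; see ?rstar):

library(likelihoodAsy)
# dat is assumed to be list(y = <17 survival times>, x = <log WBC values>).
fit <- rstar(data = dat, thetainit = c(0, 0, 0), floglik = floglik,
             fpsi = fpsi, psival = 0.03, datagen = datagen, R = 1000)
fit   # reports r and r*; one-sided P-values then follow from pnorm()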

  20. [Figure-only slide; plot not reproduced in the transcript.]

  21. Confidence Distributions: One-sided confidence limits at all possible levels. P-values are one-tailed error probabilities from testing.

  22. There are 4 other examples in the paper, including inference on autocorrelation in an AR(1) model, a binomial overdispersion model, and other settings where one would ordinarily use 1st-order asymptotics. The higher-order improvements are of practical interest: in the examples, for moderate sample sizes, P-values around 0.05 are modified by a factor of about 2 using higher-order asymptotics. I am aware that calling this Modern Likelihood-Frequentist Inference may presume that the methods here will be more widely used. Our aim with the paper and R package is to contribute to that, with exposition and software that apply widely.
