Modern Likelihood-Frequentist Inference: A Brief Overview

Modern Likelihood-Frequentist Inference
Donald A Pierce, Emeritus, OSU Statistics
and
Ruggero Bellio, Univ of Udine
Slides and working paper, other things are at: http://www.science.oregonstate.edu/~piercedo
Slides and paper only are at: https://www.dropbox.com/sh/fd6yqcfb2lfubyf/AAAfHspffPSfur6Qs7WJDTr9a?dl=0
   
PURPOSE OF THIS TALK
To summarize the Pierce & Bellio working paper “Modern Likelihood-Frequentist Inference”. Its topic is an important advance in statistical theory and methods, due to many workers, largely occurring since 1986. It is a complement to Neyman-Pearson theory, based more on likelihood and sufficiency. The results considerably improve, in practical terms, on the accuracy of the usual first-order likelihood methods, such as the Wald and likelihood ratio chi-squared tests.
Our paper provides an exposition of this topic intended for a wide audience of statisticians.
It also introduces an R package, likelihoodAsy, which I will describe here.
Shortly before 1980, important developments in the frequency theory of inference were “in the air”.
Strictly, this was about new asymptotic methods, but with the capacity to lead to what has been called “Neo-Fisherian” theory of inference.
A complement to the Neyman-Pearson theory, emphasizing likelihood and conditioning for the reduction of data for inference, rather than a direct focus on optimality, e.g. UMP tests.
How it all started, largely (there were earlier developments)
A few years after that, this pathbreaking paper led the way to remarkable further development of MODERN LIKELIHOOD ASYMPTOTICS.
That paper was difficult, so Dawn Peters and I had some success interpreting/promoting/extending it in an invited RSS discussion paper.
 
  
HIGHLY ABRIDGED REFERENCE LIST
Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed likelihood ratio. Biometrika 73, 307-322.
Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 65, 457-482.
Pierce, D. A. and Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. J. Roy. Statist. Soc. B 54, 701-737.
Pierce, D. A. and Bellio, R. (in prep). Modern likelihood-frequentist inference. (Basis for this talk)
Skovgaard, I. M. (1996). An explicit large-deviation approximation to one-parameter tests. Bernoulli 2, 145-165.
   
SOME MAJOR BOOKS
Inference and Asymptotics (1994) Barndorff-Nielsen & Cox
Principles of Statistical Inference from a Neo-Fisherian Perspective (1997) Pace & Salvan
Likelihood Methods in Statistics (2000) Severini
   
SOFTWARE ANNOUNCED IN WORKING PAPER
R package likelihoodAsy, available at http://cran.r-project.org/
It applies quite generally, requiring mainly a user-provided R function for the likelihood.
It goes well beyond exponential families, and even beyond independent observations.
Salvan (Univ Padua) and Pace & Bellio (Univ Udine) made it possible for me to visit 2-4 months/year from 2000 to 2016 to study Likelihood Asymptotics.
In 2012 they arranged a Fellowship for me at Padua; work under that Fellowship led to the paper in progress discussed today.
This is based on the idea that the future of Likelihood Asymptotics will depend on: (a) development of generic computational tools, and (b) concise and transparent exposition amenable to statistical theory courses.
For a model with parameter $\theta$ and scalar interest parameter $\psi = \psi(\theta)$, write $\hat\theta_\psi$ and $\hat\theta$ for the MLEs with and without the constraint $\psi(\theta) = \psi$.
The 1st-order LR test is based on a standard normal approximation to the signed root LR statistic
$$r(\psi) = \mathrm{sign}(\hat\psi - \psi)\,\sqrt{2\{l(\hat\theta; y) - l(\hat\theta_\psi; y)\}}.$$
The aim is to improve on this through a modified LR statistic $r^*$ such that
$$\mathrm{pr}\{r(Y) \geq r(y);\, \theta\} = \{1 - \Phi(r^*(y))\}\{1 + O(n^{-1})\}.$$
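As a concrete illustration (ours, not from the paper), the following R code computes $r$ and its first-order P-value in the simplest possible case, an exponential sample with mean psi, where the MLE is the sample mean:

loglik <- function(psi, y) sum(dexp(y, rate = 1 / psi, log = TRUE))

r.stat <- function(psi, y) {
  psi.hat <- mean(y)                      # unconstrained MLE of the mean
  sign(psi.hat - psi) * sqrt(2 * (loglik(psi.hat, y) - loglik(psi, y)))
}

set.seed(1)
y <- rexp(10, rate = 1)                   # sample of size 10, true mean 1
r <- r.stat(psi = 1, y = y)
pnorm(r, lower.tail = FALSE)              # first-order one-sided P-value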
To Fisher, “optimality” of inference involved sufficiency, more strongly than in the Neyman-Pearson theory.
But generally the MLE is not a sufficient statistic.
Thus to Fisher, and many others, the resolution of that was conditioning on an ancillary statistic to render the MLE sufficient beyond 1st order.
Ancillary statistics carry information about the precision of the inference, but not the value of the parameter, e.g. the ratio of observed to expected Fisher information.
A central concept in what follows involves observed and expected (Fisher) information.
The observed information is defined as minus the second derivative of the loglikelihood at its maximum:
$$\hat{j} = -\,\ddot{l}(\theta; y)\,\big|_{\theta = \hat\theta}.$$
The expected information (the more usual Fisher information) is defined as
$$i(\theta) = E\{-\,\ddot{l}(\theta; Y)\},$$
and we will write $\hat{i} = i(\hat\theta)$.
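For example (our illustration, with analytic derivatives), in the Cauchy location model featured by Efron & Hinkley (1978) the expected information is exactly $n/2$, while the observed information varies from sample to sample:

nloglik <- function(theta, y) sum(log(1 + (y - theta)^2))   # negative loglik, up to a constant

obs.info <- function(theta, y) {       # j-hat: minus second derivative of the loglik
  u <- y - theta
  sum(2 * (1 - u^2) / (1 + u^2)^2)
}

set.seed(2)
y <- rcauchy(20, location = 0)
theta.hat <- optimize(nloglik, c(-10, 10), y = y)$minimum   # MLE of location
c(observed = obs.info(theta.hat, y), expected = length(y) / 2)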
  
The MLE is sufficient if and only if $\hat{j} = \hat{i}$, and under regularity this occurs only for exponential families without nonlinear restriction on the parameter (the full rank case).
Inferentially it is unwise, and not really necessary, to use the average information; it is more useful for planning.
With methods indicated here, it is feasible to condition on an ancillary statistic such as $a = \hat{j}/\hat{i}$ (meaning actually $\hat{i}^{-1}\hat{j}$).
This is the key part of what is called Neo-Fisherian Inference.
The starting point is a simple and accurate ‘likelihood ratio approximation’ to the distribution of the (multidimensional) maximum likelihood estimator.
The next step is to transform and marginalize from this to the distribution of the signed LR statistic (the square root of the usual $\chi^2_1$ statistic), requiring only a Jacobian and a Laplace approximation to the integration.
This result is expressed as an adjustment to the first-order N(0,1) distribution of the LR: “If that approximation is poor but not terrible, this mops up most of the error” (Rob Kass).
This is not hard to fathom (accessible to a graduate-level theory course) if one need not be distracted by arcane details.
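As a reminder of the Laplace step (our sketch, not the paper's code; it uses the numDeriv package for the second derivative), a one-dimensional Laplace approximation replaces an integral of $\exp\{h(x)\}$ by $\exp\{h(\hat x)\}\sqrt{2\pi/\{-h''(\hat x)\}}$; here it is checked on the Gamma integral, where it reproduces Stirling's formula:

laplace <- function(h, lower, upper) {
  opt <- optimize(h, c(lower, upper), maximum = TRUE)   # locate the mode of h
  h2  <- numDeriv::hessian(h, opt$maximum)              # h''(xhat), numerically
  exp(opt$objective) * sqrt(2 * pi / abs(h2))
}

n <- 10
h <- function(x) n * log(x) - x          # integrand of Gamma(n + 1), on the log scale
c(laplace = laplace(h, 0.1, 100), exact = gamma(n + 1))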
 
 
Indeed, Skovgaard (1985) confirmed that in general $(\hat\theta, a)$ is sufficient to $O_P(1/n)$, and conditioning on $a = \hat{j}/\hat{i}$ (among other choices) leads with that order to: (a) no loss of “information”, (b) the MLE being sufficient.
The LR approximation to the distribution of the MLE (usually, but less usefully, called the $p^*$ or “magic” formula) is then
$$\mathrm{pr}(\hat\theta \mid a;\, \theta) = (2\pi)^{-p/2}\,|\hat{j}|^{1/2}\,\frac{\mathrm{pr}(y;\, \theta)}{\mathrm{pr}(y;\, \hat\theta)}\,\{1 + O(n^{-1})\}.$$
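This is easy to check numerically where the exact answer is known. In our illustration below (not from the paper), for an exponential sample with mean psi no ancillary is needed, the MLE $\hat\psi = \bar{y}$ has an exact Gamma distribution, and the formula is off only by a factor close to $1 + 1/(12n)$, constant in $\hat\psi$:

n <- 10; psi0 <- 1
ll <- function(psi, m) -n * log(psi) - n * m / psi   # loglik as a function of the MLE m = ybar

pstar <- function(m, psi) {
  j.hat <- n / m^2                                   # observed information at the MLE
  sqrt(j.hat / (2 * pi)) * exp(ll(psi, m) - ll(m, m))
}

m <- seq(0.4, 2.5, by = 0.3)                         # grid of possible MLE values
exact <- dgamma(m, shape = n, rate = n / psi0)       # exact density of ybar
round(pstar(m, psi0) / exact, 4)                     # constant, about 1 + 1/(12n)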
 
Though this took some years to emerge, in retrospect it becomes fairly simple:
$$\mathrm{pr}(\hat\theta \mid a;\, \theta) = \mathrm{pr}(\hat\theta \mid a;\, \hat\theta)\,\frac{\mathrm{pr}(\hat\theta \mid a;\, \theta)}{\mathrm{pr}(\hat\theta \mid a;\, \hat\theta)} = \mathrm{pr}(\hat\theta \mid a;\, \hat\theta)\,\frac{\mathrm{pr}(y;\, \theta)}{\mathrm{pr}(y;\, \hat\theta)},$$
since $(\hat\theta, a)$ is conditionally sufficient to 2nd order, and an Edgeworth expansion of the leading term gives $\mathrm{pr}(\hat\theta \mid a;\, \hat\theta) = (2\pi)^{-p/2}|\hat{j}|^{1/2}\{1 + O(1/n)\}$, the result having relative error $O(1/n)$.
The aim then is to transform this to the distribution of $r$.
The Jacobian and marginalization to be applied to $p^*(\hat\theta)$ involve rather arcane sample space derivatives, such as $\partial l(\theta;\, \hat\theta, a)/\partial\hat\theta$, approximations to which are taken care of by the software we provide.
The result is an adjusted LR statistic
$$r^* = r + \frac{1}{r}\log C + \frac{1}{r}\log\frac{u}{r} = r + \mathrm{NP} + \mathrm{INF},$$
such that
$$\mathrm{pr}\{r(Y) \geq r(y);\, \theta\} = \{1 - \Phi(r^*(y))\}\{1 + O(n^{-1})\}.$$
 
 
 
 
It was almost prohibitively difficult to differentiate the likelihood with respect to MLEs while holding fixed a (rather notional) ancillary statistic.
The approximations referred to came in a breakthrough by Skovgaard, making the theory practical.
Skovgaard’s approximation uses projections involving covariances of likelihood quantities, computed without holding fixed an ancillary.
Our software uses simulation for these covariances, NOT involving model fitting in the simulation trials.
To use the generic software, the user specifies an R function for computing the likelihood. The package design renders it quite generally applicable.
Since higher-order inference depends on more than the likelihood function, one defines the extra-likelihood aspects of the model by providing another R function that generates a dataset.
The interest parameter function is defined by one further R function.
We illustrate this with a Weibull example, with interest parameter the survival function at a given time and covariate value; a sketch of the three functions is below.
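A minimal sketch of the three user-supplied functions for such a Weibull model (our code, not the paper's; the function names and the placeholder time t0 and covariate value x0 are ours):

library(likelihoodAsy)

# 1. Log-likelihood: log hazard linear in the covariate x, i.e.
#    hazard(t) = exp(b0 + b1 * x) * g * t^(g - 1), with g = exp(theta[3]) > 0.
floglik.wei <- function(theta, data) {
  g   <- exp(theta[3])
  eta <- theta[1] + theta[2] * data$x
  sum(eta + log(g) + (g - 1) * log(data$t) - exp(eta) * data$t^g)
}

# 2. Dataset generator: supplies the extra-likelihood aspects of the model.
datagen.wei <- function(theta, data) {
  g    <- exp(theta[3])
  rate <- exp(theta[1] + theta[2] * data$x)
  data$t <- rweibull(length(data$x), shape = g, scale = rate^(-1 / g))
  data
}

# 3. Interest parameter: survival probability at time t0, covariate x0.
fpsi.wei <- function(theta) {
  t0 <- 26; x0 <- 4                      # placeholder values, not the paper's
  g <- exp(theta[3])
  exp(-exp(theta[1] + theta[2] * x0) * t0^g)
}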
Here there are 17 observations on {leukemia survival time, one covariate log WBC}, with a simple linear regression model for the log hazard function.
Inference is on the survival probability at a given time and covariate value.
We test the hypothesis that this probability is equal to the 1st-order 0.975 lower confidence limit, against alternatives of smaller values.
Results for the 1st- and 2nd-order LR tests and the Wald test are
$$r = 1.66\ (P = 0.048), \qquad r^* = 2.10\ (P = 0.018), \qquad \text{Wald} = 1.95\ (P = 0.025).$$
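A hypothetical call of the package's main function then has the following shape. The data below are simulated stand-ins (not the leukemia data), the null value psival = 0.3 is a placeholder, and the rstar() argument names follow our reading of the likelihoodAsy documentation; check ?rstar for the exact interface:

set.seed(3)
dat <- list(x = runif(17, 2, 5))           # stand-in for log WBC
dat <- datagen.wei(c(-2, -0.5, 0), dat)    # stand-in survival times

obj <- rstar(data = dat, thetainit = c(-2, -0.5, 0),
             floglik = floglik.wei, datagen = datagen.wei,
             fpsi = fpsi.wei, psival = 0.3, R = 1000)
obj                                        # reports r, r* and one-sided P-values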
Confidence Distributions: One-sided confidence limits at all possible levels. P-values are one-tailed error probabilities from testing.
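In the package, such confidence distributions and limits can be traced with rstar.ci(), which takes the same user functions but no null value; again a hypothetical sketch, to be checked against the package documentation:

ci <- rstar.ci(data = dat, thetainit = c(-2, -0.5, 0),
               floglik = floglik.wei, datagen = datagen.wei,
               fpsi = fpsi.wei, R = 1000)
ci            # first- and higher-order confidence limits
plot(ci)      # curves of r and r* against psi: the confidence distribution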
There are 4 other examples in the paper, including inference on the autocorrelation in an AR(1) model, a binomial overdispersion model, and other settings where one would ordinarily use 1st-order asymptotics.
The higher-order improvements are of practical interest. In the examples, for moderate sample sizes, P-values around 0.05 are modified by a factor of about 2 using higher-order asymptotics.
I am aware that calling this “Modern Likelihood-Frequentist Inference” may presume that the methods here will become more widely used.
Our aim with the paper and R package is to contribute to that, with exposition and software that apply widely.