Modern Likelihood-Frequentist Inference: A Brief Overview

Modern Likelihood-Frequentist Inference
Donald A Pierce, Emeritus, OSU Statistics
and
Ruggero Bellio, Univ of Udine
Slides and working paper, other things are at: http://www.science.oregonstate.edu/~piercedo
Slides and paper only are at: https://www.dropbox.com/sh/fd6yqcfb2lfubyf/AAAfHspffPSfur6Qs7WJDTr9a?dl=0
   
PURPOSE OF THIS TALK
To summarize the Pierce & Bellio working paper “Modern Likelihood-Frequentist Inference”. Its topic is an important advance in statistical theory and methods, due to many workers, largely occurring since 1986. It is a complement to Neyman-Pearson theory, based more on likelihood and sufficiency. The results considerably improve, in practical terms, on the accuracy of the usual first-order likelihood methods, such as the Wald and likelihood ratio chi-squared tests.
Our paper provides an exposition of this topic intended for a wide audience of statisticians.
It also introduces an R package, likelihoodAsy, which I will describe here.
Shortly before 1980, important developments in the frequency theory of inference were “in the air”.
Strictly, this was about new asymptotic methods, but with the capacity to lead to what has been called “Neo-Fisherian” theory of inference.
A complement to the Neyman-Pearson theory, emphasizing likelihood and conditioning for the reduction of data for inference, rather than a direct focus on optimality, e.g. UMP tests.
How it all started, largely (there were earlier developments)
A few years after that, this pathbreaking paper led the way to remarkable further development of MODERN LIKELIHOOD ASYMPTOTICS.
That paper was difficult, so Dawn Peters and I had some success interpreting/promoting/extending it in an invited RSS discussion paper.
 
  
HIGHLY ABRIDGED REFERENCE LIST
Barndorff-Nielsen, O. E. (1986). Inference on full or partial parameters based on the standardized signed likelihood ratio. Biometrika 73, 307-322.
Durbin, J. (1980). Approximations for densities of sufficient estimators. Biometrika 67, 311-333.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 65, 457-482.
Pierce, D. A. and Peters, D. (1992). Practical use of higher-order asymptotics for multiparameter exponential families. J. Roy. Statist. Soc. B 54, 701-737.
Pierce, D. A. and Bellio, R. (in prep). Modern likelihood-frequentist inference. (Basis for this talk)
Skovgaard, I. M. (1996). An explicit large-deviation approximation to one-parameter tests. Bernoulli 2, 145-165.
   
SOME MAJOR BOOKS
Inference and Asymptotics (1994) Barndorff-Nielsen & Cox
Principles of Statistical Inference from a Neo-Fisherian Perspective (1997) Pace & Salvan
Likelihood Methods in Statistics (2000) Severini
   
SOFTWARE ANNOUNCED IN WORKING PAPER
R package likelihoodAsy, available at http://cran.r-project.org/
It applies quite generally, requiring mainly a user-provided R function for the likelihood.
It goes well beyond exponential families, and even beyond independent observations.
Salvan (Univ Padua) and Pace & Bellio (Univ Udine) made it possible for me to visit 2-4 months/year from 2000 to 2016 to study Likelihood Asymptotics.
In 2012 they arranged a Fellowship for me at Padua; work under that Fellowship led to the paper in progress discussed today.
This is based on the idea that the future of Likelihood Asymptotics will depend on: (a) development of generic computational tools, and (b) concise and transparent exposition amenable to statistical theory courses.
For a model with parameter $\theta$ and scalar interest parameter $\psi = \psi(\theta)$, write $\hat\theta_\psi$ and $\hat\theta$ for the MLEs with and without the constraint $\psi(\theta) = \psi$.
The 1st-order LR test is based on a standard normal approximation to the signed root LR statistic
$$r(\psi) = \mathrm{sign}(\hat\psi - \psi)\,\sqrt{2\{l(\hat\theta; y) - l(\hat\theta_\psi; y)\}}.$$
The aim is to improve on this through a modified LR statistic $r^*$ such that
$$\mathrm{pr}\{r(Y) \geq r(y);\, \theta\} = \{1 - \Phi(r^*(y))\}\{1 + O(n^{-1})\}.$$
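As a concrete illustration (ours, not from the paper), the following R code computes $r$ and its first-order P-value in the simplest possible case, an exponential sample with mean psi, where the MLE is the sample mean:

loglik <- function(psi, y) sum(dexp(y, rate = 1 / psi, log = TRUE))

r.stat <- function(psi, y) {
  psi.hat <- mean(y)                      # unconstrained MLE of the mean
  sign(psi.hat - psi) * sqrt(2 * (loglik(psi.hat, y) - loglik(psi, y)))
}

set.seed(1)
y <- rexp(10, rate = 1)                   # sample of size 10, true mean 1
r <- r.stat(psi = 1, y = y)
pnorm(r, lower.tail = FALSE)              # first-order one-sided P-value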
To Fisher, “optimality” of inference involved sufficiency, more strongly than in the Neyman-Pearson theory.
But generally the MLE is not a sufficient statistic.
Thus to Fisher, and many others, the resolution of that was conditioning on an ancillary statistic to render the MLE sufficient beyond 1st order.
Ancillary statistics carry information about the precision of the inference, but not the value of the parameter, e.g. the ratio of observed to expected Fisher information.
A central concept in what follows involves observed and expected (Fisher) information.
The observed information is defined as minus the second derivative of the loglikelihood at its maximum:
$$\hat{j} = -\,\ddot{l}(\theta; y)\,\big|_{\theta = \hat\theta}.$$
The expected information (the more usual Fisher information) is defined as
$$i(\theta) = E\{-\,\ddot{l}(\theta; Y)\},$$
and we will write $\hat{i} = i(\hat\theta)$.
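For example (our illustration, with analytic derivatives), in the Cauchy location model featured by Efron & Hinkley (1978) the expected information is exactly $n/2$, while the observed information varies from sample to sample:

nloglik <- function(theta, y) sum(log(1 + (y - theta)^2))   # negative loglik, up to a constant

obs.info <- function(theta, y) {       # j-hat: minus second derivative of the loglik
  u <- y - theta
  sum(2 * (1 - u^2) / (1 + u^2)^2)
}

set.seed(2)
y <- rcauchy(20, location = 0)
theta.hat <- optimize(nloglik, c(-10, 10), y = y)$minimum   # MLE of location
c(observed = obs.info(theta.hat, y), expected = length(y) / 2)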
  
The MLE is sufficient if and only if $\hat{j} = \hat{i}$, and under regularity this occurs only for exponential families without nonlinear restriction on the parameter (the full rank case).
Inferentially it is unwise, and not really necessary, to use the average information; it is more useful for planning.
With methods indicated here, it is feasible to condition on an ancillary statistic such as $a = \hat{j}/\hat{i}$ (meaning actually $\hat{i}^{-1}\hat{j}$).
This is the key part of what is called Neo-Fisherian Inference.
The starting point is a simple and accurate ‘likelihood ratio approximation’ to the distribution of the (multidimensional) maximum likelihood estimator.
The next step is to transform and marginalize from this to the distribution of the signed LR statistic (the square root of the usual $\chi^2_1$ statistic), requiring only a Jacobian and a Laplace approximation to the integration.
This result is expressed as an adjustment to the first-order N(0,1) distribution of the LR: “If that approximation is poor but not terrible, this mops up most of the error” (Rob Kass).
This is not hard to fathom (accessible to a graduate-level theory course) if one need not be distracted by arcane details.
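As a reminder of the Laplace step (our sketch, not the paper's code; it uses the numDeriv package for the second derivative), a one-dimensional Laplace approximation replaces an integral of $\exp\{h(x)\}$ by $\exp\{h(\hat x)\}\sqrt{2\pi/\{-h''(\hat x)\}}$; here it is checked on the Gamma integral, where it reproduces Stirling's formula:

laplace <- function(h, lower, upper) {
  opt <- optimize(h, c(lower, upper), maximum = TRUE)   # locate the mode of h
  h2  <- numDeriv::hessian(h, opt$maximum)              # h''(xhat), numerically
  exp(opt$objective) * sqrt(2 * pi / abs(h2))
}

n <- 10
h <- function(x) n * log(x) - x          # integrand of Gamma(n + 1), on the log scale
c(laplace = laplace(h, 0.1, 100), exact = gamma(n + 1))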
 
 
Indeed, Skovgaard (1985) confirmed that in general $(\hat\theta, a)$ is sufficient to $O_P(1/n)$, and conditioning on $a = \hat{j}/\hat{i}$ (among other choices) leads with that order to: (a) no loss of “information”, (b) the MLE being sufficient.
The LR approximation to the distribution of the MLE (usually, but less usefully, called the $p^*$ or “magic” formula) is then
$$\mathrm{pr}(\hat\theta \mid a;\, \theta) = (2\pi)^{-p/2}\,|\hat{j}|^{1/2}\,\frac{\mathrm{pr}(y;\, \theta)}{\mathrm{pr}(y;\, \hat\theta)}\,\{1 + O(n^{-1})\}.$$
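This is easy to check numerically where the exact answer is known. In our illustration below (not from the paper), for an exponential sample with mean psi no ancillary is needed, the MLE $\hat\psi = \bar{y}$ has an exact Gamma distribution, and the formula is off only by a factor close to $1 + 1/(12n)$, constant in $\hat\psi$:

n <- 10; psi0 <- 1
ll <- function(psi, m) -n * log(psi) - n * m / psi   # loglik as a function of the MLE m = ybar

pstar <- function(m, psi) {
  j.hat <- n / m^2                                   # observed information at the MLE
  sqrt(j.hat / (2 * pi)) * exp(ll(psi, m) - ll(m, m))
}

m <- seq(0.4, 2.5, by = 0.3)                         # grid of possible MLE values
exact <- dgamma(m, shape = n, rate = n / psi0)       # exact density of ybar
round(pstar(m, psi0) / exact, 4)                     # constant, about 1 + 1/(12n)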
 
Though this took some years to emerge, in retrospect it becomes fairly simple:
$$\mathrm{pr}(\hat\theta \mid a;\, \theta) = \mathrm{pr}(\hat\theta \mid a;\, \hat\theta)\,\frac{\mathrm{pr}(\hat\theta \mid a;\, \theta)}{\mathrm{pr}(\hat\theta \mid a;\, \hat\theta)} = \mathrm{pr}(\hat\theta \mid a;\, \hat\theta)\,\frac{\mathrm{pr}(y;\, \theta)}{\mathrm{pr}(y;\, \hat\theta)},$$
since $(\hat\theta, a)$ is conditionally sufficient to 2nd order, and an Edgeworth expansion of the leading term gives $\mathrm{pr}(\hat\theta \mid a;\, \hat\theta) = (2\pi)^{-p/2}|\hat{j}|^{1/2}\{1 + O(1/n)\}$, the result having relative error $O(1/n)$.
The aim then is to transform this to the distribution of $r$.
The Jacobian and marginalization to be applied to $p^*(\hat\theta)$ involve rather arcane sample space derivatives, such as $\partial l(\theta;\, \hat\theta, a)/\partial\hat\theta$, approximations to which are taken care of by the software we provide.
The result is an adjusted LR statistic
$$r^* = r + \frac{1}{r}\log C + \frac{1}{r}\log\frac{u}{r} = r + \mathrm{NP} + \mathrm{INF},$$
such that
$$\mathrm{pr}\{r(Y) \geq r(y);\, \theta\} = \{1 - \Phi(r^*(y))\}\{1 + O(n^{-1})\}.$$
 
 
 
 
It was almost prohibitively difficult to differentiate the likelihood with respect to MLEs while holding fixed a (rather notional) ancillary statistic.
The approximations referred to came in a breakthrough by Skovgaard, making the theory practical.
Skovgaard’s approximation uses projections involving covariances of likelihood quantities, computed without holding fixed an ancillary.
Our software uses simulation for these covariances, NOT involving model fitting in the simulation trials.
To use the generic software, the user specifies an R function for computing the likelihood. The package design renders it quite generally applicable.
Since higher-order inference depends on more than the likelihood function, one defines the extra-likelihood aspects of the model by providing another R function that generates a dataset.
The interest parameter function is defined by one further R function.
We illustrate this with a Weibull example, with interest parameter the survival function at a given time and covariate value; a sketch of the three functions is below.
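A minimal sketch of the three user-supplied functions for such a Weibull model (our code, not the paper's; the function names and the placeholder time t0 and covariate value x0 are ours):

library(likelihoodAsy)

# 1. Log-likelihood: log hazard linear in the covariate x, i.e.
#    hazard(t) = exp(b0 + b1 * x) * g * t^(g - 1), with g = exp(theta[3]) > 0.
floglik.wei <- function(theta, data) {
  g   <- exp(theta[3])
  eta <- theta[1] + theta[2] * data$x
  sum(eta + log(g) + (g - 1) * log(data$t) - exp(eta) * data$t^g)
}

# 2. Dataset generator: supplies the extra-likelihood aspects of the model.
datagen.wei <- function(theta, data) {
  g    <- exp(theta[3])
  rate <- exp(theta[1] + theta[2] * data$x)
  data$t <- rweibull(length(data$x), shape = g, scale = rate^(-1 / g))
  data
}

# 3. Interest parameter: survival probability at time t0, covariate x0.
fpsi.wei <- function(theta) {
  t0 <- 26; x0 <- 4                      # placeholder values, not the paper's
  g <- exp(theta[3])
  exp(-exp(theta[1] + theta[2] * x0) * t0^g)
}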
Here there are 17 observations on {leukemia survival time, one covariate log WBC}, with a simple linear regression model for the log hazard function.
Inference is on the survival probability at a given time and covariate value.
We test the hypothesis that this probability is equal to the 1st-order 0.975 lower confidence limit, against alternatives of smaller values.
Results for the 1st- and 2nd-order LR tests and the Wald test are
$$r = 1.66\ (P = 0.048), \qquad r^* = 2.10\ (P = 0.018), \qquad \text{Wald} = 1.95\ (P = 0.025).$$
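A hypothetical call of the package's main function then has the following shape. The data below are simulated stand-ins (not the leukemia data), the null value psival = 0.3 is a placeholder, and the rstar() argument names follow our reading of the likelihoodAsy documentation; check ?rstar for the exact interface:

set.seed(3)
dat <- list(x = runif(17, 2, 5))           # stand-in for log WBC
dat <- datagen.wei(c(-2, -0.5, 0), dat)    # stand-in survival times

obj <- rstar(data = dat, thetainit = c(-2, -0.5, 0),
             floglik = floglik.wei, datagen = datagen.wei,
             fpsi = fpsi.wei, psival = 0.3, R = 1000)
obj                                        # reports r, r* and one-sided P-values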
Confidence Distributions: One-sided confidence limits at all possible levels. P-values are one-tailed error probabilities from testing.
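In the package, such confidence distributions and limits can be traced with rstar.ci(), which takes the same user functions but no null value; again a hypothetical sketch, to be checked against the package documentation:

ci <- rstar.ci(data = dat, thetainit = c(-2, -0.5, 0),
               floglik = floglik.wei, datagen = datagen.wei,
               fpsi = fpsi.wei, R = 1000)
ci            # first- and higher-order confidence limits
plot(ci)      # curves of r and r* against psi: the confidence distribution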
There are 4 other examples in the paper, including inference on the autocorrelation in an AR(1) model, a binomial overdispersion model, and other settings where one would ordinarily use 1st-order asymptotics.
The higher-order improvements are of practical interest. In the examples, for moderate sample sizes, P-values around 0.05 are modified by a factor of about 2 using higher-order asymptotics.
I am aware that calling this “Modern Likelihood-Frequentist Inference” may presume that the methods here will become more widely used.
Our aim with the paper and R package is to contribute to that, with exposition and software that apply widely.