Non-Standard Errors
Albert J. Menkveld, Anna Dreber, Felix Holzmeister,
Jürgen Huber, Magnus Johannesson, Michael Kirchler,
Sebastian Neusüß, Michael Razen, Utz Weitzel
 
Discussion
Amit Goyal
 
 
 
The paper
 
Research teams (RTs) test the same hypotheses on the same data. They have
the chance to modify results based on feedback from peer evaluators
(PEs).
 
A good experiment, as it allows one to hold constant many things that
are otherwise hard to control in meta-studies:
Same data.
Same sample period.
Same hypotheses.
 
2
 
 
Results
 
Large variation (across RTs) in results, both for the size of the effect
and for its t-statistic. This variation is dubbed the non-standard error (NSE).
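
In the paper's sense, the standard error captures sampling variation within one team's estimate, while the NSE is the dispersion of point estimates across teams. A minimal sketch of that distinction, using hypothetical team estimates (not the paper's data):

```python
import numpy as np

# Hypothetical point estimates of the same effect reported by 7 research
# teams working on the same data (illustrative numbers only).
estimates = np.array([0.8, 1.4, -0.2, 1.1, 0.5, 2.0, 0.9])

# Each team also reports its own (within-team) standard error.
# The non-standard error is the dispersion of the point estimates
# themselves: their standard deviation across teams.
nse = estimates.std(ddof=1)
print(f"Cross-team mean estimate:        {estimates.mean():.2f}")
print(f"Non-standard error (cross-team): {nse:.2f}")
```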
 
RT quality does not explain the variability.
 
Peer feedback reduces NSE.
 
3
 
 
Standard statistics
 
The hypothesis is clearly defined, as is the test statistic.
There exists the problem of multiple hypothesis testing [Harvey, Liu, and
Zhu (2016)], as well as the file-drawer problem of non-published results.
However, standard statistical tools address these problems.
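
As a sketch of what such a standard tool looks like, here is a Holm adjustment of hypothetical p-values via statsmodels. (Harvey, Liu, and Zhu (2016) work with adjusted t-statistic thresholds rather than this exact procedure; the numbers below are illustrative only.)

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from testing several strategies on the same data.
# Individually, four of the six clear the 5% bar.
pvals = np.array([0.001, 0.012, 0.034, 0.049, 0.20, 0.41])

# Holm's step-down procedure controls the family-wise error rate.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print("Holm-adjusted p-values:", np.round(p_adj, 3))
print("Significant after adjustment:", reject)  # only the smallest survives
```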
 
Even if there were only a single hypothesis being tested, results may be
suspect because researchers deliberately calculate the test statistic in a
biased way (p-hacking).
There is no clean statistical solution for this problem [but see Harvey (2017)].
Awareness of p-hacking helps in putting results in perspective and may one
day help in changing the incentives that lead to the problem in the first
place.
 
All of this is reasonably well understood.
 
4
 
 
What else?
 
What if:
The hypothesis itself was not well articulated.
And/or the test statistic was not clearly specified.
The researcher lacks sufficient ability.
The researcher makes (unintentional) mistakes.
 
Would one not see variation in results?
 
If so, what exactly is the point of showing that there is variation in
results?
 
5
 
 
Vagueness in hypothesis
 
Are markets efficient? There are many different formulations:
Prices react only to news.
Portfolio strategies do not have alphas.
Before/after transaction costs.
Fund managers do not make abnormal profits.
Before/after transaction costs.
New information is impounded in prices quickly.
Etc.
 
Reasonable people can disagree on the metric. This will lead to
variation in the size of the effect. Is this NSE?
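
A toy sketch of how just the before/after transaction-cost choice can flip the verdict on the same strategy (simulated returns and an assumed cost drag, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly strategy returns with a 30 bp gross alpha (hypothetical).
months = 240
gross = 0.003 + rng.normal(0.0, 0.02, size=months)
net = gross - 0.003  # assumed 30 bp monthly transaction-cost drag

def t_stat(x):
    """t-statistic for the mean being zero (simple IID standard error)."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

print(f"Gross alpha t-stat: {t_stat(gross):.2f}")  # likely 'significant'
print(f"Net alpha t-stat:   {t_stat(net):.2f}")    # likely near zero
```

The estimated size of the "effect" differs across the two metrics even though nothing about the data or the hypothesis has changed.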
 
6
 
 
Vagueness in test statistic

Given a sample of observations, calculating the sample mean and testing
whether it is different from zero is a trivial task that does not suffer
from vagueness in articulating the null hypothesis (H0: μ = 0).

But, unless the test statistic is clearly specified, different reasonable
researchers will conclude different things:
OLS, White, or Newey-West (with how many lags?) standard errors.

Is the variation in the t-statistic NSE?
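
A minimal sketch of this point on simulated data (not from the paper): the point estimate is identical in every row, but the t-statistic moves with the standard-error estimator.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated series with AR(1) errors and a true mean of 0.1 (illustrative).
n = 500
e = rng.normal(size=n)
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + e[t]
y += 0.1

# Regressing on a constant estimates the mean; test H0: mu = 0.
X = np.ones((n, 1))
specs = [
    ("OLS (nonrobust)",     dict(cov_type="nonrobust")),
    ("White (HC0)",         dict(cov_type="HC0")),
    ("Newey-West, 4 lags",  dict(cov_type="HAC", cov_kwds={"maxlags": 4})),
    ("Newey-West, 12 lags", dict(cov_type="HAC", cov_kwds={"maxlags": 12})),
]
for label, kwargs in specs:
    res = sm.OLS(y, X).fit(**kwargs)
    print(f"{label:>20}: estimate = {res.params[0]:.3f}, t = {res.tvalues[0]:.2f}")
```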
7
 
 
Vagueness evidence in the paper
 
RT-H3: Calculate mean and check whether it is different from zero.

And what now?
 
8
 
 
Variation in ability (1)
 
Two examples:
1. Impact factor
Journal of Finance: 7.54
Journal of Banking and Finance: 3.07
2. Nobel laureates in economics
University of Chicago: ~30
University of Lausanne: 0
 
One would, therefore, have more faith in results from high-skilled
teams and in more reputable journals.
 
9
 
 
Variation in ability (2)
 
It is surprising that the authors do not see this as an explanation for
their results.
The authors claim that NSE is large for the 9 high-quality RTs. But is it
not lower than the NSE for all 164 RTs?
 
And even if there is variation in results for high-quality teams, it is
likely due to the vagueness mentioned earlier.
Both Gene Fama and Dick Thaler are at the University of Chicago but
disagree about whether markets are efficient.
 
10
 
 
Unintentional mistakes
 
Reviewing process makes papers better.
 
Mistakes get corrected.
 
11
 
 
Overall
 
I do not like the phrase NSE. It may be cute, but what the authors
calculate are not standard errors.
 
I am not sure what the takeaway from the paper is.
 
That there is variation across results from researchers? And that one
should account for this variation? If yes, sure, but “we don’t need
another hero” (with apologies to Tina Turner).
 
One way to think about the paper is that there are many hypotheses
being tested on the same data. One should be aware of that. That is
correct. However, modified standard errors exist for this multiplicity of
hypothesis testing. We do not need non-standard errors.
 
12