Bayesian Data Analysis
Dive into Bayesian data analysis with a focus on Psychology applications. Learn about Bayesian inference, model parameters, Markov-Chain Monte Carlo, alternatives to NHST, and more. Explore tools like R, JAGS, Stan, and JASP through practical examples and tutorials. Enhance your skills in conducting and reporting Bayesian data analyses within two days.
Bayesian Data Analysis
Darrell A. Worthy, Texas A&M University
Overview The goals of this short course are (1) to familiarize you with the logic and theory of Bayesian inference and (2) to train you to conduct and report the results of Bayesian data analyses. These two goals will be pursued on Days 1 and 2 of the course, respectively. Bayesian data analysis spans the sciences: many influential scholars have been pure mathematicians or statisticians, and some have worked in areas such as physics. This is important to know, but I will focus on applications of Bayesian data analysis to Psychology and will draw mostly from psychologists.
Overview On Day 1 I will present material mostly from John K. Kruschke's book, Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. I think the book does a good job of explaining the ideas behind Bayesian inference. I recommend it for anyone who would like to learn R as well as other programs that interface with R, such as Stan and JAGS. There are entire chapters devoted to learning R, then JAGS, and then Stan, followed by chapters devoted to specific analyses from the General Linear Model.
Overview Day 2 will primarily focus on doing Bayesian data analysis with JASP. JASP is a free program that you can download at jasp-stats.org. It was recently developed by E.J. Wagenmakers and colleagues at the University of Amsterdam, along with Jeff Rouder and Richard Morey. JASP is a direct competitor to expensive proprietary software such as IBM's SPSS. As we will see, it's very easy to conduct Bayesian (and traditional) analyses using JASP, so after tomorrow you should be ready to report results of Bayesian analyses in your published work. We will also conduct some analyses using R and JAGS, which provide additional information not currently given by JASP.
Overview
Day 1:
Bayesian inference and model parameters (Chapter 2, Kruschke)
Bayes' rule (Chapter 5, Kruschke)
Markov-Chain Monte Carlo (Chapter 7, Kruschke)
Null-hypothesis significance testing (NHST; Chapter 11, Kruschke)
Bayesian alternatives to NHST (Chapter 12, Kruschke)
Day 2:
JASP introduction
Examples from Wagenmakers et al. (2017) on correlation, the t-test, one-way ANOVA, and two-way ANOVA
Published examples from my lab on two-way ANOVA and regression
Regression examples using R and JAGS
Analysis of some of your own data, if there is time
Acknowledgements Day 1 lectures are drawn heavily from the Kruschke text; most of the words are his or paraphrases of his text rather than my own. Day 2 draws heavily from a tutorial for JASP written by Wagenmakers and colleagues, available on the OSF: https://osf.io/m6bi8/ We will also work with a couple of my published data sets and, hopefully, with some of your own data. My work has been to read the entire book as well as additional papers and books and to condense all of the material into a format that will hopefully have you doing Bayesian data analysis in two days.
Introduction: Credibility, Models, and Parameters (Chapter 2 of the Kruschke text)
Darrell A. Worthy, Texas A&M University
Foundational Ideas Bayesian data analysis has two foundational ideas. First, Bayesian inference is reallocation of credibility across possibilities: given the data, how do we revise our prior beliefs? Second, the possibilities over which we allocate credibility are parameter values in meaningful mathematical models. How much credibility do we give to r = .35? To t = -2.54? To r = 0 (a null value)?
Reallocation of credibility Suppose we step outside one morning and notice that the sidewalk is wet. We might consider several possible causes: rain, garden irrigation, an erupted spring, a busted sewage pipe, or a passerby who spilled a drink. All these possibilities have some prior credibility, and those credibilities can differ: recent rain might be more likely than a busted sewage pipe.
Reallocation of credibility Continuing along the sidewalk, we collect new data. If we observe that the sidewalk is wet as far as we can see, and that everything else is also wet, then the hypothesis that it rained recently gains credibility. On the other hand, if we notice that the ground is wet in only one spot and that there are a couple of empty beer cans nearby, then the spilled-drink hypothesis gains credibility. This reallocation of credibility across possibilities is the essence of Bayesian inference.
Reallocation of credibility Another example of Bayesian inference is given by Sherlock Holmes: "How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?" Holmes conceived of a set of possible causes, some of which may have seemed improbable a priori. He then systematically gathered evidence that ruled out a number of the possible causes. If all but one of the possible causes were eliminated, then (Bayesian) reasoning forced him to conclude that the remaining cause was fully credible.
Reallocation of credibility Holmes's reasoning is illustrated in this figure. Credibility, on the y-axis, is synonymous with the probability that hypotheses A-D are true. Note that the probabilities sum to one, so eliminating one possibility increases the credibility given to the remaining hypotheses. After credibilities are updated, the posterior becomes the prior when the next piece of data comes in.
Reallocation of credibility A complementary Bayesian form of reasoning is judicial exoneration. Suppose we have four unaffiliated suspects. If evidence accrues that one suspect is definitely culpable, then the other suspects are exonerated. As in Holmesian deduction, this exoneration is intuitive. It's also what the exact mathematics of Bayesian inference prescribe.
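To make the reallocation concrete, here is a minimal R sketch (not from the Kruschke text) that starts with equal prior credibility on four hypotheses and renormalizes after the data rule one of them out; the likelihood values are purely illustrative.

```r
# Minimal sketch: reallocation of credibility across four hypotheses (A-D)
prior <- c(A = 0.25, B = 0.25, C = 0.25, D = 0.25)

# Likelihood of the observed data under each hypothesis.
# Here the data are assumed to be impossible under hypothesis A (Holmesian elimination).
likelihood <- c(A = 0, B = 1, C = 1, D = 1)

# Bayes' rule: posterior is proportional to prior * likelihood, renormalized to sum to one
posterior <- prior * likelihood / sum(prior * likelihood)
posterior
#>         A         B         C         D
#> 0.0000000 0.3333333 0.3333333 0.3333333
```

The credibility that hypothesis A held before the data is redistributed over the surviving hypotheses, exactly as in the Holmes and exoneration examples.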
Noisy data and probabilistic inferences These examples assumed that the observed data had definitive, deterministic relations to the possible causes. In reality, data have only probabilistic relations to their underlying causes. Holmes might find a footprint, but measurements of it would only probabilistically narrow down the range of shoes that could have produced it. In scientific research, measurements are replete with randomness and all data have some degree of noise. We therefore have difficulty completely ruling out some possible causes; we can only incrementally adjust the credibility of possible trends.
Noisy data and probabilistic inferences The beauty of Bayesian analysis is that the mathematics reveal exactly how much to reallocate credibility in realistic, probabilistic situations. An example: suppose a bouncy-ball manufacturer makes balls in sizes 1, 2, 3, and 4, where the size labels represent the balls' average diameters. However, the manufacturing process is quite variable, so a size 3 ball could have an actual diameter of 1.8 or 4.3 even though the mean is 3 (much like variation around group means). Suppose we order three size 2 balls and they measure 1.77, 2.23, and 2.77. Can we conclude that the factory correctly sent us size 2 balls?
Noisy data and probabilistic inferences The figure shows a Bayesian answer to this question. The top panel shows an equal prior probability of .25 for each size. The bell-shaped curves represent probability distributions that reflect variation in actual size. The bottom panel shows the data points as red circles. The heights of the blue bars show the reallocation of credibility across the sizes: there is a .56 probability that the balls are size 2, .31 that they are size 3, .11 for size 1, and .02 for size 4.
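The calculation behind the figure can be sketched in a few lines of R. The exact posterior depends on how much manufacturing variability is assumed; the sketch below assumes a normal likelihood with a standard deviation of 1, so its numbers will not exactly reproduce the figure, but the logic is the same.

```r
# Minimal sketch: posterior credibility of each ball size given three measurements
sizes  <- 1:4
prior  <- rep(0.25, 4)            # equal prior credibility for each size
data   <- c(1.77, 2.23, 2.77)     # observed diameters
sd_mfg <- 1                       # assumed manufacturing variability (illustrative)

# Likelihood of the three measurements under each candidate size (independent draws)
likelihood <- sapply(sizes, function(m) prod(dnorm(data, mean = m, sd = sd_mfg)))

posterior <- prior * likelihood / sum(prior * likelihood)
round(posterior, 2)   # most of the credibility lands on size 2, as in the figure
```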
Noisy data and probabilistic inferences As another example, consider testing for illicit drug use. We test one person, but, crucially, the test has a non-trivial probability of producing false positives and false negatives. Consider also our prior knowledge that most people do not use the drug. Because the prior probability of drug use is small and the data are noisy, the posterior probability of drug use is surprisingly small even when the test is positive. The same is true of other tests, such as cancer screenings.
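A minimal worked version of this example, with hypothetical numbers (a 1% base rate, a 95% hit rate, and a 5% false-positive rate; none of these figures come from the slides):

```r
# Minimal sketch: posterior probability of drug use given a positive test
prevalence  <- 0.01   # prior probability of drug use (hypothetical base rate)
hit_rate    <- 0.95   # P(positive | user)        (hypothetical)
false_alarm <- 0.05   # P(positive | non-user)    (hypothetical)

# Total probability of a positive test, then Bayes' rule
p_positive <- hit_rate * prevalence + false_alarm * (1 - prevalence)
p_user_given_positive <- hit_rate * prevalence / p_positive
p_user_given_positive   # about 0.16, despite the positive test
```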
Reallocation of credibility In summary, the essence of Bayesian inference is the reallocation of credibility across possibilities. The distribution of credibility initially reflects prior knowledge about the possibilities, which can be vague or not. Then new data are observed and credibility is reallocated: possibilities that are consistent with the data garner more credibility, while possibilities that are inconsistent with the data lose credibility. Bayesian analysis is the mathematics of reallocating credibility in a logically coherent and precise way.
Possibilities are parameter values in descriptive models Consider a study in which one group takes a blood-pressure drug and the other takes a placebo. How big is the difference between the typical blood pressure in one group versus the other, and how certain can we be of the difference? The magnitude of the difference describes the data, and our goal is to assess which possible descriptions of the data are more credible. Data analysis begins with a family of candidate descriptions of the data; these descriptions are mathematical formulas that characterize the trends and spreads in the data.
Possibilities are parameter values in descriptive models These mathematical formulas have numbers, called parameter values, that determine the exact shape of the mathematical forms. Parameters can be thought of as control knobs for simulating data generation. You have hopefully encountered the normal or Gaussian distribution, a bell-shaped distribution often used to describe data. The mean is one parameter, or control knob, that determines the location of the distribution's central tendency (a location parameter). The standard deviation is another parameter that determines the width or dispersion of the distribution (a scale parameter).
Possibilities are parameter values in descriptive models This figure shows some candidate normal distributions superimposed (in blue) on a histogram of actual data. The upper panel shows a distribution with a mean of 10 and an SD of 5; these parameter values seem to describe the data fairly well. The lower panel shows a different distribution that does not describe the data as well. The role of Bayesian inference is to compute the exact relative credibilities of candidate parameter values, while also accounting for their prior probabilities.
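A minimal R sketch of this kind of display, using simulated data rather than the data behind the figure; the mean-10, SD-5 candidate comes from the slide, while the second candidate is an arbitrary poorer fit.

```r
# Minimal sketch: candidate normal descriptions overlaid on a histogram of data
set.seed(1)
y <- rnorm(200, mean = 10, sd = 5)     # simulated stand-in for the actual data

hist(y, freq = FALSE, breaks = 20, main = "Candidate descriptions of the data")
curve(dnorm(x, mean = 10, sd = 5), add = TRUE, col = "blue", lwd = 2)   # describes the data well
curve(dnorm(x, mean = 4,  sd = 2), add = TRUE, col = "blue", lty = 2)   # describes the data poorly
```

Turning the mean and SD "control knobs" moves and stretches the blue curve; Bayesian inference tells us how credible each knob setting is given the data.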
Possibilities are parameter values in descriptive models There are two main desiderata for a mathematical description of the data. First, the mathematical form should be comprehensible, with meaningful parameters; in the case of the normal distribution, we know how to interpret the mean and SD parameters. Second, the mathematical description should be descriptively adequate, that is, it should look like the data: the normal distributions overlaid on the previous slide should not look too different from the histograms of actual data.
Steps of Bayesian data analysis In general, Bayesian analysis of data follows these steps:
1. Identify the data relevant to the research questions.
2. Define a descriptive model for the relevant data.
3. Specify a prior distribution of the parameters.
4. Use Bayesian inference to reallocate credibility across parameter values.
5. Check that the posterior predictions mimic the data with reasonable accuracy (a "posterior predictive check").
Steps of Bayesian data analysis An example: suppose we are interested in the relationship between the weight and height of people. We suspect that taller people tend to weigh more, but we would like to know how much people's weights tend to increase as their height increases, and how certain we can be about the magnitude of the increase. First, we identify the relevant data. We have data from 57 mature adults sampled at random from the population, and we would like to predict weight from height.
Steps of Bayesian data analysis The second step is to define a descriptive model of the data that is meaningful for our research interest. A good candidate model is a linear additive model, in this case a one-predictor regression model: ŷ = β0 + β1x. The coefficient β1 indicates how much predicted weight (ŷ) changes as a function of height (x). We also need to describe the random variation of actual weights around the predicted weights. For simplicity we will use a normal distribution: y ~ normal(ŷ, σ).
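To make the "control knob" idea concrete, here is a minimal R sketch (not Kruschke's code) that simulates 57 weights from this descriptive model at hypothetical heights; the parameter settings are illustrative, not the estimates reported later.

```r
# Minimal sketch: simulating weights from the descriptive model
# y-hat = beta0 + beta1 * x, with y ~ normal(y-hat, sigma)
set.seed(2)
height <- runif(57, min = 60, max = 76)          # 57 hypothetical heights (inches)
beta0  <- -100; beta1 <- 4; sigma <- 20          # illustrative "control knob" settings
weight <- rnorm(length(height),
                mean = beta0 + beta1 * height,   # predicted weight y-hat
                sd   = sigma)                    # random variation around y-hat
plot(height, weight)
```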
Steps of Bayesian data analysis Up to this point, these first two steps are the same as what we would do in a classical frequentist analysis. The third step is to specify a prior distribution on the parameters. For this purpose, and for most of our analyses, we will use a vague and noncommittal prior that places virtually equal prior credibility across a large range of candidate parameter values. In some contexts, like cancer screenings, we can use more informative priors if we know base rates in the population, but here our purpose is to use vague priors so that they do not bias our conclusions.
Steps of Bayesian data analysis The fourth step is conducting Bayesian inference and interpreting the posterior distribution. Bayesian inference has reallocated credibility across parameter values, from the vague priors we started with to values that are consistent with the data. The posterior distribution indicates which combinations of β0, β1, and σ are jointly credible, given the data.
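As a concrete illustration of steps 2 through 4, here is a minimal R/JAGS sketch of this regression model with vague priors, run through the rjags package. The prior settings are generic vague choices rather than the ones behind the slides' figures, and height and weight are hypothetical vectors (for example, the simulated ones from the sketch above).

```r
library(rjags)

# Descriptive model (step 2) with vague priors (step 3), written in JAGS notation.
# JAGS parameterizes the normal distribution by its precision (1 / variance).
model_string <- "
model {
  for (i in 1:N) {
    mu[i] <- beta0 + beta1 * x[i]      # predicted weight
    y[i] ~ dnorm(mu[i], tau)           # actual weight varies normally around it
  }
  beta0 ~ dnorm(0, 1.0E-6)             # vague, noncommittal priors
  beta1 ~ dnorm(0, 1.0E-6)
  sigma ~ dunif(0, 100)
  tau   <- pow(sigma, -2)
}
"

# Step 4: reallocate credibility via MCMC sampling
jags_model <- jags.model(textConnection(model_string),
                         data = list(y = weight, x = height, N = length(weight)),
                         n.chains = 3)
posterior_samples <- coda.samples(jags_model,
                                  variable.names = c("beta0", "beta1", "sigma"),
                                  n.iter = 10000)
summary(posterior_samples)
```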
Steps of Bayesian data analysis The left panel shows the data with many credible regression lines. The right panel shows the posterior distribution of the slope parameter.
Steps of Bayesian data analysis This is a distribution of parameter values, not a distribution of data. The blue bars indicate the credibility across the continuum of candidate slope values. The mode of the distribution is typically taken as the most probable value: weight increases by about 4.11 pounds for every one-inch increase in height.
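As a minimal illustration (assuming the hypothetical posterior_samples object from the sketch above), the mode of the posterior can be approximated from the MCMC draws with a kernel density estimate:

```r
# Minimal sketch: using the posterior mode as the most probable slope value
beta1_samples <- as.matrix(posterior_samples)[, "beta1"]  # hypothetical draws from above
dens <- density(beta1_samples)          # kernel density estimate of the posterior
dens$x[which.max(dens$y)]               # location of the peak, i.e., the estimated mode
```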
Steps of Bayesian data analysis One way to summarize the uncertainty in the estimated slope is to mark the span of values that are most credible and cover 95% of the distribution. This is called the highest density interval (HDI), or credible interval (CI). Values within the 95% HDI are more credible (have higher probability density) than values outside it. With more data the estimate would be more precise and the 95% HDI narrower.
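A minimal sketch of computing a 95% HDI from MCMC samples, taken as the shortest interval containing 95% of the sorted draws (similar in spirit to the HDIofMCMC helper in the Kruschke text); beta1_samples is the hypothetical vector defined in the previous sketch.

```r
# Minimal sketch: 95% HDI as the shortest interval containing 95% of posterior samples
# (assumes a unimodal posterior distribution)
hdi_of_samples <- function(samples, cred_mass = 0.95) {
  sorted  <- sort(samples)
  n_keep  <- ceiling(cred_mass * length(sorted))   # number of samples inside the interval
  n_cands <- length(sorted) - n_keep               # number of candidate intervals
  widths  <- sorted[(1:n_cands) + n_keep] - sorted[1:n_cands]
  best    <- which.min(widths)                     # narrowest candidate interval
  c(lower = sorted[best], upper = sorted[best + n_keep])
}

hdi_of_samples(beta1_samples)
```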
Steps of Bayesian data analysis The figure also shows where a slope of zero falls relative to the 95% HDI. Zero is far outside the range of credible values for the slope, so we could reject a slope of zero as a plausible description of the relation between height and weight. This discrete decision about the status of zero is fine, but it is separate from the Bayesian analysis per se, which simply provides the complete posterior distribution.
NHST versus Bayes Most of you have probably heard of NHST, which involves sampling distributions of summary statistics such as t, from which p values are computed. The posterior distribution on the previous slides is not a sampling distribution and has nothing to do with p values. It's also important to stress that 95% HDIs are not the same as 95% confidence intervals: HDIs are interpretable in terms of post-data probabilities, whereas a frequentist CI only tells you that 95% of intervals constructed this way, across hypothetical repeated samples, would contain the true parameter value.
Steps of Bayesian data analysis The fifth step is to check that the model, with its most credible parameter values, actually mimics the data reasonably well. This is called a posterior predictive check. While there is no single best approach, one option is to plot a summary of data predicted by the model against the actual data. We take credible values of the intercept, slope, and standard deviation parameters, plug them into the model equation, and randomly simulate y values (weights) at selected x values (heights). Doing this numerous times creates representative distributions of what the data should look like according to the model.
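A minimal sketch of such a posterior predictive simulation, continuing with the hypothetical posterior_samples object from the earlier rjags sketch; the selected heights and the use of central 95% percentile ranges (rather than HDIs) are illustrative simplifications.

```r
# Minimal sketch: posterior predictive simulation at selected heights
draws <- as.matrix(posterior_samples)    # one row per credible parameter combination
x_new <- c(60, 66, 72)                   # selected heights (inches), illustrative

# For each posterior draw, simulate a weight at each selected height
pred <- sapply(x_new, function(x) {
  mu <- draws[, "beta0"] + draws[, "beta1"] * x
  rnorm(nrow(draws), mean = mu, sd = draws[, "sigma"])
})

# Summarize the middle 95% of simulated weights at each height,
# to compare against the actual data at those heights
apply(pred, 2, quantile, probs = c(0.025, 0.975))
```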
Steps of Bayesian data analysis In this figure the predicted weight values are summarized by vertical bars that show the range of the 95% most credible predicted weights. The predictions seem to fit the data well. If the actual data appeared to deviate systematically from the predicted form, we could contemplate alternative descriptive models, such as adding a quadratic term.
Steps of Bayesian data analysis We've now seen an overview of the logic of Bayesian data analysis. We've also made it clear that what we are doing is estimating posterior probability distributions that represent the evidence for candidate parameter values. We will not cover posterior predictive checks much in this course, both to save time and because many of the basic analyses we routinely run on our data do not produce overly complex models. We'll now move on to exploring Bayes' rule, and the math behind it, in more detail.