Multivariate Analysis
Explore the key concepts of marginal, conditional, and joint probability in multivariate analysis, as well as the notion of independence and Bayes' Theorem. Learn how these probabilities relate to each other and the importance of handling differences in joint and marginal probabilities.
Presentation Transcript
Multivariate Analysis Part I: Marginal, Conditional, and Joint Probability
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers, Hydrologic Engineering Center
Notation: Probability
Marginal: $P(A)$
Joint: $P(A,B)$, $P(A \text{ and } B)$, or $P(A \cap B)$
Conditional: $P(A|B)$
Marginal Probability
What is the probability of A occurring, irrespective of what happens with B?
[Venn diagram: events A and B]
Joint Probability
What is the probability of A and B occurring together?
[Venn diagram: events A and B]
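Joint Probability
$P(A,B) \le \min\{P(A), P(B)\}$
[Venn diagram: events A and B]
Beware the conjunction fallacy: judging the joint event "A and B" to be more probable than A alone.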
Independence
A and B are independent if and only if $P(A,B) = P(A)\,P(B)$. Then we write $A \perp B$.
Independence is not easy to show on a Venn diagram.
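The independence condition can be checked numerically for two binary events. The sketch below is a minimal Python illustration, assuming the joint distribution is supplied as a dictionary mapping (a, b) outcome pairs to probabilities; the function name `is_independent`, the tolerance, and both probability tables are hypothetical. It computes each marginal by summing over the other variable and compares every joint cell with the product of the marginals.

```python
from itertools import product


def is_independent(joint, tol=1e-12):
    """Check P(A=a, B=b) == P(A=a) * P(B=b) for every cell of a joint table.

    `joint` maps (a, b) outcome pairs to joint probabilities that sum to 1.
    """
    a_vals = {a for a, _ in joint}
    b_vals = {b for _, b in joint}
    # Marginals: sum the joint probabilities over the other variable
    p_a = {a: sum(joint.get((a, b), 0.0) for b in b_vals) for a in a_vals}
    p_b = {b: sum(joint.get((a, b), 0.0) for a in a_vals) for b in b_vals}
    # Compare every joint cell with the product of its marginals
    return all(
        abs(joint.get((a, b), 0.0) - p_a[a] * p_b[b]) <= tol
        for a, b in product(a_vals, b_vals)
    )


# Hypothetical tables; True means the event occurred
indep = {(True, True): 0.12, (True, False): 0.18,
         (False, True): 0.28, (False, False): 0.42}
dep = {(True, True): 0.25, (True, False): 0.05,
       (False, True): 0.15, (False, False): 0.55}

print(is_independent(indep))  # True: 0.12 = 0.30 * 0.40, and so on
print(is_independent(dep))    # False: 0.25 != 0.30 * 0.40
```

The small tolerance absorbs floating-point rounding in the marginal sums.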
Conditional Probability
Given that B has occurred, what is the probability that A occurs?
Once we have observed that B has occurred, B becomes our universe.
[Venn diagrams: events A, $A^c$, and B]
Conditional Probability
$P(A|B) \ne P(B|A)$ unless $P(A) = P(B)$
Bayes' Theorem: $P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B)}$
Relationships
$P(A|B) = \dfrac{P(A,B)}{P(B)}$
$P(B|A) = \dfrac{P(A,B)}{P(A)}$
If $A \perp B$: $P(A|B) = \dfrac{P(A)\,P(B)}{P(B)} = P(A)$
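A short worked example of these relationships and of Bayes' Theorem, using hypothetical probabilities chosen only for illustration (any values with $P(A,B) \le \min\{P(A), P(B)\}$ would do). It also shows that $P(A|B)$ and $P(B|A)$ differ when $P(A) \ne P(B)$.

```python
# Hypothetical probabilities for two events A and B
p_a, p_b, p_ab = 0.30, 0.40, 0.25   # P(A), P(B), P(A, B)

p_a_given_b = p_ab / p_b   # P(A|B) = P(A,B) / P(B)
p_b_given_a = p_ab / p_a   # P(B|A) = P(A,B) / P(A)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4))        # 0.625
print(round(p_b_given_a, 4))        # 0.8333, differs because P(A) != P(B)
print(round(p_a_given_b_bayes, 4))  # 0.625
```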
Handling the Difference
Joint probabilities can never be larger than marginal probabilities: $P(A,B) \le P(A)$.
Imposing a condition can vastly change the answer.
A large $P(A|B)$ says nothing about $P(A)$.
[Venn diagram: events A and B]
Law of Total Probability
$P(B) = P(B|A)\,P(A) + P(B|A^c)\,P(A^c)$
A and $A^c$ form a partition of the sample space: B can overlap A, $A^c$, or both, but nothing else.
[Venn diagram: B overlapping A and $A^c$]
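Law of Total Probability
For a partition $B_1, \dots, B_n$ of the sample space, with $P(B_i, B_j) = 0$ for $i \ne j$ and $\sum_{i=1}^{n} P(B_i) = 1$:
$P(A) = \sum_{i=1}^{n} P(A, B_i) = \sum_{i=1}^{n} P(A|B_i)\,P(B_i)$
[Venn diagram: A overlapping a partition $B_1, B_2, B_3, B_4$]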
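A minimal numerical illustration of the Law of Total Probability for a four-set partition, with hypothetical values for $P(B_i)$ and $P(A|B_i)$. The final lines apply Bayes' Theorem to each term to get $P(B_i|A)$, which uses the total probability $P(A)$ as its denominator.

```python
# Hypothetical partition B1..B4 of the sample space
p_b = [0.1, 0.2, 0.3, 0.4]            # P(B_i)
p_a_given_b = [0.9, 0.5, 0.2, 0.05]   # P(A | B_i)

# A partition's probabilities must sum to 1
assert abs(sum(p_b) - 1.0) < 1e-12

# Law of Total Probability: P(A) = sum over i of P(A|B_i) P(B_i)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
print(round(p_a, 4))  # 0.27

# Bayes' Theorem inverts each term: P(B_i|A) = P(A|B_i) P(B_i) / P(A)
p_b_given_a = [pa * pb / p_a for pa, pb in zip(p_a_given_b, p_b)]
print([round(p, 4) for p in p_b_given_a])
```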
Coincident Frequency Analysis (e.g., EM1110-2-1413)
A and B have some joint distribution, and $C = f(A, B)$.
What is the distribution of C?
Conditional Moments
Start with a joint distribution of X and Y.
Given an observation $X = x$, what is the behavior of Y?
Conditional Moments
Conditional expectation (mean): $E[Y|X=x]$
Conditional variance: $\mathrm{Var}(Y|X=x)$
These play an important role in regression! The conditional mean is estimated by the best-fit line, and our treatment of the conditional variance affects what kind of regression model we build.
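For a discrete joint distribution, the conditional moments can be computed directly: divide the joint probabilities in the row $X = x$ by $P(X = x)$, then take the mean and variance of the resulting conditional distribution. The joint table and the function name `conditional_moments` below are hypothetical; the point is that $E[Y|X=x]$ changes with $x$, which is what a regression line models.

```python
# Hypothetical joint pmf of discrete X and Y: {(x, y): P(X=x, Y=y)}
joint = {
    (0, 1): 0.10, (0, 2): 0.20, (0, 3): 0.10,
    (1, 1): 0.05, (1, 2): 0.15, (1, 3): 0.40,
}


def conditional_moments(joint, x):
    """Return (E[Y|X=x], Var(Y|X=x)) for a discrete joint pmf."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(X=x)
    # Conditional pmf P(Y=y|X=x) = P(X=x, Y=y) / P(X=x)
    cond = {y: p / p_x for (xi, y), p in joint.items() if xi == x}
    mean = sum(y * p for y, p in cond.items())
    var = sum((y - mean) ** 2 * p for y, p in cond.items())
    return mean, var


for x in (0, 1):
    mean, var = conditional_moments(joint, x)
    print(f"x={x}: E[Y|X=x]={mean:.4f}, Var(Y|X=x)={var:.4f}")
```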
Simple Linear Regression
The response variable is normally distributed, with:
$E[Y|X=x] = \beta_0 + \beta_1 x$ and $\mathrm{Var}(Y|X=x) = \sigma^2$
https://saylordotorg.github.io/text_introductory-statistics/s14-03-modelling-linear-relationships.html
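A sketch of how the two conditional moments are estimated in simple linear regression, assuming ordinary least squares: the fitted line estimates $E[Y|X=x] = \beta_0 + \beta_1 x$, and the residual variance with $n - 2$ degrees of freedom estimates the constant $\sigma^2$. The function name `fit_simple_linear` and the data are hypothetical; in practice a statistics package would normally be used.

```python
from statistics import fmean


def fit_simple_linear(x, y):
    """Ordinary least squares fit of E[Y|X=x] = b0 + b1 * x.

    Returns (b0, b1, s2), where s2 estimates the constant conditional
    variance sigma^2 = Var(Y|X=x) using n - 2 degrees of freedom.
    """
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx                # slope
    b0 = ybar - b1 * xbar         # intercept: line passes through (xbar, ybar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    return b0, b1, s2


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, s2 = fit_simple_linear(x, y)
print(round(b0, 3), round(b1, 3), round(s2, 4))  # b0 ≈ 0.05, b1 ≈ 1.99, s2 ≈ 0.0357
```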
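Generalized Linear Models
The response variable need not be normal (in the classical GLM it comes from the exponential family, e.g., binomial, Poisson, or gamma). Its parameters or moments are functions of the predictors:
$E[Y|X=x] = \mu(x)$ and $\mathrm{Var}(Y|X=x) = \sigma^2(x)$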
Law of Total Expectation
The expectation is a weighted average over a number of conditions: $E[X] = E\big[E[X|B]\big]$
As long as $B_1, \dots, B_n$ form a partition of the sample space:
$E[X] = \sum_{i=1}^{n} E[X|B_i]\,P(B_i)$
Variance (and covariance) have similar rules.
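A numerical check of the Law of Total Expectation on a hypothetical three-set partition. Because the slide notes that variance has a similar rule, the sketch also applies the standard Law of Total Variance, $\mathrm{Var}(X) = E[\mathrm{Var}(X|B)] + \mathrm{Var}(E[X|B])$; all conditional means, variances, and probabilities are made up for illustration.

```python
# Hypothetical partition B1..B3 of the sample space
p_b = [0.5, 0.3, 0.2]            # P(B_i)
cond_mean = [10.0, 20.0, 50.0]   # E[X | B_i]
cond_var = [4.0, 9.0, 25.0]      # Var(X | B_i)

# Law of Total Expectation: E[X] = sum_i E[X|B_i] P(B_i)
e_x = sum(m * p for m, p in zip(cond_mean, p_b))

# Law of Total Variance: Var(X) = E[Var(X|B)] + Var(E[X|B])
within = sum(v * p for v, p in zip(cond_var, p_b))                 # E[Var(X|B)]
between = sum((m - e_x) ** 2 * p for m, p in zip(cond_mean, p_b))  # Var(E[X|B])
var_x = within + between

print(round(e_x, 6))    # 21.0
print(round(var_x, 6))  # 238.7
```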
Summary
Marginal, conditional, and joint probability are used to express the relationship between events.
Conditional moments express the behavior of one random variable given information about another.
Multivariate Analysis Part II: Correlation
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers, Hydrologic Engineering Center
Review: Variance
Recall the sample variance formula:
$s_x^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
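Covariance
How much do two variables vary from their means at the same time?
Notated $\mathrm{Cov}(X,Y)$ or $s_{xy}$:
$s_{xy} = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
Note: $\mathrm{Cov}(X,X) = \mathrm{Var}(X)$, i.e., $s_{xx} = s_x^2$.
Covariance has units of (unit of x) × (unit of y) and no bounds on its range of values.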
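A minimal Python version of the sample covariance formula, assuming plain lists of paired observations (the function name `sample_covariance` and the data are hypothetical). The last line illustrates why covariance has no fixed bounds: rescaling the units of x rescales the covariance by the same factor.

```python
from statistics import fmean


def sample_covariance(x, y):
    """Sample covariance s_xy = sum((x_i - xbar)(y_i - ybar)) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xbar, ybar = fmean(x), fmean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)


# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

print(round(sample_covariance(x, y), 4))  # 3.6667
print(round(sample_covariance(x, x), 4))  # 1.6667, the sample variance of x
# Rescaling x (e.g., meters to millimeters) rescales the covariance
print(round(sample_covariance([1000 * v for v in x], y), 1))  # 3666.7
```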
Three Ways to Measure Correlation
Pearson's product-moment correlation coefficient ($r$)
Kendall's rank correlation coefficient ($\tau$)
Spearman's rank correlation coefficient ($\rho$)
All three range from -1 (perfect negative correlation) to +1 (perfect positive correlation).
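Pearson's Correlation Coefficient
Measures linear correlation between two variables; performs very badly in non-linear situations.
It is the covariance normalized by the variables' standard deviations:
$\rho_{X,Y} = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$, estimated by $r_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
R: cor(x), where x is a data frame with 2 or more columns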
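A minimal sketch of the sample estimate $r_{xy}$, written from the formula rather than with a library call. The two example inputs are hypothetical: a perfectly linear relationship gives $r = 1$, while $y = x^2$ on points symmetric about zero gives $r = 0$ even though y is completely determined by x, which is the non-linear weakness noted on the slide.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson's r: covariance normalized by the product of standard deviations.

    The 1/(n - 1) factors cancel, so plain sums of deviations are used.
    """
    xbar, ybar = fmean(x), fmean(y)
    dx = [xi - xbar for xi in x]
    dy = [yi - ybar for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0: exactly linear
print(pearson_r(x, [v * v for v in x]))      # 0.0: y depends on x, but not linearly
```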
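Kendall's Tau
Measures ordinal association between variables:
$\tau = \dfrac{2}{n(n-1)} \sum_{i<j} \operatorname{sgn}(x_i - x_j)\,\operatorname{sgn}(y_i - y_j)$
Each term is +1 if the pair is concordant (x and y both increase, or both decrease, from i to j) and -1 if it is discordant (one increases while the other decreases).
Special modifications are needed for ties ($x_i = x_j$ or $y_i = y_j$).
R: cor(x, method = "kendall"), where x is a data frame with 2 or more columns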
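A direct $O(n^2)$ implementation of the Kendall's tau formula, as a sketch. It does not apply the tie modifications mentioned on the slide: tied pairs simply contribute 0, which corresponds to the uncorrected coefficient often called tau-a. The function names and data are hypothetical.

```python
from itertools import combinations


def sign(v):
    """Return +1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau without tie corrections (often called tau-a).

    Each pair i < j adds +1 if concordant, -1 if discordant, and 0 if tied.
    """
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return 2 * s / (n * (n - 1))


x = [1, 2, 3, 4, 5]
print(kendall_tau_a(x, [1, 3, 2, 5, 4]))      # 0.6: 8 concordant, 2 discordant pairs
print(kendall_tau_a(x, [v ** 3 for v in x]))  # 1.0: monotonic, every pair concordant
```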
Spearman's Rho
Pearson's correlation on the rank-transformed data; assesses monotonic relationships whether or not they are linear.
Without ties: $\rho = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where $d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)$
Special modifications are needed for ties.
R: cor(x, method = "spearman"), where x is a data frame with 2 or more columns
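A sketch of Spearman's rho computed two ways, assuming plain Python lists: as Pearson's correlation of the ranks, which handles ties when tied values receive their average rank, and with the shortcut formula, which is exact only when there are no ties. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def pearson(a, b):
    """Pearson's correlation of two equal-length sequences."""
    abar, bbar = fmean(a), fmean(b)
    num = sum((p - abar) * (q - bbar) for p, q in zip(a, b))
    den = sqrt(sum((p - abar) ** 2 for p in a) * sum((q - bbar) ** 2 for q in b))
    return num / den


def spearman_rho(x, y):
    """Spearman's rho as Pearson's correlation of the ranks (handles ties)."""
    return pearson(ranks(x), ranks(y))


def spearman_rho_no_ties(x, y):
    """Shortcut 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); exact only without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
print(spearman_rho(x, y), spearman_rho_no_ties(x, y))  # both approximately 0.8
print(spearman_rho(x, [v ** 3 for v in x]))            # 1.0: monotonic, not linear
print(ranks([10, 20, 20, 30]))                         # [1.0, 2.5, 2.5, 4.0]
```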
Product-Moment vs Rank Correlation
[Figure: product-moment vs. rank correlation; panels labeled "Monotonic" and "Non-Linear"]
Kendall and Spearman have similar performance.
Summary
Covariance is a multivariate extension of variance.
Correlation is normalized covariance.
Pearson's coefficient measures linear correlation.
Kendall's and Spearman's coefficients are more robust: they capture monotonic relationships whether or not they are linear.
Multivariate Analysis Part III: Joint Probability Distributions and Copulas
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers, Hydrologic Engineering Center
Joint Probability Distributions
Probability distributions for multiple variables are less common than univariate ones. They must describe the behavior of each variable and the relationship between them.
A sample of analytical models: multivariate normal, multivariate t, Dirichlet, Wishart.
Nobody* uses these. (*Except Bayesians.)
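Multivariate Normal
The univariate normal has two parameters, $N(\mu, \sigma^2)$: $\mu$ is the mean and $\sigma^2$ is the variance.
The bivariate normal also has two parameters, $N_2(\boldsymbol{\mu}, \Sigma)$: $\boldsymbol{\mu}$ is a length-2 vector of means, and $\Sigma$ is a 2x2 matrix called the covariance matrix.
Extension to arbitrary dimension k: $N_k(\boldsymbol{\mu}, \Sigma)$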
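Multivariate Normal
Random samples taken from a k-variate normal are a collection of vectors with covariance defined by $\Sigma$. Bivariate normal (k = 2):
$(X_1, X_2) \sim N_2(\boldsymbol{\mu}, \Sigma)$, with $\boldsymbol{\mu} = \begin{pmatrix} E[X_1] \\ E[X_2] \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1,X_2) \\ \mathrm{Cov}(X_1,X_2) & \mathrm{Var}(X_2) \end{pmatrix}$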
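Multivariate Normal
Mean vector: $\boldsymbol{\mu} = (E[X_1], E[X_2], \dots, E[X_k])^T$
Covariance matrix (k = 3), symmetrical:
$\Sigma = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1,X_2) & \mathrm{Cov}(X_1,X_3) \\ \mathrm{Cov}(X_1,X_2) & \mathrm{Var}(X_2) & \mathrm{Cov}(X_2,X_3) \\ \mathrm{Cov}(X_1,X_3) & \mathrm{Cov}(X_2,X_3) & \mathrm{Var}(X_3) \end{pmatrix}$
A sample is $(x_1, x_2, \dots, x_k)$, with $X_1 \sim N(E[X_1], \mathrm{Var}(X_1))$, $X_2 \sim N(E[X_2], \mathrm{Var}(X_2))$, etc.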
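Multivariate Standard Normal
Mean vector: $\boldsymbol{\mu} = (0, 0, \dots, 0)^T$
Covariance matrix (k = 3), symmetrical, with ones on the diagonal (so the off-diagonal covariances are also correlations):
$\Sigma = \begin{pmatrix} 1 & \mathrm{Cov}(Z_1,Z_2) & \mathrm{Cov}(Z_1,Z_3) \\ \mathrm{Cov}(Z_1,Z_2) & 1 & \mathrm{Cov}(Z_2,Z_3) \\ \mathrm{Cov}(Z_1,Z_3) & \mathrm{Cov}(Z_2,Z_3) & 1 \end{pmatrix}$
A sample is $(z_1, z_2, \dots, z_k)$, with $Z_1 \sim N(0, 1)$, etc.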
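One common way to draw from $N_k(\boldsymbol{\mu}, \Sigma)$ is to factor $\Sigma = L L^T$ (Cholesky) and set $\mathbf{x} = \boldsymbol{\mu} + L\mathbf{z}$, where $\mathbf{z}$ is a vector of independent standard normals; then $\mathrm{Cov}(\mathbf{x}) = L L^T = \Sigma$. The pure-Python sketch below assumes that approach; it is not necessarily how any particular tool samples, and the function names and parameter values are hypothetical. For the multivariate standard normal, pass a zero mean vector and a covariance matrix with ones on the diagonal.

```python
import random
from math import sqrt
from statistics import fmean


def cholesky(a):
    """Lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_mvn(mean, cov, n, rng):
    """Draw n vectors from N_k(mean, cov) as mean + L z, where z ~ N_k(0, I)."""
    L = cholesky(cov)
    k = len(mean)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent standard normals
        draws.append([mean[i] + sum(L[i][m] * z[m] for m in range(i + 1))
                      for i in range(k)])
    return draws


# Hypothetical bivariate normal: variances 4 and 1, Cov = 1.8 (correlation 0.9)
mu = [10.0, 5.0]
sigma = [[4.0, 1.8], [1.8, 1.0]]
samples = sample_mvn(mu, sigma, 20000, random.Random(42))

x1 = [s[0] for s in samples]
x2 = [s[1] for s in samples]
m1, m2 = fmean(x1), fmean(x2)
cov12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (len(samples) - 1)
print(round(m1, 2), round(m2, 2), round(cov12, 2))  # close to 10, 5, and 1.8
```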
Others
Multivariate t distribution: multivariate version of Student's t distribution; similar to the multivariate normal, with an extra parameter (degrees of freedom).
Dirichlet distribution: multivariate version of the beta distribution.
Wishart distribution: multivariate version of the gamma distribution.
Sklar's Theorem
Every multivariate CDF can be expressed in terms of its marginals, $F_i(x_i)$, and a copula, $C$.
Consequences: each variable can have whatever distribution you want (defined by the marginals), and you can specify how the variables relate (defined by the copula).
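In symbols, the standard statement of Sklar's theorem for a joint CDF $F$ with marginals $F_i$ is shown below. When all marginals are continuous, the copula $C$ is unique and can be recovered from $F$ and the marginal quantile functions, as in the second line.

```latex
F(x_1, \dots, x_k) = C\big(F_1(x_1), \dots, F_k(x_k)\big)

C(u_1, \dots, u_k) = F\big(F_1^{-1}(u_1), \dots, F_k^{-1}(u_k)\big)
\quad \text{(continuous marginals)}
```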
Copulas
Gaussian copula: its structure uses the multivariate normal distribution; familiar elliptical shape; defined mainly by Pearson's coefficient.
Archimedean copulas: other interesting shapes; a single parameter; definable using Kendall's tau.
Gaussian Copula
Assume the underlying dependence structure is well represented by a multivariate normal distribution, even though the variables of interest do not have normal distributions.
Generate normal variates with the specified correlation, then transform them into the target distributions.
The HEC-WAT Hydrologic Sampler uses this approach.
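[Flowchart: Gaussian copula sampling]
Multivariate standard normal sample $(z_1, \dots, z_k)$ → standard normal CDF, $u_i = \Phi(z_i)$ → vector of U(0, 1) values $(u_1, \dots, u_k)$, the Gaussian copula → vector of marginal quantile functions $F_1^{-1}(u_1; \theta_1), \dots, F_k^{-1}(u_k; \theta_k)$ → sampled vector $(x_1, \dots, x_k)$ of values with the desired target distributions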
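A minimal Python sketch of this sampling chain for two variables, assuming a Gaussian copula with correlation r, i.e. $C_R(u_1, u_2) = \Phi_R(\Phi^{-1}(u_1), \Phi^{-1}(u_2))$. The exponential and Gumbel marginals, their parameters, r = 0.9, the seed, and all function names are hypothetical choices for illustration; this is not the HEC-WAT Hydrologic Sampler's code. Note that r is the correlation of the underlying normal variates: after the marginal transforms, the Pearson correlation of the outputs generally differs from r, while rank correlations carry over (Spearman's rho equals $(6/\pi)\arcsin(r/2)$, about 0.89 for r = 0.9).

```python
import random
from math import log, sqrt
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist(0.0, 1.0)
EPS = 1e-12  # keeps u strictly inside (0, 1) so the quantile functions stay finite


def gaussian_copula_sample(r, quantile_a, quantile_b, n, rng):
    """Draw n (a, b) pairs joined by a bivariate Gaussian copula.

    1. Draw standard normals (z1, z2) with correlation r.
    2. Map each through the standard normal CDF: u = Phi(z) ~ U(0, 1).
    3. Apply each target marginal's quantile function to its u.
    """
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        u1 = min(max(STD_NORMAL.cdf(z1), EPS), 1.0 - EPS)
        u2 = min(max(STD_NORMAL.cdf(z2), EPS), 1.0 - EPS)
        pairs.append((quantile_a(u1), quantile_b(u2)))
    return pairs


# Hypothetical target marginals, given as quantile (inverse CDF) functions
def exponential_quantile(u, rate=0.5):
    """Exponential distribution with mean 1 / rate."""
    return -log(1.0 - u) / rate


def gumbel_quantile(u, loc=100.0, scale=20.0):
    """Gumbel (largest extreme value) distribution."""
    return loc - scale * log(-log(u))


pairs = gaussian_copula_sample(0.9, exponential_quantile, gumbel_quantile,
                               20000, random.Random(7))
a = [p[0] for p in pairs]
b = [p[1] for p in pairs]
print(round(fmean(a), 2), round(fmean(b), 1))  # near 2.0 and 100 + 20 * 0.5772
```

Swapping in other marginals only requires a different quantile function; the dependence structure stays in the correlated-normal step.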
[Figure: standard normal CDF, $\Phi(z)$]
[Figure: joint distribution of variables A and B from a Gaussian copula with r = 0.9, shown with the marginal densities of A and B]
Summary
The multivariate normal distribution is the most common multivariate distribution.
Copulas let us define a joint distribution made from arbitrary marginals and a joining structure.