Multivariate Analysis
Part I: Marginal, Conditional, and Joint Probability
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers
Hydrologic Engineering Center
 
Notation: Probability
 
Marginal: P(A)
Joint: P(A,B), P(A and B), or P(A ∩ B)
Conditional: P(A|B)
Marginal Probability
What is the probability of A occurring, irrespective of what happens with B?
Joint Probability
What is the probability of A and B occurring together?
Joint Probability
P(A,B) ≤ min(P(A), P(B))
Beware the Conjunction Fallacy
Independence
 
A and B are independent iff:
P(A,B) = P(A) * P(B)
 
Then we write A ⊥ B
 
Not easy to show on a Venn Diagram
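A minimal R sketch (simulated events with assumed probabilities, not from the slides) checking the definition numerically:
set.seed(1)
n <- 1e6
a <- runif(n) < 0.3    # indicator of event A, P(A) = 0.3 (assumed)
b <- runif(n) < 0.5    # indicator of event B, P(B) = 0.5 (assumed), generated independently of A
mean(a & b)            # empirical P(A,B), approximately 0.15
mean(a) * mean(b)      # empirical P(A) * P(B), approximately 0.15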
Conditional Probability
 
Given that B has occurred, what is the probability that A occurs?
Once we have observed that B has occurred, it becomes our “universe”
Conditional Probability
P(A|B) ≠ P(B|A) unless P(A) = P(B)
P(A|B) = P(B|A) P(A) / P(B)
Bayes’ Theorem
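A minimal R sketch of Bayes’ Theorem with hypothetical numbers (none of these values come from the slides):
p_A      <- 0.01                              # P(A), prior probability (assumed)
p_B_A    <- 0.90                              # P(B|A) (assumed)
p_B_notA <- 0.05                              # P(B|Ac) (assumed)
p_B   <- p_B_A * p_A + p_B_notA * (1 - p_A)   # P(B) by the law of total probability
p_A_B <- p_B_A * p_A / p_B                    # Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_A_B                                         # about 0.15: a large P(B|A) does not imply a large P(A|B)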
Relationships
P(A) = Σ_B P(A,B)  (sum or integrate over B)
P(A|B) = P(A,B) / P(B)
If A ⊥ B: P(A|B) = P(A) and P(A,B) = P(A) P(B)
Handling the Difference
 
Joint probabilities can never be larger than marginal probabilities
P(A,B) ≤ P(A)
Imposing a condition can vastly change the answer
Large P(A|B) says nothing about P(A)
Law of Total Probability
P(B) = P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)
A and Aᶜ form a partition of the sample space; B can be in A, Aᶜ, or both (but nothing else)
Law of Total Probability
For any partition B₁, …, Bₙ (with Σᵢ P(Bᵢ) = 1 and P(Bᵢ,Bⱼ) = 0 for i ≠ j):
P(A) = Σᵢ P(A|Bᵢ) P(Bᵢ)
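A short worked example in R with hypothetical numbers (the wet/dry-year partition and values are assumed, not from the slides):
p_wet       <- 0.4                                # P(wet year) (assumed)
p_flood_wet <- 0.20                               # P(flood | wet year) (assumed)
p_flood_dry <- 0.02                               # P(flood | dry year) (assumed)
p_flood_wet * p_wet + p_flood_dry * (1 - p_wet)   # P(flood) = 0.092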
Coincident Frequency Analysis
e.g. EM 1110-2-1413
A and B have some joint distribution
C = f(A, B)
What is the distribution of C?
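A minimal Monte Carlo sketch in R of the idea; the joint distribution and the function f are assumed for illustration and are not the EM 1110-2-1413 procedure:
set.seed(1)
n <- 1e5
a <- rnorm(n, mean = 10, sd = 2)             # marginal of A (assumed)
b <- 0.5 * a + rnorm(n, mean = 0, sd = 1)    # B correlated with A, giving a joint distribution (assumed)
c_out <- a + b                               # C = f(A, B); here f is a simple sum (assumed)
quantile(c_out, c(0.50, 0.90, 0.99))         # empirical quantiles approximate the distribution of C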
Conditional Moments
Start with a joint distribution of X and Y
Given an observation X = x, what is the behavior of Y?
Conditional Moments
Conditional expectation (mean): E[Y | X = x]
Conditional variance: Var[Y | X = x]
These play an important role in regression!
The conditional mean is estimated by the best-fit line
Our treatment of the conditional variance affects what kind of regression model we build
Simple Linear Regression
Response variable is normally distributed, with:
E[Y | X = x] = β₀ + β₁x and Var[Y | X = x] = σ²
https://saylordotorg.github.io/text_introductory-statistics/s14-03-modelling-linear-relationships.html
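A minimal R sketch with simulated data: the fitted line estimates the conditional mean, and sigma() estimates the (constant) conditional standard deviation.
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 3 * x + rnorm(100, sd = 2)         # true model (assumed): E[Y|X=x] = 2 + 3x, Var[Y|X=x] = 4
fit <- lm(y ~ x)
coef(fit)                                   # estimates of beta0 and beta1
sigma(fit)                                  # estimate of the residual standard deviation
predict(fit, newdata = data.frame(x = 5))   # estimated conditional mean E[Y | X = 5]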
Generalized Linear Models
Response variable can have any distribution
Parameters or moments are a function of the predictors:
E[Y | X = x] = f(x) and Var[Y | X = x] = g(x)
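A minimal R sketch with simulated count data; for a Poisson response, both the conditional mean and the conditional variance are functions of the predictor:
set.seed(1)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 1.2 * x))                   # true model (assumed): E[Y|X=x] = exp(0.5 + 1.2x)
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)                                                      # estimated coefficients on the log scale
predict(fit, newdata = data.frame(x = 1), type = "response")   # estimated E[Y | X = 1]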
Law of Total Expectation
Weighted average under a number of conditions:
E[Y] = E[ E[Y | B] ]
As long as B₁, …, Bₙ form a partition of the sample space,
E[Y] = Σᵢ E[Y | Bᵢ] P(Bᵢ)
Variance (and covariance) have similar rules.
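A short worked example in R with hypothetical numbers (the partition and the conditional means are assumed):
p_B   <- c(0.4, 0.6)      # P(B1), P(B2): e.g., wet and dry years (assumed)
e_Y_B <- c(1200, 400)     # E[Y|B1], E[Y|B2] (assumed)
sum(e_Y_B * p_B)          # E[Y] = 0.4*1200 + 0.6*400 = 720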
Summary
Marginal, conditional, and joint probability are used to express
the relationship between events
Conditional moments express the behavior of one random
variable given information about another
Multivariate Analysis
Part II: Correlation
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers
Hydrologic Engineering Center
 
Review: Variance
Recall the sample variance formula:
s_x² = 1/(n−1) · Σᵢ (xᵢ − x̄)²
Covariance
How much do two variables vary from their means at the same time?
Notated Cov(X,Y) or σ_xy
s_xy = 1/(n−1) · Σᵢ (xᵢ − x̄)(yᵢ − ȳ)
Note: Cov(X,X) = Var(X), so σ_xx = σ_x²
Has units of (unit of x) × (unit of y)
No bounds on its range of values
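A minimal R sketch with simulated data, showing sample covariance and its normalization into a correlation coefficient (the topic of the next slides):
set.seed(1)
x <- rnorm(500, mean = 100, sd = 15)
y <- 0.8 * x + rnorm(500, sd = 10)     # y related to x (assumed relationship)
cov(x, y)                              # sample covariance, in (units of x) * (units of y)
cov(x, y) / (sd(x) * sd(y))            # normalized covariance, bounded in [-1, 1] ...
cor(x, y)                              # ... which is Pearson's correlation coefficient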
Three Ways to Measure Correlation
 
Pearson’s product-moment correlation coefficient (r)
Kendall’s rank correlation coefficient (τ)
Spearman’s rank correlation coefficient (ρ)
 
Range of −1 (perfect negative correlation) to +1 (perfect positive correlation)
Pearson’s Correlation Coefficient
Measurement of linear correlation between two variables
Performs very badly in non-linear situations
Normalization of the covariance between the variables:
r_xy = Cov(x, y) / √(Var(x) · Var(y)) = s_xy / (s_x s_y)
R: cor(x)
x is a data frame with 2 or more columns
Kendall’s Tau
Measurement of ordinal association between variables
τ = 2/(n(n−1)) · Σ_{i<j} sgn(xᵢ − xⱼ) sgn(yᵢ − yⱼ)
Each term is −1 if x and y do not both increase (or both decrease) together, +1 otherwise
Special modifications for ties (xᵢ = xⱼ or yᵢ = yⱼ)
R: cor(x, method = "kendall")
x is a data frame with 2 or more columns
Spearman’s Rho
Pearson’s correlation on a rank transform of the data
Assesses monotonic relationships whether or not they are linear
ρ = 1 − 6 Σᵢ dᵢ² / (n(n² − 1)), where dᵢ = rank(xᵢ) − rank(yᵢ)
Special modifications for ties
R: cor(x, method = "spearman")
x is a data frame with 2 or more columns
Product-Moment vs Rank Correlation
(Example panels: a monotonic relationship and a non-linear relationship)
Kendall and Spearman have similar performance
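A minimal R sketch of a monotonic, non-linear relationship (the example is assumed, not from the slides), illustrating why the rank coefficients are more robust here:
set.seed(1)
x <- runif(300, 0, 4)
y <- exp(x)                            # monotonic but strongly non-linear
cor(x, y, method = "pearson")          # well below 1 despite a perfect monotonic relationship
cor(x, y, method = "kendall")          # exactly 1
cor(x, y, method = "spearman")         # exactly 1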
Summary
Covariance is a multivariate extension of variance
Correlation is normalized covariance
Pearson’s coeff. measures linear correlation
Kendall/Spearman coeff. are more robust
Multivariate Analysis
Part III: Joint Probability Distributions and Copulas
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers
Hydrologic Engineering Center
 
Joint Probability Distributions
 
Probability distributions for multiple variables are less common
Must describe the behavior of each variable and the relationship between them
A sample of analytical models:
Multivariate normal
Multivariate t
Dirichlet
Wishart
 
Nobody* uses these
 
*Except Bayesians
Multivariate Normal
The univariate normal has two parameters: N(μ, σ²)
μ is the mean; σ² is the variance
The bivariate normal also has two parameters: N₂(μ, Σ)
μ is a length-2 vector of means; Σ is a 2×2 matrix called the covariance matrix
Extension to arbitrary dimension: N_k(μ, Σ)
Multivariate Normal
Random samples taken from a k-variate normal are a collection of vectors with covariance defined by Σ
(Example panel: bivariate normal, k = 2)
Multivariate Normal
Mean vector: μ = [E[X₁], E[X₂], …, E[X_k]]
Covariance matrix (k = 3), symmetrical:
Σ = [ Var(X₁)     Cov(X₁,X₂)  Cov(X₁,X₃)
      Cov(X₁,X₂)  Var(X₂)     Cov(X₂,X₃)
      Cov(X₁,X₃)  Cov(X₂,X₃)  Var(X₃) ]
A sample is a vector X₁, X₂, …, X_k, with X₁ ~ N(E[X₁], Var(X₁)), X₂ ~ N(E[X₂], Var(X₂)), etc.
Multivariate Standard Normal
Mean vector: μ = [0, 0, …, 0]
Covariance matrix (k = 3), symmetrical, with 1s on the diagonal and correlations off the diagonal:
Σ = [ 1           Cor(Z₁,Z₂)  Cor(Z₁,Z₃)
      Cor(Z₁,Z₂)  1           Cor(Z₂,Z₃)
      Cor(Z₁,Z₃)  Cor(Z₂,Z₃)  1 ]
A sample is a vector Z₁, Z₂, …, Z_k, with each Zᵢ ~ N(0, 1)
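A minimal R sketch, assuming the MASS package is available, that draws from a bivariate normal with a chosen mean vector and covariance matrix:
library(MASS)                                    # for mvrnorm()
mu    <- c(10, 50)                               # mean vector (assumed values)
Sigma <- matrix(c(4, 3,
                  3, 9), nrow = 2)               # symmetric covariance matrix (assumed values)
z <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)   # 1000 samples, each a length-2 vector
colMeans(z)                                      # close to mu
cov(z)                                           # close to Sigma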
Others
 
Multivariate t distribution
Multivariate version of Student’s t distribution
Similar to the multivariate normal, with an extra parameter ν (degrees of freedom)
Dirichlet distribution
Multivariate version of the beta distribution
Wishart distribution
Multivariate version of the gamma distribution
Sklar’s Theorem
Every multivariate CDF can be expressed in terms of:
Its marginals, Fᵢ, and
A copula, C
H(x₁, …, x_k) = C(F₁(x₁), …, F_k(x_k))
Consequences:
Each variable can have whatever distribution you want (defined by the marginals)
You can specify a way to make the variables relate (defined by the copula)
Copulas
 
Gaussian Copula
Structure uses the multivariate normal distribution
Familiar elliptical shape
Dominantly defined by Pearson’s coefficient
Archimedean Copulas
Other interesting shapes
Single parameter
Definable using Kendall’s tau
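As an illustration of the “single parameter, definable using Kendall’s tau” point, a sketch for two common Archimedean families (Clayton and Gumbel, which are assumptions here since the slides do not name specific families):
tau <- 0.5                              # an observed Kendall's tau (assumed value)
theta_clayton <- 2 * tau / (1 - tau)    # Clayton family: tau = theta / (theta + 2)
theta_gumbel  <- 1 / (1 - tau)          # Gumbel family:  tau = 1 - 1 / theta
c(clayton = theta_clayton, gumbel = theta_gumbel)   # both equal 2 when tau = 0.5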
Gaussian Copula
 
Assume the underlying structure is well represented by a multivariate normal distribution
However, the variables of interest do not have a normal distribution
Generate normal variates with the specified correlation
Transform them into the target distributions (see the sketch below)
The HEC-WAT Hydrologic Sampler uses this
 
(Figure: Gaussian copula sampling flow — a sample from the multivariate standard normal is passed through the standard normal CDF to give a vector of U(0, 1) values, which are passed through the marginal quantile functions to give a vector of values with the desired target distributions; repeat for the next sample)
(Figure: example with r = 0.9 — marginal densities of Variable A and Variable B, and the resulting joint distribution from the Gaussian copula)
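A minimal R sketch of the sampling flow described above, assuming the MASS package and example target marginals (gamma for Variable A, lognormal for Variable B; the slides do not specify the marginals):
library(MASS)                                          # for mvrnorm()
r     <- 0.9
Sigma <- matrix(c(1, r,
                  r, 1), nrow = 2)                     # correlation matrix for the copula
z <- mvrnorm(n = 5000, mu = c(0, 0), Sigma = Sigma)    # multivariate standard normal samples
u <- pnorm(z)                                          # standard normal CDF -> vector of U(0, 1)
a <- qgamma(u[, 1], shape = 2, rate = 0.5)             # marginal quantile function for A (assumed)
b <- qlnorm(u[, 2], meanlog = 1, sdlog = 0.4)          # marginal quantile function for B (assumed)
cor(a, b, method = "spearman")                         # rank correlation carried over from the normal scores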
Summary
The multivariate normal distribution is the most common
multivariate distribution
Copulas let us define a joint distribution made from arbitrary
marginals and a joining structure

