Extreme Value Theory in Civil Engineering

Extreme Value Theory
Part I: Introduction
Gregory S. Karlovits, P.E., PH, CFM
US Army Corps of Engineers
Hydrologic Engineering Center
 
Extreme Value Theory
 
Civil engineering is largely a practice of extremes
minimum shear strength and maximum shear stress for slope stability safety factor
minimum resistance and maximum load for LRFD
minimum time to collision and maximum traffic load
minimum and maximum annual flows on a river
Emil Julius Gumbel
Gumbel’s EV Questions
 
Does an individual observation in a sample taken from a
distribution, alleged to be known, fall outside what may
reasonably be expected?
Does a series of extreme values exhibit a regular behavior?
Extreme Value Theory
 
We seek models for the behaviors of extremes
Models are applied for making decisions
Floods and droughts (hydrologic extremes) drive investment
Lecture Outline
1. Order Statistics
2. First Extreme Value Theorem
3. Second Extreme Value Theorem
Extreme Value Theory
Part II: Order Statistics and More
 
Order Statistics
Sample, n = 10, sorted from largest to smallest to give the order statistics x(10), x(9), …, x(1)
Sample minimum: x(1) = 8
Sample maximum: x(n) = 24
Sample range: x(n) − x(1) = 16
Sample median: 15.5
n odd: x((n+1)/2) is an order statistic
n even: [x(n/2) + x(n/2+1)] / 2 is not an order statistic
Rank Plot
 
We can show the cumulative distribution for the sample based on the ranks, but how well do we think it
reflects the population?
 
Do you think that another sample from this population will never exceed 24?
Order Statistics of the Uniform Distribution

The exceedance probabilities of a random sample of values from any continuous population have a uniform distribution on the interval (0, 1), written U(0, 1).

Why is this useful?
We often are interested in an empirical estimate for the exceedance probabilities for values in a sample.
Order Statistics of the Uniform Distribution

Proof (by simulation):
[Figure: 1,000 samples from a standard normal distribution, passed through the CDF; histogram of the resulting cumulative probabilities]
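The simulation on this slide can be sketched in a few lines. This is an illustration only, not the author's code: it assumes Python's standard library, and the function names, the seed, and the choice of 10 bins are hypothetical.

```python
import random
from statistics import NormalDist

def cdf_values(n_samples=1000, seed=1):
    """Draw standard-normal samples and return their cumulative probabilities."""
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    return [std_normal.cdf(rng.gauss(0.0, 1.0)) for _ in range(n_samples)]

def histogram(values, n_bins=10):
    """Count how many values fall in each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts

probs = cdf_values()
counts = histogram(probs)
print(counts)  # a uniform distribution puts roughly 100 of the 1,000 values in each bin
```

A roughly flat histogram is what the U(0, 1) claim predicts. The exceedance probabilities are one minus these values, so they are uniform as well.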
Order Statistics of the Uniform Distribution

Why care?
The same order statistic in repeated samples from the same population will not always have the same exceedance probability.

Let’s examine the distribution of F(x(n)).
Generate a sample from the standard normal distribution of size 100
Find the cumulative probability for sample maximum x(100) (= x(n)) using CDF
Repeat a large number of times
Plot the empirical density
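The four steps above can be sketched as follows. The sketch assumes Python's standard library; the function name, the seed, and the number of repeats (5,000) are hypothetical choices. The reference values n/(n+1) and 0.5^(1/n) are the mean and median of a Beta(n, 1) distribution, which is the distribution the next slide identifies for F(x(n)).

```python
import random
from statistics import NormalDist, mean, median

def max_cdf_values(n=100, n_repeats=5000, seed=1):
    """Cumulative probability F(x(n)) of the sample maximum, for repeated samples."""
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    values = []
    for _ in range(n_repeats):
        sample_max = max(rng.gauss(0.0, 1.0) for _ in range(n))
        values.append(std_normal.cdf(sample_max))
    return values

n = 100
f_max = max_cdf_values(n=n)
print("simulated mean  ", round(mean(f_max), 4), "reference n/(n+1):  ", round(n / (n + 1), 4))
print("simulated median", round(median(f_max), 4), "reference 0.5^(1/n):", round(0.5 ** (1 / n), 4))
```

Plotting a histogram of `f_max` would give the empirical density the slide refers to.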
Order Statistics of the Uniform Distribution

What does this show?
The sample maximum has an uncertain exceedance probability.

In fact, the cumulative probability F(x(i)), and therefore the exceedance probability 1 − F(x(i)), has a probability distribution, the beta distribution, with parameters computed from the rank i and the sample size n:
F(x(i)) ~ Beta(i, n + 1 − i)

[Figure: density of Beta(100, 1), the distribution of F(x(100)), with its mean and median marked]
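Order Statistics of the Uniform Distribution

What is the consequence?
We just derived two very important plotting position estimators:
The median plotting position, (i − 0.3175) / (n + 0.365): approx for median of beta dist
The Weibull (mean) plotting position, i / (n + 1): exact mean of beta dist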
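As a small worked example, the two plotting positions can be applied to the n = 10 sample from the Order Statistics slide. This is a sketch in plain Python; the function names are hypothetical, and the values returned are cumulative (non-exceedance) probabilities, so the exceedance probability is one minus each value.

```python
def weibull_plotting_position(i, n):
    """Mean of Beta(i, n + 1 - i): the Weibull plotting position."""
    return i / (n + 1)

def median_plotting_position(i, n):
    """Approximation to the median of Beta(i, n + 1 - i)."""
    return (i - 0.3175) / (n + 0.365)

sample = [19, 20, 8, 15, 9, 22, 24, 16, 14, 12]
ordered = sorted(sample)  # x(1), ..., x(n)
n = len(ordered)
for i, x in enumerate(ordered, start=1):
    print(f"x({i}) = {x:2d}  Weibull {weibull_plotting_position(i, n):.4f}"
          f"  median {median_plotting_position(i, n):.4f}")
```

For the maximum of a sample of 100, the exact median of Beta(100, 1) is 0.5^(1/100), and the approximation agrees with it to about three decimal places.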
Order Statistics of the Uniform Distribution

Other consequences?
We can compute confidence limits for empirical exceedance probability and show just how uncertain it is.

Difference in plotting position choice matters most at tails
Uncertainty in empirical frequency large at tails

[Figure: 90% CI for the order statistics x(1), x(2), …, x(99), x(100), annotated “symmetrical uncertainty” and “asymmetrical and wide”]
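A rough sketch of how such limits can be computed, assuming F(x(i)) ~ Beta(i, n + 1 − i). For the two extremes the beta quantiles have closed forms; for other ranks this sketch falls back on simulation with the standard library's `random.betavariate`. The function names, seed, and number of draws are hypothetical.

```python
import random

def extreme_rank_interval(n, level=0.90):
    """Closed-form limits on F for x(1) ~ Beta(1, n) and x(n) ~ Beta(n, 1)."""
    lo_p = (1.0 - level) / 2.0
    hi_p = 1.0 - lo_p
    maximum = (lo_p ** (1.0 / n), hi_p ** (1.0 / n))
    minimum = (1.0 - (1.0 - lo_p) ** (1.0 / n), 1.0 - (1.0 - hi_p) ** (1.0 / n))
    return minimum, maximum

def simulated_interval(i, n, level=0.90, n_draws=20000, seed=1):
    """Monte Carlo limits on F(x(i)) from draws of Beta(i, n + 1 - i)."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(i, n + 1 - i) for _ in range(n_draws))
    lo_index = int((1.0 - level) / 2.0 * n_draws)
    hi_index = int((1.0 + level) / 2.0 * n_draws) - 1
    return draws[lo_index], draws[hi_index]

minimum_ci, maximum_ci = extreme_rank_interval(100)
middle_ci = simulated_interval(50, 100)
print("x(1)   90% limits on F:", minimum_ci)
print("x(50)  90% limits on F:", middle_ci)
print("x(100) 90% limits on F:", maximum_ci)
```

For n = 100 the limits for the maximum are close to 0.9705 and 0.9995, so the exceedance probability of the largest value could plausibly be anywhere from about 0.0005 to about 0.03. The interval for a middle rank is nearly symmetric about i / (n + 1).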
Relation to Extreme Value Theory

This procedure explores the uncertainty in exceedance probability for a sample
We can model this with the beta distribution
What if I want to get the distribution of the values for a particular order statistic?
Depends on i, n, and also f(x)
Straightforward for x(1) and x(n)
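For reference, the standard results for independent draws from a continuous population with density f(x) and CDF F(x) are written out below. These are textbook identities rather than formulas taken from the slides, and the subscript notation is an assumption of this sketch.

```latex
% Density of the i-th order statistic in a sample of size n
f_{(i)}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,[F(x)]^{\,i-1}\,[1 - F(x)]^{\,n-i}\,f(x)

% The two extremes simplify:
F_{(n)}(x) = [F(x)]^{n}, \qquad f_{(n)}(x) = n\,[F(x)]^{\,n-1} f(x)

F_{(1)}(x) = 1 - [1 - F(x)]^{n}, \qquad f_{(1)}(x) = n\,[1 - F(x)]^{\,n-1} f(x)
```

The maximum is at most x only if all n values are at most x, which gives the power n on F(x); the same argument applied to exceedances gives the result for the minimum.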
Summary
Order statistics look at data based on their rank
The true exceedance probability for an order statistic is
uncertain
Plotting position uncertainty can be modeled with the beta
distribution
Extreme Value Theory
Part III: First Extreme Value Theorem
 
How do the extremes vary?
 
Usually we are most interested in f(x(n))
Repeated samples, n = 10 from a population:
[Table: repeated samples with the maximum of each sample in the last column, annotated “distribution of these”]
Block Maxima

Non-overlapping groups
Equal size
[Figure: blocks labeled sample 1 through sample 6]
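Extracting block maxima is a short operation. The sketch below uses plain Python with a hypothetical function name; the 20 values are the first two samples from the repeated-samples table on the earlier slide, treated here as one series split into blocks of 10.

```python
def block_maxima(values, block_size):
    """Maximum of each consecutive, non-overlapping block of equal size."""
    n_blocks = len(values) // block_size  # an incomplete final block is dropped
    return [max(values[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)]

series = [48, 92, 4, 76, 63, 35, 27, 87, 12, 80,
          25, 48, 26, 13, 50, 81, 92, 96, 77, 85]
maxima = block_maxima(series, block_size=10)
print(maxima)  # [92, 96]
```

Dropping an incomplete trailing block is a design choice of this sketch; it keeps every maximum comparable because each comes from the same number of values.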
Modelling Extremes

Some assumptions:
All of the values come from the same population, with possibly unknown density function f(x; ϴ)
f(x; ϴ) would be the distribution for every day of flow
All of the values are taken independently
Using block maxima with big enough blocks helps ensure this
Motivation for the “water year”
We can estimate a model for f(x(n))
Fisher-Tippett-Gnedenko Theorem

The distribution of the suitably rescaled maximum of repeated samples of a homogeneous population converges, when a limit exists, to one of three probability distributions:
EV1: Gumbel Distribution
EV2: Fréchet Distribution
EV3: Weibull Distribution
All three distributions can be represented with a single distribution:
Generalized Extreme Value Distribution
Namesake
Emil J. Gumbel
Maurice R. Fréchet
Waloddi Weibull
Central Limit Theorem

Think of this as the central limit theorem (CLT) except for maxima:
CLT states that the sample average of repeated draws of size n from a population converges to a normal distribution
First EV Theorem states that the maximum of repeated draws of size n from a population converges to a GEV distribution
GEV Distribution
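A minimal sketch of the GEV CDF and its inverse, assuming the parameterization common in hydrology in which κ < 0 is the heavy-tailed (Fréchet) case and κ > 0 is the upper-bounded (Weibull) case, matching the sign convention used later in the lecture. The parameter names `xi` (location) and `alpha` (scale) and the function names are assumptions of this sketch, not notation confirmed by the slides.

```python
import math

def gev_cdf(x, xi, alpha, kappa):
    """GEV CDF, F(x) = exp(-exp(-y)), with y the reduced variate."""
    if kappa == 0.0:
        y = (x - xi) / alpha
    else:
        arg = 1.0 - kappa * (x - xi) / alpha
        if arg <= 0.0:
            # Outside the support: above the upper bound (kappa > 0)
            # or below the lower bound (kappa < 0).
            return 1.0 if kappa > 0.0 else 0.0
        y = -math.log(arg) / kappa
    if y < -700.0:  # exp(-y) would overflow; the CDF is effectively zero here
        return 0.0
    return math.exp(-math.exp(-y))

def gev_quantile(p, xi, alpha, kappa):
    """Inverse of gev_cdf for 0 < p < 1."""
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + alpha * (1.0 - (-math.log(p)) ** kappa) / kappa

# Value with a 1% annual exceedance probability for three tail shapes
for kappa in (-0.2, 0.0, 0.2):
    print(kappa, round(gev_quantile(0.99, xi=10.0, alpha=2.0, kappa=kappa), 2))
```

With the same location and scale, the κ < 0 case gives the largest 1% quantile and the κ > 0 case the smallest, which is the practical meaning of heavy versus bounded tails.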
Convergence

EV convergence depends on three things:
The distribution of the parent population f(x; ϴ)
This determines which EV distribution the samples converge to
The number of events per block n
The number of blocks forming the estimate
How good is the estimate of the GEV parameters?
Convergence: Maximum Domain of Attraction

EV1: Gumbel Distribution (GEV κ = 0)
f(x; ϴ) has an exponential-type upper tail (for example, exponential, normal, gamma)
EV2: Fréchet Distribution (GEV κ < 0)
f(x; ϴ) is heavy-tailed
EV3: Weibull Distribution (GEV κ > 0)
f(x; ϴ) is light-tailed or upper-bounded
Convergence of Block Maxima
[Figure: convergence of block maxima as a function of block size (maximum of “this many”)]
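The effect of block size can be checked exactly for one simple parent. The sketch below assumes a unit exponential parent population, for which the maximum of n values, shifted by ln(n), has a closed-form CDF that converges to the standard Gumbel (EV1) CDF. The function names and the evaluation grid are hypothetical, and this is not the experiment shown on the slide.

```python
import math

def exp_block_max_cdf(x, n):
    """P(max of n independent Exp(1) values - ln(n) <= x), computed exactly."""
    z = x + math.log(n)
    if z <= 0.0:
        return 0.0
    return (1.0 - math.exp(-z)) ** n

def gumbel_cdf(x):
    """Standard Gumbel (EV1) CDF, the limit as the block size n grows."""
    return math.exp(-math.exp(-x))

def max_abs_error(n):
    """Largest gap between the block-maximum CDF and the Gumbel limit on a grid."""
    grid = [-3.0 + 0.1 * k for k in range(91)]
    return max(abs(exp_block_max_cdf(x, n) - gumbel_cdf(x)) for x in grid)

for n in (5, 50, 365):
    print("block size", n, "max CDF error", round(max_abs_error(n), 5))
```

The error shrinks roughly in proportion to 1/n for this parent. Other parent distributions can converge much more slowly, which is one reason small effective block sizes matter.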
Convergence: Why isn’t this AMS GEV?

Three primary factors delaying convergence:
Too few independent events per block
Too few blocks (years) creating AMS
Inhomogeneous parent population
When n is small, f(x(n)) ~ Kappa Distribution
[Figure: parent population of events and line of convergence to GEV. Green: light tails. Red: heavy tails. Blue: exponential.]
Convergence – Annual Maximum Streamflow

Although we take the maximum of 365 days of flows, what we care about most are floods
Few real floods happen each year at most sites
Some years don’t have any events we would consider “floods”
Streamflow records are mixed and serially correlated
Effectively we are taking the maxima of far fewer than 365 events
Bottom line: Bulletin 17 procedures do not generally meet the assumptions required of EVT analysis
[Figure from Inland Flood Hazards: Human, Riparian, and Aquatic Communities (Wohl, 2011)]
Convergence – Precipitation
 
Conversely, it is much easier to isolate independent rainfall
events of the same causal mechanism
Example: easy to identify which rainfall events in Florida are caused by
tropical storms
Eliminates mixtures
Some types of storms occur several times per year at some locations
Rainfall is much easier to analyze in traditional EVT manner
Plus, regionalization is much easier than for streamflow
Summary
The first extreme value theorem provides a model for the
magnitude of block maxima
Annual maximum series tend to converge to the generalized
extreme value distribution
Several issues can prevent sample convergence to GEV
Extreme Value Theory
Part IV: Second Extreme Value Theorem
 
Peaks Over Threshold

Block maximum approach “throws out” data
Some blocks have small maxima
Smaller than non-maxima in other blocks
What if we consider independent local maxima?
[Figure: series with a threshold, marking block maxima and local maxima]
Peaks Over Threshold

Zero or more local maxima per block
Count of peaks needs to be considered
Need to ensure local maxima sufficiently independent
No longer dealing with order statistics
Cumulative probability ≠ 1 – AEP
Peaks Over Threshold

We are interested in the distribution of excesses:
Given a set of values that exceed a threshold,
What is the distribution of the excess?
The collection of peaks is sometimes called a partial duration series (PDS)
[Figure: excess = x − u for a value x above the threshold u]
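One simple way to build a partial duration series is sketched below. The independence rule used here, a minimum separation in time steps between retained peaks, is an assumption of this sketch and only one of several declustering approaches; the function name and the toy series are hypothetical.

```python
def peaks_over_threshold(series, threshold, min_separation):
    """Return (index, value) pairs for declustered peaks above the threshold.

    Exceedances closer than min_separation steps to the last retained peak
    are treated as the same event, and only the larger value is kept.
    """
    peaks = []
    for i, value in enumerate(series):
        if value <= threshold:
            continue
        if peaks and i - peaks[-1][0] < min_separation:
            if value > peaks[-1][1]:
                peaks[-1] = (i, value)  # same event, larger peak
            continue
        peaks.append((i, value))
    return peaks

series = [1, 5, 7, 6, 1, 1, 1, 8, 2, 1, 4, 9, 3]
threshold = 3
peaks = peaks_over_threshold(series, threshold, min_separation=3)
excesses = [value - threshold for _, value in peaks]
print(peaks)     # [(2, 7), (7, 8), (11, 9)]
print(excesses)  # [4, 5, 6]
```

The mean number of peaks per block, here `len(peaks)` divided by the number of years of record, is the count that the slide says needs to be considered alongside the excesses.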
Pickands-Balkema-de Haan Theorem
Namesake
Vilfredo Pareto
Generalized Pareto Distribution
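A minimal sketch of the generalized Pareto CDF and its inverse, assuming the same sign convention for κ as the GEV (κ < 0 heavy-tailed, κ > 0 upper-bounded). The parameter names `xi` (lower bound, typically the threshold) and `alpha` (scale) and the function names are assumptions of this sketch.

```python
import math

def gpa_cdf(x, xi, alpha, kappa):
    """Generalized Pareto CDF, F(x) = 1 - exp(-y), with y the reduced variate."""
    if x <= xi:
        return 0.0
    if kappa == 0.0:
        y = (x - xi) / alpha  # exponential distribution
    else:
        arg = 1.0 - kappa * (x - xi) / alpha
        if arg <= 0.0:
            return 1.0  # at or above the upper bound, which exists only for kappa > 0
        y = -math.log(arg) / kappa
    return 1.0 - math.exp(-y)

def gpa_quantile(p, xi, alpha, kappa):
    """Inverse of gpa_cdf for 0 <= p < 1."""
    if kappa == 0.0:
        return xi - alpha * math.log(1.0 - p)
    return xi + alpha * (1.0 - (1.0 - p) ** kappa) / kappa

# Peak exceeded by 10% of events above a threshold of 3.0
for kappa in (-0.2, 0.0, 0.2):
    print(kappa, round(gpa_quantile(0.90, xi=3.0, alpha=1.0, kappa=kappa), 3))
```

Note that these probabilities are per event, not per year; converting to an annual probability needs the event rate as well.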
Peaks Over Threshold
 
Real-life challenges:
Choosing a threshold
Ensuring peaks are independent
Difference between AMS and PDS results may be small
A Fishing Example

You are fishing in a lake over several days.
Assuming you catch the fish at random,
The parent population f(x; ϴ) is the length of all fish in the lake
You catch an average of λ fish each day
λ = total fish / total days
The largest fish you catch each day is asymptotically GEV-distributed
The length of all of your keepers is asymptotically generalized Pareto-distributed (GPA)
A Fishing Example

Your fish samples may not look GEV/GPA because:
There is a mixture of fish species in the lake
f(x; ϴ) is not homogeneous!
The fish you catch probably aren’t independent
The number of fish you catch per day (λ) is small
You are only fishing for a few days
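Going Between AMS and PDS

Assuming mean rate of events per year λ: F(x) = exp(−λ [1 − G(x)]), where F is the CDF of the AMS and G is the CDF of the PDS distribution
This is usually called the Langbein Adjustment
See Langbein (1949)
Simulation experiment:
Generate 100 years of 50 events each from a standard normal distribution
Look at the distribution of the AMS
Look at the distribution of the PDS
See how the adjustment performs
Langbein Adjustment

Test equivalence of x = 3
GPA fit to PDS: G(3.0) = 0.9883, λ = 3.38
Langbein adj.: exp(−3.38 × (1 − 0.9883)) = 0.9613
GEV fit to AMS: F⁻¹(0.9613) = 3.03
[Figure: sample values, GPA fit to PDS, Langbein adj., and GEV fit to AMS on an EV type I axis]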
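The adjustment is a one-line calculation. The sketch below assumes the form F(x) = exp(−λ [1 − G(x)]), with G the per-event (PDS) CDF and λ the mean number of events per year, and reuses the numbers from the slide's test at x = 3. The function names are hypothetical.

```python
import math

def pds_to_ams_cdf(g, lam):
    """Annual non-exceedance probability from a per-event CDF value g."""
    return math.exp(-lam * (1.0 - g))

def ams_to_pds_cdf(f, lam):
    """Inverse adjustment: per-event CDF value from an annual CDF value f."""
    return 1.0 + math.log(f) / lam

lam = 3.38       # mean number of peaks per year
g_at_3 = 0.9883  # GPA fit to PDS, evaluated at x = 3.0
f_at_3 = pds_to_ams_cdf(g_at_3, lam)
print("annual CDF at x = 3:", round(f_at_3, 3))
print("annual exceedance probability:", round(1.0 - f_at_3, 3))
```

The result is about 0.961, which agrees with the 0.9613 shown on the slide to within rounding of the inputs.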
Madsen et al. 1997
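The parameter relations associated with this slide follow from applying the Langbein form F(x) = exp(−λ [1 − G(x)]) to a generalized Pareto G. The sketch below is a reconstruction under that assumption, not a transcription of the slide: the symbols `u` (threshold), `alpha` (GPA scale), `kappa` (shape), and `lam` (events per year) and all function names are hypothetical, and the helper is only valid inside the support of the distributions.

```python
import math

def _reduced_variate(x, loc, scale, kappa):
    """y = (x - loc)/scale if kappa = 0, else -ln(1 - kappa (x - loc)/scale)/kappa."""
    if kappa == 0.0:
        return (x - loc) / scale
    return -math.log(1.0 - kappa * (x - loc) / scale) / kappa

def gpa_cdf(x, u, alpha, kappa):
    return 1.0 - math.exp(-_reduced_variate(x, u, alpha, kappa))

def gev_cdf(x, xi, alpha, kappa):
    return math.exp(-math.exp(-_reduced_variate(x, xi, alpha, kappa)))

def pds_to_ams_parameters(u, alpha, kappa, lam):
    """GEV (location, scale, shape) implied by Poisson arrivals with GPA excesses."""
    if kappa == 0.0:
        return u + alpha * math.log(lam), alpha, 0.0
    return u + alpha * (1.0 - lam ** -kappa) / kappa, alpha * lam ** -kappa, kappa

u, alpha, lam = 3.0, 0.5, 3.38
for kappa in (-0.1, 0.0, 0.1):
    xi, alpha_ams, kappa_ams = pds_to_ams_parameters(u, alpha, kappa, lam)
    # At the threshold, the annual CDF equals exp(-lam): no exceedances in a year
    print(kappa, round(gev_cdf(u, xi, alpha_ams, kappa_ams), 4), round(math.exp(-lam), 4))
```

The shape parameter carries over unchanged, while the location and scale shift with the event rate; for any x above the threshold the GEV built this way matches the Langbein-adjusted GPA exactly.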
Tools and Resources for POT
Most POT analysis currently performed in R
Improved POT analysis in HEC-SSP coming in 2024!
Materials from 2023 Partial Duration Workshop at Los Angeles
District available online with HEC-SSP training
Summary
The second extreme value theorem provides a model for all
independent extremes above a threshold
Partial duration series tend to converge to the generalized
Pareto distribution
Annualized estimates can be made from PDS models
Slide Note

Hello everyone, I'm Greg Karlovits from the Hydrologic Engineering Center. Welcome to our course on statistical methods in hydrology. This video is part one of four on the topic of extreme value theory and will cover a short introduction to extreme value theory. Let's get started.


Introduction to Extreme Value Theory (EVT) in civil engineering focusing on the analysis of extremes such as shear strength, slope stability, and load factors. The theory, exemplified by Emil Julius Gumbel, questions the likelihood of individual observations falling outside expected distributions. EVT models behaviors of extremes to inform decision-making in managing hydrologic events like floods and droughts. Lectures cover order statistics, extreme value theorems, and the application of EVT models in engineering practices.

  • Extreme Value Theory
  • Civil Engineering
  • Order Statistics
  • Hydrologic Events
  • Gumbel

Uploaded on Apr 20, 2024 | 11 Views



Presentation Transcript


  1. Extreme Value Theory Part I: Introduction Gregory S. Karlovits, P.E., PH, CFM US Army Corps of Engineers Hydrologic Engineering Center

  2. Extreme Value Theory Civil engineering is largely a practice of extremes minimum shear strength and maximum shear stress for slope stability safety factor minimum resistance and maximum load for LRFD minimum time to collision and maximum traffic load minimum and maximum annual flows on a river 2

  3. Emil Julius Gumbel: “It’s impossible that the improbable will never happen.”

  4. Gumbel’s EV Questions: Does an individual observation in a sample taken from a distribution, alleged to be known, fall outside what may reasonably be expected? Does a series of extreme values exhibit a regular behavior?

  5. Extreme Value Theory We seek models for the behaviors of extremes Models are applied for making decisions Floods and droughts (hydrologic extremes) drive investment 5

  6. Lecture Outline 1. Order Statistics 2. First Extreme Value Theorem 3. Second Extreme Value Theorem 6

  7. Extreme Value Theory Part II: Order Statistics and More Gregory S. Karlovits, P.E., PH, CFM US Army Corps of Engineers Hydrologic Engineering Center

  8. Order Statistics. Sample, n = 10: X1, …, X10 = 19, 20, 8, 15, 9, 22, 24, 16, 14, 12. Sorted: 24, 22, 20, 19, 16, 15, 14, 12, 9, 8, which are the order statistics X(10), X(9), …, X(1). Sample minimum x(1) = 8. Sample maximum x(n) = 24. Sample range x(n) − x(1) = 16. Sample median = 15.5. For n odd, the median x((n+1)/2) is an order statistic; for n even, the median [x(n/2) + x(n/2+1)] / 2 is not an order statistic.

  9. Rank Plot We can show the cumulative distribution for the sample based on the ranks, but how well do we think it reflects the population? Do you think that another sample from this population will never exceed 24? 9

  10. Order Statistics of the Uniform Distribution. The exceedance probabilities of a random sample of values from any continuous population have a uniform distribution on the interval (0, 1), written U(0, 1). Why is this useful? We often are interested in an empirical estimate for the exceedance probabilities for values in a sample.

  11. Order Statistics of the Uniform Distribution. Proof (by simulation): 1,000 samples from a standard normal distribution are passed through the CDF. Figure: histogram of the resulting cumulative probabilities.

  12. Order Statistics of the Uniform Distribution. Why care? The same order statistic in repeated samples from the same population will not always have the same exceedance probability. Let’s examine the distribution of F(x(n)): generate a sample from the standard normal distribution of size 100; find the cumulative probability for sample maximum x(100) (= x(n)) using the CDF; repeat a large number of times; plot the empirical density.

  13. Order Statistics of the Uniform Distribution. What does this show? The sample maximum has an uncertain exceedance probability. In fact, the cumulative probability F(x(i)), and therefore the exceedance probability 1 − F(x(i)), has a probability distribution, the beta distribution, with parameters computed from the rank and the sample size: F(x(i)) ~ Beta(i, n + 1 − i). Figure: density of Beta(100, 1) with the median and mean marked.

  14. Order Statistics of the Uniform Distribution. What is the consequence? We just derived two very important plotting position estimators. The median plotting position, (i − 0.3175) / (n + 0.365), is an approximation for the median of the beta distribution. The Weibull (mean) plotting position, i / (n + 1), is the exact mean of the beta distribution. For F(x(n)) with n = 100 and 100,000 samples:

      Property   Sample   Exact
      Median     0.9931   0.9932
      Mean       0.9901   0.9901

  15. Order Statistics of the Uniform Distribution. Other consequences? We can compute confidence limits for empirical exceedance probability and show just how uncertain it is. Difference in plotting position choice matters most at tails. Uncertainty in empirical frequency large at tails. Figure: 90% CI for x(1), x(2), …, x(99), x(100), annotated “symmetrical uncertainty” and “asymmetrical and wide”.

  16. Relation to Extreme Value Theory This procedure explores the uncertainty in exceedance probability for a sample We can model this with the beta distribution What if I want to get the distribution of the values for a particular order statistic? Depends on i, n, and also f(x) Straightforward for x(1) and x(n) 16

  17. Summary Order statistics look at data based on their rank The true exceedance probability for an order statistic is uncertain Plotting position uncertainty can be modeled with the beta distribution 17

  18. Extreme Value Theory Part III: First Extreme Value Theorem Gregory S. Karlovits, P.E., PH, CFM US Army Corps of Engineers Hydrologic Engineering Center

  19. How do the extremes vary? Usually we are most interested in f(x(n)). Repeated samples, n = 10 from a population, with the Max column annotated “distribution of these”:

      Sample Number   Sample Value 1 to 10             Max
       1              48 92  4 76 63 35 27 87 12 80    92
       2              25 48 26 13 50 81 92 96 77 85    96
       3              40 44  6 88 92 14 63 66 37 13    92
       4              74 80 17 47 33 62 82 21 10 32    82
       5              76 79 23 56 44  9 72 61 45 31    79
       6              26 24 62 74  6 84 79 15 33 44    84
       7              26 94 46 99 67 81  7 71 43 35    99
       8              64 32 68  2 87 86 72 37 57 65    87
       9              74 13 44 88 91 26 98 99 15 15    99
      10              80 48  9 29  8 35  4 74 96 31    96
      11              95 90 24 15 26 87 33 61 66 47    95
      12              76 31 70 93  8 94 72 65  5 39    94

  20. Block Maxima Non-overlapping groups Equal size sample 1 sample 2 sample 3 sample 4 sample 5 sample 6 20

  21. Modelling Extremes. Some assumptions: All of the values come from the same population, with possibly unknown density function f(x; ϴ); f(x; ϴ) would be the distribution for every day of flow. All of the values are taken independently; using block maxima with big enough blocks helps ensure this (motivation for the “water year”). We can estimate a model for f(x(n)).

  22. Fisher-Tippett-Gnedenko Theorem. The distribution of the suitably rescaled maximum of repeated samples of a homogeneous population converges, when a limit exists, to one of three probability distributions: EV1: Gumbel Distribution; EV2: Fréchet Distribution; EV3: Weibull Distribution. All three distributions can be represented with a single distribution: the Generalized Extreme Value Distribution.

  23. Namesake: Emil J. Gumbel, Maurice R. Fréchet, Waloddi Weibull.

  24. Central Limit Theorem. Think of this as the central limit theorem (CLT) except for maxima. CLT states that the sample average of repeated draws of size n from a population converges to a normal distribution: $\bar{X}_n = \frac{X_1 + \dots + X_n}{n}$, with $\sqrt{n}\,(\bar{X}_n - \mu) \to N(0, \sigma^2)$ in distribution, where $\mu$ and $\sigma^2$ are the population mean and variance. First EV Theorem states that the maximum of repeated draws of size n from a population converges to a GEV distribution.

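  25. GEV Distribution: $F(x; \xi, \alpha, \kappa) = e^{-e^{-y}}$, where $y = -\kappa^{-1} \ln\{1 - \kappa (x - \xi)/\alpha\}$ for $\kappa \neq 0$ and $y = (x - \xi)/\alpha$ for $\kappa = 0$, with location $\xi$, scale $\alpha$, and shape $\kappa$.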

  26. Convergence. EV convergence depends on three things: the distribution of the parent population f(x; ϴ), which determines which EV distribution the samples converge to; the number of events per block n, with $\lim_{n \to \infty} f(x_{(n)})$ = GEV Distribution; and the number of blocks forming the estimate (how good is the estimate of the GEV parameters?).

  27. Convergence: Maximum Domain of Attraction. EV1: Gumbel Distribution (GEV κ = 0): f(x; ϴ) has an exponential-type upper tail (for example, exponential, normal, gamma). EV2: Fréchet Distribution (GEV κ < 0): f(x; ϴ) is heavy-tailed. EV3: Weibull Distribution (GEV κ > 0): f(x; ϴ) is light-tailed or upper-bounded.

  28. Convergence of Block Maxima. Figure axis: block size (maximum of “this many”).

  29. Convergence: Why isn’t this AMS GEV? Three primary factors delaying convergence: too few independent events per block; too few blocks (years) creating AMS; inhomogeneous parent population. $\lim_{n \to \infty} f(x_{(n)})$ = GEV Distribution. When n is small, f(x(n)) ~ Kappa Distribution. Figure: parent population of events and line of convergence to GEV (green: light tails; red: heavy tails; blue: exponential).

  30. Convergence – Annual Maximum Streamflow. Although we take the maximum of 365 days of flows, what we care about most are floods. Few real floods happen each year at most sites. Some years don’t have any events we would consider “floods”. Streamflow records are mixed and serially correlated. Effectively we are taking the maxima of far fewer than 365 events. Bottom line: Bulletin 17 procedures do not generally meet the assumptions required of EVT analysis.

  31. Figure from Inland Flood Hazards: Human, Riparian, and Aquatic Communities (Wohl, 2011)

  32. Convergence – Precipitation. Conversely, it is much easier to isolate independent rainfall events of the same causal mechanism. Example: easy to identify which rainfall events in Florida are caused by tropical storms. Eliminates mixtures. Some types of storms occur several times per year at some locations. Rainfall is much easier to analyze in traditional EVT manner. Plus, regionalization is much easier than for streamflow.

  33. Summary The first extreme value theorem provides a model for the magnitude of block maxima Annual maximum series tend to converge to the generalized extreme value distribution Several issues can prevent sample convergence to GEV 33

  34. Extreme Value Theory Part IV: Second Extreme Value Theorem Gregory S. Karlovits, P.E., PH, CFM US Army Corps of Engineers Hydrologic Engineering Center

  35. Peaks Over Threshold. Block maximum approach “throws out” data. Some blocks have small maxima, smaller than non-maxima in other blocks. What if we consider independent local maxima? Figure labels: Block Maxima, Local Maxima, Threshold.

  36. Peaks Over Threshold. Zero or more local maxima per block. Count of peaks needs to be considered. Need to ensure local maxima sufficiently independent. No longer dealing with order statistics. Cumulative probability ≠ 1 – AEP.

  37. Peaks Over Threshold. We are interested in the distribution of excesses: given a set of values that exceed a threshold, what is the distribution of the excess? The collection of peaks is sometimes called a partial duration series (PDS). Figure: excess = x − u for a value x above the threshold u.

  38. Pickands-Balkema-de Haan Theorem. $F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}$ due to conditional probability. $\lim_{u \to \infty} F_u(y) = G(y)$, where $G(y)$ is the Generalized Pareto Distribution, due to the Pickands-Balkema-de Haan theorem.

  39. Namesake Vilfredo Pareto 39


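  41. Generalized Pareto Distribution: $F(x; \xi, \alpha, \kappa) = 1 - e^{-y}$, where $y = -\kappa^{-1} \ln\{1 - \kappa (x - \xi)/\alpha\}$ for $\kappa \neq 0$ and $y = (x - \xi)/\alpha$ for $\kappa = 0$, with location $\xi$, scale $\alpha$, and shape $\kappa$.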

  42. Peaks Over Threshold Real-life challenges: Choosing a threshold Ensuring peaks are independent Difference between AMS and PDS results may be small 42

  43. A Fishing Example. You are fishing in a lake over several days. Assuming you catch the fish at random, the parent population f(x; ϴ) is the length of all fish in the lake. You catch an average of λ fish each day, where λ = total fish / total days. The largest fish you catch each day is asymptotically GEV-distributed. The length of all of your keepers is asymptotically generalized Pareto-distributed (GPA).

  44. A Fishing Example. Your fish samples may not look GEV/GPA because: there is a mixture of fish species in the lake, so f(x; ϴ) is not homogeneous! The fish you catch probably aren’t independent. The number of fish you catch per day (λ) is small. You are only fishing for a few days.

  45. Going Between AMS and PDS. Assuming mean rate of events per year λ: $F(x) = e^{-\lambda [1 - G(x)]}$, where $F$ is the CDF of the AMS and $G$ is the CDF of the PDS distribution. This is usually called the Langbein Adjustment. See Langbein (1949). Simulation experiment: generate 100 years of 50 events each from a standard normal distribution; look at the distribution of the AMS; look at the distribution of the PDS; see how the adjustment performs.

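  46. Langbein Adjustment. Test equivalence of x = 3. GPA fit to PDS: $G(3.0) = 0.9883$ with $\lambda = 3.38$. Langbein adj.: $e^{-3.38\,(1 - 0.9883)} = 0.9613$. GEV fit to AMS: $F^{-1}(0.9613) = 3.03$. Figure: sample values, GPA fit to PDS, Langbein adj., and GEV fit to AMS on an EV type I axis.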

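  47. Madsen et al. 1997. For a PDS with mean rate λ events per year and GPA-distributed peaks above threshold $u$ with scale $\alpha$ and shape $\kappa$, the AMS is GEV with the same shape $\kappa$ and: $\xi = u + \alpha \ln \lambda$, $\alpha^* = \alpha$ when $\kappa = 0$; $\xi = u + \alpha (1 - \lambda^{-\kappa})/\kappa$, $\alpha^* = \alpha \lambda^{-\kappa}$ otherwise. At $x = u$, the CDF of the AMS is equal to $e^{-\lambda}$, which is the probability of no exceedances in a year.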

  48. Tools and Resources for POT Most POT analysis currently performed in R Improved POT analysis in HEC-SSP coming in 2024! Materials from 2023 Partial Duration Workshop at Los Angeles District available online with HEC-SSP training 48

  49. Summary The second extreme value theorem provides a model for all independent extremes above a threshold Partial duration series tend to converge to the generalized Pareto distribution Annualized estimates can be made from PDS models 49
