1. Bell & Coins Example
Consider a bell that rings with high probability only when the
outcomes of the two coins are equal. This situation is well
represented by a directed acyclic graph (Coin1 -> Bell <- Coin2)
with the probability distribution:
  
P(c1, c2, b) = P(c1) P(c2) P(b | c1, c2)      (8 assignments)
             = P(c1) P(c2 | c1) P(b | c1, c2)

Only assumption: the coins are marginally independent, I( Coin2, { }, Coin1 ).
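To make this concrete, the joint can be tabulated directly. A minimal Python sketch, assuming fair coins and an illustrative bell that rings with probability 0.99 when the coins match and 0.01 otherwise (these numbers are not from the slides):

```python
from itertools import product

# P(c1) = P(c2) = 0.5 for fair coins; the bell's CPT is assumed for illustration.
P_c = {0: 0.5, 1: 0.5}

def P_b(b, c1, c2):
    ring = 0.99 if c1 == c2 else 0.01
    return ring if b == 1 else 1.0 - ring

# The 8 assignments of P(c1, c2, b) = P(c1) P(c2) P(b | c1, c2):
joint = {(c1, c2, b): P_c[c1] * P_c[c2] * P_b(b, c1, c2)
         for c1, c2, b in product((0, 1), repeat=3)}
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Marginal independence of the coins: P(c1, c2) = P(c1) P(c2).
for c1, c2 in product((0, 1), repeat=2):
    marg = sum(joint[c1, c2, b] for b in (0, 1))
    assert abs(marg - P_c[c1] * P_c[c2]) < 1e-12
```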
2. Bayesian Networks (Directed Acyclic Graphical Models)

Definition: A Bayesian Network is a pair (P,D) where P is a
probability distribution over variables X1, …, Xn and D is a
Directed Acyclic Graph (DAG) over vertices X1, …, Xn, with the relationship

    P(x1, …, xn) = Π_{i=1..n} P(xi | pai)

for all values xi of Xi, where pai is the set of parents of vertex Xi in D.
(When Xi is a source vertex, pai is the empty set of parents and P(xi | pai)
reduces to P(xi).) A DAG is a directed graph without any directed cycles.
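As an illustration of the definition, here is a hypothetical sketch that evaluates the product Π P(xi | pai) for any DAG given as a dictionary of parents and CPTs; the network below is the bell-and-coins example with assumed numbers, not values from the slides:

```python
def joint_prob(network, assignment):
    """network: var -> (parents tuple, CPT mapping (parent values..., own value) -> prob)."""
    p = 1.0
    for var, (parents, cpt) in network.items():
        key = tuple(assignment[q] for q in parents) + (assignment[var],)
        p *= cpt[key]
    return p

# Bell-and-coins DAG: two source coins, bell with parents (Coin1, Coin2).
coin_cpt = {(0,): 0.5, (1,): 0.5}            # source vertex: P(xi | {}) = P(xi)
bell_cpt = {}
for c1 in (0, 1):
    for c2 in (0, 1):
        ring = 0.99 if c1 == c2 else 0.01    # assumed bell behaviour
        bell_cpt[(c1, c2, 1)] = ring
        bell_cpt[(c1, c2, 0)] = 1.0 - ring
net = {"Coin1": ((), coin_cpt),
       "Coin2": ((), coin_cpt),
       "Bell":  (("Coin1", "Coin2"), bell_cpt)}

p = joint_prob(net, {"Coin1": 1, "Coin2": 1, "Bell": 1})
# p is P(c1) * P(c2) * P(b|c1,c2) = 0.5 * 0.5 * 0.99
```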
3. Bayesian Networks (BNs)

Examples to model diseases & symptoms & risk factors:
- One variable for all diseases (values are diseases)  (draw it)
- One variable per disease (values are True/False)  (draw it)
- Naïve Bayesian networks versus bipartite BNs  (details)
- Adding risk factors  (draw it)
4. Naïve Bayes for Diseases & Symptoms

Values of D are {D1, …, Dm} and represent non-overlapping diseases.
Values of each symptom S represent severity of symptom {0, 1, 2}.

    P(d, s1, …, sk) = P(d) Π_{i=1..k} P(si | d)
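Inference in this model is Bayes' rule over the factorization, P(d | s1..sk) proportional to P(d) Π P(si | d). A small sketch with two symptoms; the diseases and all the probability numbers are made up for illustration, not from the slides:

```python
P_d = {"flu": 0.1, "cold": 0.9}    # made-up prior P(d)
# Made-up P(si | d) for two symptoms with severities {0, 1, 2}:
P_s_given_d = {
    "fever": {"flu": {0: 0.2, 1: 0.3, 2: 0.5}, "cold": {0: 0.6, 1: 0.3, 2: 0.1}},
    "cough": {"flu": {0: 0.3, 1: 0.4, 2: 0.3}, "cold": {0: 0.2, 1: 0.5, 2: 0.3}},
}

def posterior(observed):
    """observed: symptom -> severity. Returns P(d | observed) = P(d) * prod P(si|d) / Z."""
    unnorm = {d: P_d[d] for d in P_d}
    for s, val in observed.items():
        for d in unnorm:
            unnorm[d] *= P_s_given_d[s][d][val]
    z = sum(unnorm.values())
    return {d: p / z for d, p in unnorm.items()}

print(posterior({"fever": 2, "cough": 0}))  # high fever shifts mass toward "flu"
```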
5. Risks, Diseases, Symptoms

Values of each disease Di represent existence/non-existence {True, False}.
Values of each symptom S represent severity of symptom {0, 1, 2}.
6. Natural Direction of Information

Consider our example of a hypothesis H=h indicating disease and
a set of symptoms such as fever, blood pressure, pain, etc., marked by
S1=s1, S2=s2, S3=s3, or in short E=e.
We assume P(e | h) are given (or inferred from data)
and compute P(h | e).

The natural order to specify a BN is usually according to causation or
time line. Nothing in the definition dictates this choice, but
normally more independence assumptions are explicated in the
graph via this choice.

Example: atomic clock + watch clocks naïve Bayes model.
7. The Visit-to-Asia Example

Variables: Visit to Asia, Smoking, Tuberculosis, Lung Cancer,
Abnormality in Chest, Bronchitis, X-Ray, Dyspnea.

What are the relevant variables and their dependencies?
8. An Abstract Example

    P(v,s,t,l,b,a,x,d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

In the order V,S,T,L,B,A,X,D we have a boundary basis:
I( S, { }, V )
I( T, V, S )
I( L, S, {T,V} )
I( X, A, {V,S,T,L,B,D} )

Does I( {X,D}, A, V ) also hold in P?
9. Independence in Bayesian Networks

Comparing the chain rule with the BN factorization,

    P(x1, …, xn) = Π_{i=1..n} P(xi | x1, …, xi-1)
    P(x1, …, xn) = Π_{i=1..n} P(xi | pai)      (*)

yields the independence assumptions I( Xi, pai, {X1, …, Xi-1} \ pai ).

This set of n independence assumptions is called Basis(D).

This equation uses a specific topological order d of vertices,
namely, that for every Xi, the parents of Xi appear before Xi in
this order. However, all choices of a topological order are equivalent.
Check, for example, topological orders of V,S,T,L of the visit-to-Asia example.
10. Directed and Undirected Chain Example

Directed chain X1 -> X2 -> X3 -> X4, with the
assumptions I( X3, X2, X1 ) and I( X4, X3, {X1, X2} ):

    P(x1,x2,x3,x4) = P(x1) P(x2|x1) P(x3|x2) P(x4|x3)

Undirected chain X1 - X2 - X3 - X4:

    P(x1,x2,x3,x4) = P(x1, x2) P(x3|x2) P(x4|x3)

Markov network: the joint distribution is a multiplication of functions
on the three maximal cliques with normalizing factor 1.
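The equality of the two factorizations rests on P(x1) P(x2|x1) = P(x1, x2), and can be checked numerically. A sketch over a binary chain whose CPT numbers are made up for illustration:

```python
from itertools import product

# Made-up CPTs for binary X1 -> X2 -> X3 -> X4.
P1 = {0: 0.3, 1: 0.7}
P2g1 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}   # P(x2 | x1)
P3g2 = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.5}   # P(x3 | x2)
P4g3 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.9}   # P(x4 | x3)

for x1, x2, x3, x4 in product((0, 1), repeat=4):
    directed = P1[x1] * P2g1[x1, x2] * P3g2[x2, x3] * P4g3[x3, x4]
    pair = P1[x1] * P2g1[x1, x2]          # the pairwise clique function P(x1, x2)
    undirected = pair * P3g2[x2, x3] * P4g3[x3, x4]
    assert abs(directed - undirected) < 1e-12
```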
11. Reminder: Markov Networks (Undirected Graphical Models)

Definition: A Markov Network is a pair (P,G) where P is a
probability distribution over variables X1, …, Xn and G is an
undirected graph over vertices X1, …, Xn, with the relationship

    P(x1, …, xn) = K Π_{j=1..m} gj(Cj)

for all values xi of Xi, where C1, …, Cm are the maximal cliques of G,
the gj are functions that assign a non-negative number to every value
combination of the variables in Cj, and K is a normalizing constant.

Markov networks and Bayesian networks represent the
same set of independence assumptions only on chordal
graphs (such as chains and trees).
12. From Separation in UGs to d-Separation in DAGs
13. Paths

Intuition: dependency must flow along paths in the graph.
A path is a sequence of neighboring variables.

Examples (in the visit-to-Asia graph):
  X - A - D - B
  A - L - S - B
14. Path Blockage

Every path is classified given the evidence:
- active -- creates a dependency between the end nodes
- blocked -- does not create a dependency between the end nodes

Evidence means the assignment of a value to a subset of nodes.
15-17. Path Blockage

Three cases (each can be blocked or active, depending on the evidence):
- Common cause
- Intermediate cause
- Common effect
18. Definition of Path Blockage

Definition: A path is active, given evidence Z, if
- whenever the path contains a converging configuration (two edges
  pointing head-to-head at a node A), either A or one of its
  descendants is in Z, and
- no other nodes in the path are in Z.

Definition: A path is blocked, given evidence Z, if it is not active.

Definition: X is d-separated from Y, given Z, if all paths between a
node in X and a node in Y are blocked, given Z. Denoted by ID(X, Z, Y).
(X, Y, Z are disjoint sets of variables.)
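These definitions translate directly into code. Below is a sketch of a d-separation test for small DAGs, hard-coded here with the visit-to-Asia structure; it enumerates simple paths between X and Y and classifies each as active or blocked exactly as defined above:

```python
from itertools import chain

# Visit-to-Asia structure: V->T, S->L, S->B, T->A, L->A, A->X, A->D, B->D.
edges = {("V", "T"), ("S", "L"), ("S", "B"), ("T", "A"),
         ("L", "A"), ("A", "X"), ("A", "D"), ("B", "D")}
nodes = set(chain.from_iterable(edges))
children = {n: {c for p, c in edges if p == n} for n in nodes}
parents = {n: {p for p, c in edges if c == n} for n in nodes}

def descendants(n):
    seen, stack = set(), [n]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def all_paths(y, path):
    # Enumerate simple (direction-ignoring) paths from path[-1] to y.
    if path[-1] == y:
        yield path
        return
    for nb in children[path[-1]] | parents[path[-1]]:
        if nb not in path:
            yield from all_paths(y, path + [nb])

def path_active(path, Z):
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if mid in children[prev] and mid in children[nxt]:
            # Converging configuration: needs mid or a descendant of mid in Z.
            if mid not in Z and not (descendants(mid) & Z):
                return False
        elif mid in Z:
            # Chain or common-cause node observed: path blocked.
            return False
    return True

def d_separated(X, Z, Y):
    Z = set(Z)
    return not any(path_active(p, Z)
                   for x in X for y in Y for p in all_paths(y, [x]))

print(d_separated({"T"}, set(), {"S"}))            # True
print(d_separated({"T"}, {"D"}, {"S"}))            # False
print(d_separated({"T"}, {"D", "L", "B"}, {"S"}))  # True
```

The three queries at the end reproduce the worked examples in the following slides. Enumerating all paths is exponential in general; this is a didactic sketch, not what one would use on large networks.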
19. Example

ID(T,S | { }) = yes

20. Example

ID(T,S | { }) = yes
ID(T,S | D) = no

21. Example

ID(T,S | { }) = yes
ID(T,S | D) = no
ID(T,S | {D,L,B}) = yes
22. Example

In the order V,S,T,L,B,A,X,D we get from the boundary basis:
ID( S, { }, V )
ID( T, V, S )
ID( L, S, {T,V} )
ID( X, A, {V,S,T,L,B,D} )
23. Main Results on d-Separation

The definition of ID(X, Z, Y) is such that:

Soundness [Theorem 9]: ID(X, Z, Y) = yes implies that
IP(X, Z, Y) follows from Basis(D).

Completeness [Theorem 10]: ID(X, Z, Y) = no implies that
IP(X, Z, Y) does not follow from Basis(D).
24. Revisiting Example II

So does IP( {X,D}, A, V ) hold?
Enough to check d-separation!
25. Local Distributions - Asymmetric Independence

Tuberculosis (yes/no) and Lung Cancer (yes/no) are the parents of
Abnormality in Chest (yes/no), with table p(A | T, L):

p(A=y | L=n, T=n) = 0.02
p(A=y | L=n, T=y) = 0.60
p(A=y | L=y, T=n) = 0.99
p(A=y | L=y, T=y) = 0.99

IP(A, L=y, T) holds, but IP(A, L=n, T) does not hold --
independence for some values only.
So IP(A, L, T) does not hold.
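The asymmetry is visible by comparing rows of the table directly; a tiny sketch:

```python
# Rows of p(A=y | L, T) from the table above, keyed by (L, T):
p_A_yes = {("n", "n"): 0.02, ("n", "y"): 0.60,
           ("y", "n"): 0.99, ("y", "y"): 0.99}

indep_given_Ly = p_A_yes["y", "n"] == p_A_yes["y", "y"]   # IP(A, L=y, T)?
indep_given_Ln = p_A_yes["n", "n"] == p_A_yes["n", "y"]   # IP(A, L=n, T)?
print(indep_given_Ly, indep_given_Ln)   # True False, so IP(A, L, T) fails
```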
26. Claim 1: Each vertex Xi in a Bayesian network is d-separated
from all its non-descendants given its parents pai.

Proof: Each vertex Xi is connected to its non-descendants via its
parents or via its descendants. All paths via its parents are blocked
because pai are given, and all paths via descendants are blocked because
they pass through converging edges into a node Z where Z is not given.
Hence, by the definition of d-separation, ID(Xi, pai, non-descendantsi) holds.

27. Independence in Bayesian Networks

    P(x1, …, xn) = Π_{i=1..n} P(xi | x1, …, xi-1)
                 = Π_{i=1..n} P(xi | pai)      (*)

This implies, for every topological order d,

    P(xd(1), …, xd(n)) = Π_{i=1..n} P(xd(i) | pad(i))
                       = Π_{i=1..n} P(xd(i) | xd(1), …, xd(i-1)).
28. Claim 2: Each topological order d in a BN entails the
same set of independence assumptions.

Proof: By Claim 1, ID(Xi, pai, non-descendantsi) holds. For each
topological order d on {1,…,n}, ID(Xd(i), pad(i), non-descendantsd(i))
holds as well, and from soundness (Theorem 9), IP(Xd(i), pad(i),
non-descendantsd(i)) holds too. By the decomposition property of
conditional independence, IP(Xd(i), pad(i), S) holds for every S that
is a subset of non-descendantsd(i). Hence, Xi is independent, given its
parents, also of S = {all variables before Xi in an arbitrary
topological order d}.

29. Extension of the Markov Chain Property

For Markov chains we assume Basis(D):

    IP(Xi, Xi-1, {X1, …, Xi-2})

Due to soundness of d-separation (Theorem 9) we also get:

    IP(Xi, {Xi-1, Xi+1}, {X1, …, Xi-2, Xi+2, …, Xn})

Definition: A Markov blanket of a variable X wrt probability
distribution P is a set of variables B(X) such that
IP(X, B(X), all-other-variables).

Chain example: B(Xi) = {Xi-1, Xi+1}.
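The chain blanket property can be verified numerically: P(x2 | x1, x3, x4) must not depend on x4 once the blanket {X1, X3} is fixed. A sketch over a 4-variable binary chain whose CPT numbers are made up for illustration:

```python
from itertools import product

# Made-up CPTs for a binary chain X1 -> X2 -> X3 -> X4.
P1 = {0: 0.4, 1: 0.6}
P2g1 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}
P3g2 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}
P4g3 = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}

def joint(x1, x2, x3, x4):
    return P1[x1] * P2g1[x1, x2] * P3g2[x2, x3] * P4g3[x3, x4]

def p_x2_given(x1, x3, x4):
    z = sum(joint(x1, v, x3, x4) for v in (0, 1))
    return joint(x1, 1, x3, x4) / z           # P(X2=1 | x1, x3, x4)

# Conditioning on the blanket {X1, X3} makes X2 independent of X4:
for x1, x3 in product((0, 1), repeat=2):
    assert abs(p_x2_given(x1, x3, 0) - p_x2_given(x1, x3, 1)) < 1e-12
```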
30. Markov Blankets in BNs

Proof: Consequence of soundness of d-separation (Theorem 9).
Uploaded on Feb 18, 2025

