Graph Machine Learning Overview: Traditional ML to Graph Neural Networks

 
1
 
Online Social Networks and
Media
 
Graph ML I
 
2
 
Graph Machine Learning
 
Outline
 
Part I: Introduction, Traditional ML
Part II: Graph Embeddings
Part III: GNNs
Part IV (if time permits): Knowledge Graphs
 
Slides used based on:
 
CS224W:
 
Machine
 
Learning
 
with
 
Graphs
 
Jure
 
Leskovec,
 
Stanford
 
University
http://cs224w.stanford.edu
 
3
 
Part I:
Types of ML Tasks
Traditional ML
Feature Engineering
 
PyG
 
(PyTorch
 
Geometric)
:
The
 
ultimate
 
library
 
for
 
Graph
 
Neural
 
Networks
We
 
further
 
recommend:
GraphGym
:
 
Platform
 
for
 
designing
 
Graph
 
Neural
Networks.
Modularized
 
GNN
 
implementation,
 
simple
 
hyperparameter
tuning,
 
flexible
 
user
 
customization
Other
 
network
 
analytics
 
tools
:
 
SNAP.PY,
 
NetworkX
 
Tools
 
4
 
Types of ML tasks in graphs
 
5
 
E
d
g
e
 
(
l
i
n
k
)
 
l
e
v
e
l
 
C
o
m
m
u
n
i
t
y
(
s
u
b
g
r
a
p
h
)
l
e
v
e
l
 
G
r
a
p
h
-
l
e
v
e
l
p
r
e
d
i
c
t
i
o
n
,
G
r
a
p
h
g
e
n
e
r
a
t
i
o
n
 
N
o
d
e
l
e
v
e
l
 
Types of ML tasks in graphs
 
6
 
D
 
G
 
C
 
H
 
E
 
F
 
Node
 
features
B
 
Graph
 
features
 
Link
 
features
A
 
 
𝐷
 
Design
 
features
 
for
 
nodes/links/graphs
Obtain
 
features
 
for
 
all
 
training
 
data
 
𝐷
 
 
𝐷
 
Traditional ML Pipeline
 
7
 
Train
 
an ML
 
model:
Random
 
forest
SVM
Neural
 
network,
 
etc.
 
𝒙
𝟏
 
𝑦
1
 
𝒙
𝑵
 
𝑦
𝑁
 
Apply
 
the
 
model:
Given
 
a
 
new
 
node/link/graph,
 
obtain
 
its
 
features
 
and
 
make
 
a
 
prediction
 
𝒙
 
𝑦
 
Traditional ML Pipeline
 
8
 
?
 
?
 
?
 
?
 
?
 
Machine
Learning
 
Node
 classification
 
Node Level Tasks (example)
 
9
 
Using
 
effective
 
features
 
over
 
graphs
 
is
 
the
 
key 
to
achieving
 
good
 
model
 
performance.
Traditional
 
ML pipeline
 
uses
 
hand
- 
designed
features.
W
e
 
will 
overview
 
traditional
 
features
 
for:
Node-level
 
prediction
Link-
level
 
prediction
Graph-
level
 
prediction
For
 
simplicity,
 
we
 
focus
 
on
 
undirected
 
graphs
.
 
Feature Design
 
10
 
Goal:
 
Make
 
predictions
 
for
 
a
set
 
of
 
objects
 
Design
 
choices:
Features:
 
d
-
dimensional
 
vectors
Objects:
 
Nodes,
 
edges,
 
sets of
 
nodes,
entire
 
graphs
Objective
 
function:
What
 
task
 
are
 
we
 
aiming
 
to
 
solve?
 
11
 
NODE LEVEL FEATURES AND TASKS
 
12
 
Goal:
 
Characterize
 
the
 
structure
 
and
 
position
 
of 
a
 
node
in the
 
network:
 
Node
 
degree
Node
 
centrality
 
C
 
Clustering
 
coefficient
Graphlets
A
 
D
 
E
 
H
 
F
 
G
 
Node
 
feature
B
 
Node Level Features
 
13
 
The
 
degree
 
𝑘
𝑣
 
of
 
node
 
𝑣
 
is
 
the
 
number
 
of
edges
 
(neighboring
 
nodes)
 
the
 
node
 
has.
Treats
 
all
 
neighboring
 
nodes
 
equally.
𝑘
𝐵
 
=
 
2
 
C
 
B
 
D
 
E
 
H
 
F
 
G
 
14
 
𝑘
𝐴
 
=
 
1
A
 
𝑘
𝐶
 
=
 
3
 
𝑘
𝐷
 
=
 
4
 
Node degree
 
Engienvector
 
centrality
 
Node
 
degree
 
counts
 
the
 
neighboring
 
nodes
without
 
capturing
 
their
 
importance.
Node
 
centrality
 
𝑐
𝑣
 
takes
 
the
 
node
 
importance
in
 
a
 
graph
 
into
 
account
Different
 
ways
 
to
 
model
 
importance:
Eigenvector
 
(Pagerank) 
centrality
Betweenness
 
centrality
Closeness
 
centrality
and
 
many
 
others…
 
Node centrality
 
15
 
A
 
node
 
𝑣
 
is
 
important
 
if
 
surrounded
 
by
 
important
 
neighboring
 
nodes
 
𝑢
 
 
𝑁(𝑣)
.
We
 
model the
 
centrality
 
of
 
node
 
𝑣
 
as
 
the
 
sum
 
of
 
the
 
centrality
 
of
 
neighboring
 
nodes
:
 
Pagerank centrality
 
16
 
A
 
node
 
is
 
important
 
if
 
it
 
lies
 
on
 
many
 
shortest 
 
paths
between
 
other
 
nodes.
 
Example:
 
C
 
A
 
B
 
D
 
E
 
𝑐
𝐴
 
=
 
𝑐
𝐵
 
=
 
𝑐
𝐸
 
=
 
0
𝑐
𝐶
  
=
 
3
(A-
C
-
B,
 
A-
C
-D,
 
A-
C
-D-
E)
 
𝑐
𝐷
 
=
 
3
(A-C-
D
-E,
 
B-
D
-E,
 
C-
D
-
E)
 
Betweness centrality
 
17
 
A
 
node
 
is
 
important
 
if
 
it
 
has
 
small
 
shortest
 
path
 
lengths
 
to
 
all
 
other
 
nodes.
 
Example:
 
C
 
A
 
B
 
D
 
E
 
𝑐
𝐴
 
=
 
1/(2
 
+
 
1
 
+
 
2
 
+
 
3)
 
=
 
1/8
(A-C-
B,
 
A-
C,
 
A-
C-D,
 
A-
C-D-
E)
 
𝑐
 
=
 
1/(2
 
+
 
1
 
+
 
1
 
+
 
1)
 
=
 
1/
5
 
𝐷
(D-C-A,
 
D-B,
 
D-C,
 
D-
E)
 
Closeness centrality
 
18
 
Examples:
 
𝑣
 
𝑣
 
𝑣
 
𝑒
𝑣
 
=
 
1
 
𝑒
𝑣
 
=
 
0.5
 
𝑒
𝑣
 
=
 
0
 
#(node
 
pairs
 
among
 
𝑘
𝑣
 
neighboring
 
nodes)
In
 
our
 
examples
 
below
 
the
 
denominator
 
is
 
6
 
(4
 
choose
 
2).
 
Clustering coefficient
 
19
 
Observation:
 
Clustering
 
coefficient
 
counts
 
the
#(triangles)
 
in
 
the
 
ego-network
 
We
 
can
 
generalize
 
the
 
above
 
by
 
counting
#(pre-
specified
 
subgraphs
)
,
 
i.e.,
 
graphlets
.
 
𝑣
 
𝑣
 
3
 
triangles
 
(out
 
of
 
6
 
node
 
triplets)
 
𝑒
𝑣
 
=
 
0.5
 
𝑣
 
𝑣
 
Graphlets
 
20
 
Goal:
 
Describe
 
network
 
structure
 
around
node
 
𝑢
Graphlets
 
are
 
small
 
subgraphs
 
that
 
describe
the
 
structure
 
of
 
node
 
𝑢
’s
 
network
neighborhood
 
Analogy:
 
Degree
 
counts
 
#(edges)
 
that
 
a
 
node
 
touches
Clustering
 
coefficient
 
counts
 
#(triangles)
 
that
 
a 
node
 
touches.
Graphlet
 
Degree
 
Vector
 
(GDV)
:
 
Graphlet-
base 
features
 
for
nodes
GDV
 
counts
 
#(graphlets)
 
that
 
a
 
node
 
touches
 
C
 
B
 
𝑢
 
E
 
Graphlets
 
A
 
21
 
Def:
 
Induced
 
subgraph
 
 
is
 
another
 
graph,
 
formed
from
 
a
 
subset
 
of
 
vertices
 
and
 
all
 
the
 
edges
connecting
 
the
 
vertices
 
in
 
that
 
subset.
 
C
 
A
 
B
 
𝑢
 
E
 
C
 
B
 
𝑢
 
Induced
subgraph:
 
C
 
B
 
𝑢
 
Not
 
induced
subgraph:
 
Graphlets
 
22
 
Def:
 
Graph
 
Isomorphism
Two
 
graphs
 
which
 
contain
 
the
 
same
 
number
 
of
 
nodes
 
connected
 
in
 
the
 
same
 
way
 
are
 
said
 
to
 
be
 
isomorphic.
 (one-to-one mapping of their nodes)
 
Isomorphic
Node
 
mapping:
 
(e2,c2),
 
(e1,
 
c5),
(e3,c4),
 
(e5,c3),
 
(e4,c1)
 
Non-
Isomorphic
The
 
right
 
graph
 
has
 
cycles
 
of
 
length
 
3
 
but
 
t
he
 left
graph
 
does
 
not,
 
so
 
the
 
graphs
 
cannot
 
be
 
isomorphic.
 
Source:
 
Mathoverflow
 
Graphlets
 
23
 
Przulj
 
et
 
al.
,
 
Bioinformatics
 
2004
 
Graphlets
:
 
Rooted
 
connected 
induced
 
non-
isomorphic
 
subgraphs:
 
Graphlets
 
Graphlet Degree Vector (GDV): 
A
 
count
vector
 
of
 
graphlets
 
rooted
 
at
 
a
 
given
 
node.
 
24
 
28
 
Example:
 
All
 
possible
 
graphlets
 
on
 
up
 
to
 
3
 nodes
 
Graphlet
 
instances
 
of
 
node
 
u:
 
Graphlets
 
of
 
node
 
𝑢
:
𝑎,
 
𝑏,
 
𝑐,
 
𝑑
[2,1,0,2]
 
Graphlets
 
25
 
Przulj
 
et
 
al.
,
 
Bioinformatics
 
2004
 
There
 
are
 
73
 
different
 
graphlets
 
on
 
up
 
to
 
5
 
nodes
 
Graphlet
id
 
(Root/
“position”
of
 
node
 
𝑢
)
 
Graphlets
 
26
 
C
 
Considering
 
graphlets
 
of
 
size
 
2-5
 
nodes
 
we
 
get:
Vector
 
of
 
73
 
coordinates
 
is
 
a
 
signature
 
of
 
a
 
node 
that
describes
 
the
 
topology
 
of
 
node's
 
neighborhood
 
Graphlet
 
degree
 
vector
 
provides
 
a
 
measure
 
of 
a
 
node’s
local
 
network
 
topology
:
Comparing
 
vectors
 
of
 
two
 
nodes
 
provides
 
a 
mor
e 
detailed
measure
 
of
 
local
 
topological
 
similarity
 
than 
node
 
degrees
 
or
clustering
 
coefficient
 
Graphlets
 
C
 
A
 
B
 
𝑢
 
E
 
𝑢
 
has 
graphlets:
 
0,
 
1,
 
2,
 
3, 
5,
 
10,
 
11,
 
 
27
 
We
 
have
 
introduced
 
different
 
ways
 
to
 
obtain
node
 
features.
They
 
can
 
be
 
categorized
 
as:
Importance-based
 
features:
Node
 
degree
Different
 
node
 
centrality
 
measures
Structure-
based
 
features:
Node
 
degree
Clustering
 
coefficient
Graphlet
 
count
 
vector
 
Node Level Features
 
28
 
Importance-based
 
features:
 
capture
 
the
importance
 
of
 
a
 
node
 
in
 
a 
graph
Node
 
degree:
Simply
 
counts
 
the
 
number
 
of
 
neighboring 
nodes
Node
 
centrality:
Models
 
importance
 
of
 
neighboring
 
nodes 
in
 
a
 
graph
Different
 
modeling
 
choices:
 
eigenvector
 
centrality,
 
betweenness
 
centrality,
 
closeness
 
centrality
Useful
 
for
 
predicting
 
influential
 
nodes
 
in
 
a
 
graph
Example:
 
predicting
 
celebrity
 
users
 
in
 
a
 
social
 
network
 
Node Level Features
 
29
 
Structure-
based
 
features:
 
Capture
 
topological
properties
 
of
 
local
 
neighborhood
 
around
 
a
 
node.
Node
 
degree:
Counts
 
the
 
number
 
of
 
neighboring
 
nodes
Clustering
 
coefficient:
Measures
 
how connected
 
neighboring
 
nodes
 
are
Graphlet
 
degree
 
vector:
Counts
 
the
 
occurrences
 
of
 
different
 
graphlets
Useful
 
for
 
predicting
 
a
 
particular
 
role
 
a
 
node
plays
 
in
 
a
 
graph:
Example:
 
Predicting
 
protein
 
functionality
 
in
 
a
 
protein-
protein
 
interaction
 
network.
 
Node Level Features
 
30
 
?
 
?
 
?
 
?
 
?
 
Machine
Learning
 
Node
 classification
 
Node Level Tasks
 
31
 
Computationally
 
predict
 
the 
3D
 
structure
 
of a protein
based
 
solely
 
on
 
its
 
amino acid
 
sequence:
For
 
each
 
node
 
predict
 
its
 
3D
 
coordinates
 
Image
 
credit:
 
DeepMind
 
Protein Folding
 
32
 
Image
 
credit:
 
DeepMind
 
Image
 
credit:
 
SingularityHub
 
33
 
Image
 
credit:
 
DeepMind
 
34
 
Key
 
idea:
 
“Spatial
 
graph”
Nodes:
 
Amino
 
acids
 
in
 
a
 
protein
 
sequence
Edges:
 
Proximity
 
between
 
amino
 
acids
 
(residues)
 
S
p
a
t
i
a
l
 
g
r
a
p
h
 
LINK PREDICTION
 
35
 
The
 
task
 
is
 
to
 
predict
 
new
 
links
 
based
 
on
 the
existing
 
links.
Two ways: (a) define a score for each pair of
nodes, rank pairs, return top K ones, (b) build a
classifier with input pair of nodes, output
probability of existence
 
C
 
A
 
B
 
D
 
E
 
H
 
F
 
G
 
?
 
?
 
Link Prediction
 
36
 
The
 
key
 
is
 
to
 
design
 
features
 
for
 
a
 
pair
 
of
 
nodes
.
 
(for computing the score, as input to the classifier
First, score
 
C
 
A
 
B
 
D
 
E
 
H
 
F
 
G
 
?
 
?
 
Link Prediction
 
37
 
(1) 
Links
 
missing
 
at
 
random:
Missing/unknown, incomplete information
Remove
 
a
 
random
 
set
 
of
 
links
 
and
 
then 
 
aim
 
to
predict
 
them
 
38
 
Link Prediction
 
0
 
Given
 
𝐺[𝑡
0
,
 
𝑡
 
]
 
a
 
graph
 
defined
 
by
 
edges
 
0
 
up
 
to
 
time
 
𝑡
 
,
 
output
 
a
 
ranked
 
list
 
L
 
0
 
of
 
edges
 
(not
 
in
 
𝐺[𝑡
0
,
 
𝑡
 
]
)
 
that
 
are
 
 
predicted
 
to
 
appear
 
in
 
time
 
𝐺[𝑡
1
,
 
𝑡
1
]
 
1
 
0
 
0
 
𝐺[𝑡
  
,
 
𝑡
 
]
 
1
 
𝐺[𝑡
1
,
 
𝑡
 
]
 
Link Prediction
 
(2) Temporal Links Prediction
 
39
 
Methodology:
For
 
each
 
pair
 
of
 
nodes
 
(x,y)
 
compute
 
score
 
c(x,y)
For
 
example,
 
c(x,y)
 
could
 
be
 
the
 
#
 
of
 
common
 
neighbors
 
of
 
x
 
and
 
y
Sort 
pairs
 
(x,y)
 
by
 
the
 
decreasing
 
score
 
c(x,y)
Predict
 
top
 
n
 
pairs
 
as
 
new
 
links
 
X
 
Score-based Link Prediction
 
40
 
n
 
=
 
|E
new
|
:
 
#
 
new
 
edges
 
that
 
appear
 
during
 
the test
 
period
 
[𝑡
1
,
 
𝑡
]
 
Take
 
top
 
n
 
elements
 
of
 
L
 
and
 
count
 
correct
 
edges
 
Evaluation:
 
Distance-
based
 
feature
Local
 
neighborhood
 
overlap
Global
 
neighborhood
 
overlap
 
C
 
B
 
D
 
E
 
H
 
F
 
G
 
Link
 
feature
A
 
Link Level Features
 
41
 
𝑆
𝐵𝐻
 
=
 
𝑆
𝐵𝐸
 
=
 
𝑆
𝐴𝐵
 
=
 
2
 
C
 
A
 
Example:
B
 
D
 
E
 
H
However,
 
this
 
does
 
not
 
capture
 
the
 
degree
 
of
 
neighborhood
overlap:
 
Node pair
 
(B,
 
H)
 
has
 
2
 
shared
 
neighboring
 
nodes, 
while
 
pairs
 
(B,
 
E)
 
and
(A,
 
B)
 
only
 
have
 
1
 
such
 
node.
 
F
 
G
 
𝑆
𝐵𝐺
 
=
 
𝑆
𝐵𝐹
 
=
 
3
 
Distance-based Features
 
Shortest-
path
 
distance
 
between
 
two
 
nodes
 
42
 
H
 
Local Neighborhood Features
: Captures # neighboring
nodes shared between two nodes
 
Common
 
neighbors:
 
C
 
A
 
B
 
D
 
E
 
F
 
𝑁
 
𝐴
 
𝑁
 
𝐵
 
Local Neighborhood Overlap Features
 
Jaccard coefficient:
 
Example:
 
43
 
C
 
A
 
B
 
D
 
E
 
F
 
𝑁
 
𝐴
 
𝑁
 
𝐵
 
Adamic-Adar index
:
 
Local Neighborhood Overlap Features
 
44
 
However,
 
the
 
two
 
nodes
 
may
 
still
 
potentially
 
connect
 
in
 
the
future.
Global
 
neighborhood
 
overlap
 
metrics 
resolve 
the
 
limitation
by
 
considering
 
the
 
entire
 
graph.
 
C
 
A
 
Limitation
 
of
 
local
 
neighborhood
 
features:
Metric
 
is
 
always
 
zero
 
if the
 
two
 
nodes
 
do
 
not
 
have 
any
neighbors
 
in
 
common.
B
 
D
 
E
 
F
 
𝑁
𝐴
 
𝑁
𝐸
 
𝑁
𝐴
 
 
𝑁
𝐸
 
=
 
𝜙
|
𝑁
𝐴
 
 
𝑁
𝐸
 
|
 
=
 
0
 
Global Neighborhood Overlap Features
 
45
 
Katz
 
index:
 
count
s
 
the
 
number
 
of
 
walks
 
of
 
all
lengths
 
between
 
a
 
given
 
pair
 
of
 
nodes.
 
 
How
 
to
 
compute
 
#walks
 
between
 
two 
nodes?
 
Use
 
powers
 
of
 
the
 
adjacency
 
matrix!
 
Global Neighborhood Overlap Features
 
46
 
Computing
 
#walks
 
between
 
two
 
nodes
Recall
:
 
𝑨
𝑢𝑣
 
=
 
1
 
if
 
𝑢
 
 
𝑁(𝑣)
 
Let
 
𝑷
(𝑲)
 
=
 
#walks
 
of
 
length
 
𝑲
 
between
 
𝒖
 
and
 
𝒗
 
𝒖𝒗
We
 
will
 
show
 
𝑷
(𝑲)
 
=
 
𝑨
𝒌
 
𝒖𝒗
 
𝑷
(𝟏)
 
=
 
#walks
 
of
 
length
 
1
 
(direct
 
neighborhood)
 
between
 
𝑢
 
and
 
𝑣
 
=
 
𝑨
𝒖𝒗
 
4
 
3
2
1
 
𝟏𝟐
 
47
 
𝑷
(𝟏)
 
=
 
𝑨
 
𝟏𝟐
 
Global Neighborhood Overlap Features
 
How
 
to
 
compute
 
?
Step
 
1:
 
Compute
 
#walks
 
of
 
length
 
1
 
between
each 
of
 
𝒖
’s
 
neighbor
 
and
 
𝒗
Step
 
2
:
 
Sum
 
up
 
these
 
#walks
 
across
 
u’s
 
neighbors
 
𝒊𝒗
 
𝒖𝒗
 
𝒊
 
𝒖𝒊
 
𝒊
 
𝒖𝒊
 
𝒊𝒗
 
𝒖𝒗
 
𝑷
(
𝟐
)
 
=
 
Σ
 
𝑨
 
 
𝑷
(𝟏)
  
=
 
Σ
 
𝑨
 
 
𝑨
 
=
 
𝑨
𝟐
 
N
o
d
e
 
1
s
 
n
e
i
g
h
b
o
r
s
 
#
w
a
l
k
s
 
o
f
 
l
e
n
g
t
h
 
1
 
b
e
t
w
e
e
n
N
o
d
e
 
1
s
 
n
e
i
g
h
b
o
r
s
 
a
n
d
 
N
o
d
e
 
2
 
𝟏𝟐
 
12
 
𝑷
(𝟐)
 
=
 
𝑨
2
 
P
o
w
e
r
 
o
f
a
d
j
a
c
e
n
c
y
 
Global Neighborhood Overlap Features
 
48
 
How
 
to
 
compute
 
#walks
 
between
 
two
 
nodes?
 
Use
 
adjacency
 
matrix
 
powers
 
𝑨
𝑢𝑣
 
specifies
 
#walks
 
of
 
length 1
 
(direct
 
neighborhood)
 
between
 
𝑢
 
and
 
𝑣
.
 
Global Neighborhood Overlap Features
 
49
 
Katz
 
index
 
between
 
𝑣
1
 
and
 
𝑣
2
 
is
 
calculated
 
as
Sum
 
over
 
all
 
walk
 
lengths
#walks
 
of
 
length
 
𝑙
between
 
𝑣
1
 
and
 
𝑣
2
0
 
<
 
𝛽
 
<
 
1
:
 
discount
 
factor
Katz
 
index
 
matrix
 
is
 
computed
 
in
 
closed-
form:
 
𝑖=0
 
50
50
 
=
 
σ
 
𝛽
𝑖
 
𝑨
𝑖
 
by
 
geometric
 
series
 
of
 
matrices
 
Global Neighborhood Overlap Features
 
Distance-based
 
features:
Uses
 
the
 
shortest
 
path
 
length
 
between
 
two
 
nodes
 
but
 
does
 
not
 
capture
 
how
 
neighborhood
 
overlaps.
Local
 
neighborhood
 
overlap:
Captures
 
how
 
many
 
neighboring
 
nodes
 
are
 
shared
 
by
 
two
 
nodes.
Becomes
 
zero
 
when
 
no
 
neighbor
 
nodes
 
are
 
shared.
Global
 
neighborhood
 
overlap:
Uses
 
global
 
graph
 
structure
 
to score
 
two
 
nodes.
Katz
 
index
 
counts
 
#walks
 
of
 
all
 
lengths
 
between
 
two
 
nodes.
 
Link Level Features
 
51
 
Classification for Link Prediction
 
52
 
52
 
Items
 
Users
 
Users
 
interacts
 
with
 
items
Watch
 
movies,
 
buy
 
merchandise,
 
listen
 
to
 
music
Nodes:
 
Users
 
and
 
items
Edges:
 
User-
item
 
interactions
Goal:
 
Recommend
 
items
 
users
 
might
 
like
 
Interactions
 
“You
 
might
 
also
 
like”
 
Example: Recommender Systems
 
53
 
Many
 
patients
 
take
 
multiple
 
drugs
 
to
 
treat
complex
 
or
 
co-
existing
 
diseases
:
 
46%
 
of
 
people
 
ages
 
70-79
 
take
 
more
 
than
 
5
 drugs
Many
 
patients
 
take
 
more
 
than
 
20
 
drugs
 
to
 
treat
heart
 
disease,
 
depression,
 
insomnia,
 etc.
Task:
 
Given
 
a
 
pair
 
of
 
drugs
 
predict
adverse
 
side
 
effects
 
,
 
30%
prob.
 
65%
prob.
 
Example: Drug Side Effects
 
54
 
Nodes
:
 
Drugs
 
&
 
Proteins
Edges
:
 
Interactions
 
Query: 
How 
likely 
will
Simvastatin
 
and
Ciprofloxacin,
 
when 
taken
together, 
break
 
down
muscle tissue?
 
55
 
Zitnik
 
et
 
al.,
 
Modeling
 Polypharmacy
 
Side
 
Effects
 with
 
Graph
 
Convolutional
 
Networks
,
 
Bioinformatics 
2018
 
Example: Drug Side Effects
 
GRAPH LEVEL FEATURES AND TASKS
 
56
 
Goal:
 
We
 
want
 
features
 
that
 
characterize
 
the
structure
 
of
 
an
 
entire
 
graph.
 
For
 
example
:
 
C
 
A
 
B
 
D
 
E
 
H
 
F
 
G
 
57
 
Graph Level Features
 
Graph Kernels
 
58
 
Graph
 
Kernels
:
 
Measure
 
similarity
 
between 
two
graphs:
Graphlet
 
Kernel
 
[1]
Weisfeiler-
Lehman
 
Kernel
 
[2]
Other
 
kernels
 
are
 
also
 
proposed
 
in
 
the
 
literature
(beyond
 
the
 
scope
 
of
 
this
 
lecture)
Random-walk
 
kernel
Shortest-
path
 
graph
 
kernel
And
 
many
 
more…
 
1
Shervashidze,
 
Nino,
 
et
 
al.
 
"Efficient
 
graphlet
 
kernels
 
for
 
large
 
graph
 
comparison."
 
Artificial
 
Intelligence
 
and
 
Statistics.
 
2009.
2
Shervashidze,
 
Nino,
 
et
 
al.
 
"Weisfeiler-lehman
 
graph
 
kernels."
 
Journal
 
of
 
Machine
 
Learning
 
Research
 
12.9
 
(2011).
 
Graph Kernels
 
59
 
Goal
:
 
Design
 
graph
 
feature
 
vector
 
𝜙
(G)
Key
 
idea
:
 
Bag-of-
Words
 
(BoW)
 
for
 
a
 
graph
BoW
 
simply
 
uses
 
the
 
word
 
counts
 
as 
features
 
for
documents
 
(no
 
ordering
 
considered).
Naïve
 
extension
 
to
 
a
 
graph:
 
Regard
 
nodes
 
as
 
words.
Since
 
both
 
graphs
 
have
 
4
 
red nodes
,
 
we
 
get
 
the
same
 
feature
 
vector
 
for
 
two
 
different
 
graph
s
 
𝜙(
 
)
 
=
 
𝜙
 
60
60
 
Graph Kernels
 
Deg1:
 
Deg2:
 
Deg3:
 
Both
 
Graphlet
 
Kernel
 
and
 
Weisfeiler-
Lehman
(WL)
 
Kernel
 
use
 
Bag-
of-*
 
representation
 
of
graph,
 
where
 
*
 
is
 
more
 
sophisticated
 
than
node
 
degrees!
 
𝜙(
 
)
 
=
 
count(
 
𝜙(
 
)
 
=
 
count(
 
)
 
=
 
[1,
 
2,
 
1]
Obtains
 
different
 
features
for
 
different
 
graphs!
)
 
=
 
[0,
 
2,
 
2]
 
61
 
Graph Kernels
 
What
 
if
 
we
 
use
 
Bag
 
of
 
node
 
degrees
?
 
Key
 
idea:
 
Count
 
the
 
number
 
of
 
different
graphlets
 
in
 
a
 
graph.
 
Note:
 
Definition
 
of
 
graphlets
 
here
 
is
 
slightly
 
different
 
from
 
node-
level
 
features.
The
 
two
 
differences
 
are:
Nodes
 
in
 
graphlets here
 
do
 
not
 
need
 
to
 
be
 
connected
(allows
 
for
 
isolated
 
nodes)
The
 
graphlets
 
here
 
are
 
not
 
rooted
.
 
Graphlet Features
 
62
 
Let
 
𝓖
𝒌
 
=
 
(𝒈
𝟏,
 
𝒈
𝟐
,
 
 
,
 
𝒈
𝒏
𝒌
)
 
be
 
a
 
list
 
of
graphlets 
of
 
size
 
𝒌
.
 
For
 
𝑘
 
=
 
3
 
,
 
there
 
are
 
4
 
graphlets.
 
For
 
𝑘
 
=
 
4
 
,
 
there
 
are
 
11
 
graphlets.
 
𝑔
1
 
𝑔
2
 
𝑔
3
 
𝑔
4
 
Shervashidze
 
et al.
,
 
AISTATS
 
2011
 
63
63
 
Graphlet Features
 
Given
 
graph
 
𝐺
,
 
and
 
a
 
graphlet
 
list
 
𝒢
𝑘
= 
(𝑔
1,
𝑔
2
,
 
 
,
 
𝑔
𝑛
𝑘
)
,
 
define
 
the
 
graphlet
 
count
vector
 
𝒇
𝐺
 
 
𝑛
𝑘
 
as
 
(𝒇
𝐺
)
𝑖
=
 
#(
𝑔
𝑖
 
 
𝐺
)
 
for
 
𝑖
 
=
 
1,2,
 
 
,
 
𝑛
𝑘
.
 
64
64
 
Graphlet Features
 
Example
 
for
 
𝑘
 
=
 
3.
 
𝑔
1
 
𝑔
2
 
𝑔
3
 
𝑔
4
 
𝐺
 
𝒇
𝐺
 
=
 
(1,
 
3,
 
6,
 
0)
T
 
65
65
 
Graphlet Features
 
Graphlet Kernel
 
66
 
Counting
 
size-
𝑘
 
graphlets
 
for
 
a
 
graph
 
with
 
size
 
𝑛
 
by
enumeration
 
takes
 
𝑛
𝑘
.
This
 
is
 
unavoidable
 
in
 
the
 
worst-
case
 
since
 
subgraph
isomorphism
 
test
 
(judging
 
whether
 
a 
graph
 
is
 
a
 
subgraph
of
 
another
 
graph)
 
is
 
NP-
hard
.
If
 
the
 
node
 
degree
 of a graph
 
is
 
bounded
 
by
 
𝑑
,
 
an
𝑂(𝑛𝑑
𝑘−1
)
 
algorithm
 
exists
 
to
 
count
 
all
 
the 
graphlets
 
of
size
 
𝑘
.
Can
 
we
 
design
 
a
 
more
 
efficient
 
graph
 
kernel?
 
Limitation
:
 
Counting
 
graphlets
 
is
 
expensive
 
Graphlet Kernel
 
67
 
Goal
:
 
Design
 
an
 
efficient
 
graph
 
feature
descriptor
 
𝜙
(G)
Idea
:
 
Use
 
neighborhood
 
structure
 
to
iteratively
 
enrich
 
node
 
vocabulary.
Generalized
 
version
 
of
 
Bag
 
of
 
node
 
degrees
 
since
 
node
 
degrees
 
are
 
one-hop
 
neighborhood
 
information.
Algorithm
 
to
 
achieve
 
this:
Color
 
refinement
 
Weisfeiler-
Lehman
 Kernel
 
68
 
Color Refinement
 
69
 
Assign
 
initial
 
colors
 
1
 
1
 
1
 
1
 
1
Aggregate
 
neighboring
 
colors
 
1
 
1
 
1
 
1
 
1
 
1
 
1
 
1,111
 
1,11
 
1,1111
 
1,1
 
1,1
 
1,111
 
1,11
 
1,111
 
1,1111
 
1,1
 
1,1
 
1,111
 
𝐺
1
 
𝐺
2
 
Color Refinement
 
70
 
Aggregated
 
colors
 
Hash
 
aggregated
 
colors
 
4
 
3
 
5
 
2
 
2
 
4
 
3
 
4
 
5
 
2
 
2
 
4
 
Hash
 
table
 
1,111
 
1,11
 
1,1111
 
1,1
 
1,1
 
1,111
 
1,11
 
1,111
 
1,1111
 
1,1
 
1,1
 
1,111
 
Color Refinement
 
71
 
Aggregated
 
colors
 
4
,
3
4
5
 
3
,
44
 
5
,
22
44
 
2
,
5
 
2
,
5
 
4
,
3
4
5
 
3
,
4
5
 
4
,
3
4
5
 
5
,
2
3
44
 
2
,
5
 
2
,
4
 
4
,
2
4
5
 
4
 
3
 
5
 
2
 
2
Hash
 
aggregated
 
colors
 
4
 
3
 
4
 
5
 
2
 
2
 
4
 
Color Refinement
 
72
 
Aggregated
 
colors
 
Hash
 
aggregated
 
colors
 
11
 
8
 
12
 
7
 
7
 
11
 
9
 
11
 
13
 
7
 
6
 
10
 
4
,
3
4
5
 
3
,
44
 
5
,
22
44
 
2
,
5
 
2
,
5
 
4
,
3
4
5
 
3
,
4
5
 
4
,
3
4
5
 
5
,
2
3
44
 
2
,
5
 
2
,
4
 
Hash 
table
2
,
4
 
--
>
 
6
 
4
,
2
4
5
 
Color Refinement
 
73
 
  
Colors 
1,
2
,
3
,
4
,
5
,
6
,
7
,
8
,
9
,
10
,
11
,
12
,
13
   
=
 
[6,2,1,2,1,0,2,1,0,
 
0,0,2,1]
 
 
 
Colors
 
1,
2
,
3
,
4
,
5
,
6
,
7
,
8
,
9
,
10
,
11
,
12
,
13
=
[6,2,1,2,1,1,1,0,1,
 
1,1,0,1]
 
After
 
color
 
refinement,
 
WL
 
kernel
 
counts
 
number 
of
nodes
 
with
 
a
 
given
 
color.
 
Weisfeiler-
Lehman
 Kernel
 
74
 
The
 
WL
 
kernel
 
value
 
is
 
computed
 
by
 
the
 
inner
product
 
of
 
the
 
color
 
count
 
vectors:
 
K(
 
,
 
)
 
=
=
 
49
 
Weisfeiler-
Lehman
 Kernel
 
75
 
WL 
kernel
 
is
 
computationally
 
efficient
The
 
time
 
complexity
 
for color 
refinement
 
at
 
each
 
step
 
is
linear
 
in
 
#(edges),
 
since
 
it
 
involves
 
aggregating
 
neighboring
colors.
When
 
computing
 
a
 
kernel
 
value,
 
only
 
colors
appeared
 
in
 
the
 
two
 
graphs
 
need
 
to
 
be
 
tracked.
Thus,
 
#(colors)
 
is
 
at
 
most
 
the
 
total
 
number
 
of
 
nodes.
Counting
 
colors
 
takes
 
linear-time
 
w.r.t.
 
#(nodes).
In
 
total,
 
time
 
complexity
 
is
 
linear
 
in
 
#(edges)
.
 
Weisfeiler-
Lehman
 Kernel
 
76
 
Graphlet
 
Kernel
Graph
 
is
 
represented
 
as
 
Bag-of-
graphlets
Computationally
 
expensive
Weisfeiler-
Lehman
 
Kernel
Apply
 
𝐾
-step
 
color
 
refinement
 
algorithm
 
to
 
enrich
 
node
 
colors
Different
 
colors
 
capture
 
different
 
𝐾
-hop
 
neighborhood
 
structures
Graph
 
is
 
represented
 
as
 
Bag-of-
colors
Computationally
 
efficient
Closely
 
related
 
to Graph
 
Neural
 
Networks
 
(as 
we
 
will
 
see!)
 
Graph Kernels
 
77
 
a
 
Example 1: Traffic Prediction
 
78
 
Nodes:
 
Road
 
segments
Edges:
 
Connectivity
 
between
 
road
 
segments
Prediction:
 
Time
 
of
 
Arrival
 
(ETA)
 
Image
 
credit:
 
DeepMind
 
79
 
Road networks as graphs
 
Traffic Prediction
 
Predicting
 
Time
 
of
 
Arrival
 
with
 
GNNS
 
Used in Google
 
Maps
 
Image
 
credit:
 
DeepMind
 
Traffic Prediction with GNNs
 
80
 
Antibiotics
 
are
 
small
 
molecular
 
graphs
Nodes:
 
Atoms
Edges:
 
Chemical
 
bonds
 
Konaklieva,
 
Monika I.
 
"Molecular
 
targets
 
of 
β-lactam-
based 
antimicrobials:
beyond
 
the usual
 
suspects."
 
Antibiotics 3.2
 
(2014): 
128-
142.
 
Image
 
credit:
 
CNN
 
Example 2: Drug Prediction
 
81
 
Stokes,
 
Jonathan
 
M.,
 
et
 
al.
 
"A
 
deep
 
learning
 
approach
 
to
 
antibiotic
 
discovery."
Cell
 
180.4
 
(2020):
 688-
702.
 
A
 
Graph
 
Neural
 
Network
 
graph
 
classification
 
model
Predict
 
promising
 
molecules
 
from
 
a
 
pool
 
of
 
candidates
 
Stokes
 
et
 al.,
 
A
 
Deep
 
Learning
 
Approach
 
to
 
Antibiotic
 
Discovery
,
 
Cell
 
2020
 
Drug Prediction
 
82
 
Physical
 
simulation
 
as
 
a
 
graph:
 
Nodes
:
 
Particles
Edges
:
 
Interaction
 
between
 
particles
 
Sanchez-
Gonzalez
 
et
 
al.,
 
Learning
 
to
 
simulate
 
complex
 
physics
 
with
 
graph
 
networks
,
 
ICML
 
2020
 
Example 3: Physical Simulation
 
83
 
A
 
graph
 
evolution
 
task:
 
Goal
:
 
Predict
 
how
 
a
 
graph
 
will
 
evolve
 
over
 
time
 
Sanchez-
Gonzalez
 
et
 
al.,
 
Learning
 
to
 
simulate
 
complex
 
physics
 
with
 
graph
 
networks
,
 
ICML
 
2020
 
Physical Simulation
 
84
 
https://medium.com/syncedreview/deepmind-googles-ml-based-graphcast-outperforms-the-
world-
s-best-medium-range-weather-
9d114460aa0c
 
Application: Weather Forecasting
 
85
 
Traditional
 
ML
 
Pipeline
Hand-
crafted
 
feature
 
+ ML
 model
Hand-crafted
 
features
 
for
 
graph
 
data
Node-
level:
Node
 
degree,
 centrality,
 
clustering
 
coefficient,
 
graphlets
Link-
level:
Distance-based
 
feature
local/global
 
neighborhood
 
overlap
Graph-level:
Graphlet
 
kernel,
 
WL
 
kernel
 
86
 
Summary
 
87
 
Acknowledgement
 
Most slides from
 
CS224W: Machine Learning with Graphs, Jure Leskovec, Stanford
University, 
http://cs224w.stanford.edu
Slide Note
Embed
Share

Explore the evolution of Machine Learning in Graphs, from traditional ML tasks to advanced Graph Neural Networks (GNNs). Discover key concepts like feature engineering, tools like PyG, and types of ML tasks in graphs. Uncover insights into node-level, graph-level, and community-level predictions, and understand the traditional ML pipeline and feature designs. Dive into node classification and effective feature design for better model performance in graph-based machine learning.

  • Graph Machine Learning
  • Traditional ML
  • Graph Neural Networks
  • Feature Engineering
  • Node Classification

Uploaded on Mar 20, 2024 | 6 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Online Social Networks and Media Graph ML I 1

  2. Graph Machine Learning Outline Part I: Introduction, Traditional ML Part II: Graph Embeddings Part III: GNNs Part IV (if time permits): Knowledge Graphs Slides used based on: CS224W:Machine Learning withGraphs JureLeskovec,StanfordUniversity http://cs224w.stanford.edu 2

  3. Part I: Types of ML Tasks Traditional ML Feature Engineering 3

  4. Tools PyG (PyTorch Geometric): Theultimatelibrary for Graph Neural Networks We further recommend: GraphGym:Platformfor designingGraphNeural Networks. ModularizedGNN implementation, simple hyperparameter tuning, flexible user customization Other network analytics tools: SNAP.PY,NetworkX 4

  5. Types of ML tasks in graphs 5

  6. Types of ML tasks in graphs Node level Graph-level prediction, Graph generation Community (subgraph) level Edge (link) level 6

  7. Traditional ML Pipeline Designfeaturesfornodes/links/graphs Obtain featuresforall trainingdata ? ? ? Node features Link features B Graph features F A E D G C H 7

  8. Traditional ML Pipeline Applythe model: Given a new node/link/graph, obtain its features and makea prediction Trainan ML model: Random forest SVM Neural network, etc. ?? ?1 ? ? ?? ?? 8

  9. Node Level Tasks (example) ? ? ? ? Machine Learning ? Node classification 9

  10. Feature Design Usingeffectivefeaturesovergraphs isthe key to achievinggood modelperformance. TraditionalML pipelineuses hand- designed features. We will overview traditional featuresfor: Node-level prediction Link-level prediction Graph-level prediction Forsimplicity, wefocuson undirectedgraphs. 10

  11. Goal: Makepredictions for a set of objects Designchoices: Features:d-dimensional vectors Objects:Nodes,edges,sets of nodes, entiregraphs Objectivefunction: What task are we aiming to solve? 11

  12. NODE LEVEL FEATURES AND TASKS 12

  13. Node Level Features Goal: Characterizethe structureand position of a node in the network: Node degree Node centrality Node feature Clustering coefficient Graphlets B F A E D G C H 13

  14. Node degree The degree ??of node ? is the number of edges (neighboringnodes)the nodehas. Treatsall neighboringnodesequally. ??= 2 ??= 1 B F ??= 4 A E D G C H ??= 3 14

  15. Node centrality Nodedegree countsthe neighboringnodes withoutcapturingtheir importance. Nodecentrality ??takes the nodeimportance in a graphintoaccount Differentwaysto model importance: Engienvector centrality Eigenvector (Pagerank) centrality Betweenness centrality Closeness centrality and many others 15

  16. Pagerank centrality A node ? is important if surrounded by important neighboring nodes ? ?(?). We model the centrality of node ? as the sum of the centrality of neighboring nodes: ?(?) ? ? = ?????????(?) ? ? 16

  17. Betweness centrality A node is important if it lies on many shortest paths between other nodes. Example: ??= ??= ??= 0 ??= 3 (A-C-B,A-C-D,A-C-D-E) B A E D ??= 3 (A-C-D-E, B-D-E, C-D-E) C 17

  18. Closeness centrality A node is important if it has small shortest path lengths to all other nodes. Example: ??= 1/(2+ 1 + 2 + 3) = 1/8 (A-C-B,A-C,A-C-D,A-C-D-E) B A E D ? = 1/(2 +1 + 1 + 1) = 1/5 ? (D-C-A, D-B, D-C, D-E) C 18

  19. Clustering coefficient Measureshowconnectedthe neighboring nodesof ?are: #(node pairs among ??neighboringnodes) In our examples below the denominator is 6 (4 choose 2). Examples: ? ? ? ??= 0 ??= 0.5 ??= 1 19

  20. Graphlets Observation:Clustering coefficientcountsthe #(triangles) in theego-network ? ? ? ? ??= 0.5 3 triangles (out of 6 node triplets) We can generalizethe abovebycounting #(pre-specified subgraphs),i.e., graphlets. 20

  21. Graphlets Goal: Describenetworkstructurearound node? Graphlets aresmall subgraphs that describe the structure of node ? s network neighborhood B A E ? Analogy: C Degreecounts#(edges)thata nodetouches Clusteringcoefficientcounts#(triangles) thata nodetouches. GraphletDegreeVector(GDV): Graphlet-base featuresfor nodes GDV counts #(graphlets) that a node touches 21

  22. Graphlets Def:Induced subgraph is another graph, formed from a subset of vertices and all the edges connecting the vertices in that subset. B A B B Induced subgraph: Not induced subgraph: E ? ? ? C C C 22

  23. Graphlets Def:Graph Isomorphism Two graphs which contain the same number of nodes connected in the same way are said to be isomorphic. (one-to-one mapping of their nodes) Isomorphic Non-Isomorphic Node mapping: (e2,c2), (e1, c5), (e3,c4), (e5,c3), (e4,c1) The right graph has cycles of length 3 but the left graphdoesnot, so the graphs cannot be isomorphic. Source: Mathoverflow 23

  24. Przulj et al.,Bioinformatics2004 Graphlets Graphlets:Rootedconnected induced non- isomorphicsubgraphs: All possible graphlets on up to 3 nodes Graphlet Degree Vector (GDV): Acount vectorofgraphletsrooted ata given node. 24

  25. Graphlets All possible graphlets on up to 3 nodes Example: ? Graphlet instances of node u: ? ? ? ? Graphlets of node ?: ?,?,?,? [2,1,0,2] 25 28

  26. Przulj et al.,Bioinformatics2004 Graphlets B A E ? C ? has graphlets: 0, 1, 2, 3, 5, 10, 11, Graphlet id(Root/ position of node ?) There are 73 different graphlets on up to 5 nodes 26

  27. Graphlets Consideringgraphlets of size 2-5 nodesweget: Vector of 73 coordinates is a signature of a node that describes the topology of node's neighborhood Graphletdegreevectorprovidesa measureof a node s localnetworktopology: Comparing vectors of two nodes provides a more detailed measure of local topological similarity than node degrees or clustering coefficient B A E ? has graphlets: 0, 1, 2, 3, 5, 10, 11, ? C 27 C

  28. Node Level Features Wehaveintroduceddifferentwaystoobtain node features. They can be categorizedas: Importance-based features: Node degree Differentnodecentrality measures Structure-based features: Node degree Clusteringcoefficient Graphlet countvector 28

  29. Node Level Features Importance-basedfeatures:capture the importanceof a nodein a graph Node degree: Simply countsthe numberof neighboring nodes Node centrality: Models importanceof neighboring nodes in a graph Different modelingchoices: eigenvectorcentrality, betweennesscentrality, closenesscentrality Usefulfor predicting influentialnodesin a graph Example: predicting celebrity users in a social network 29

  30. Node Level Features Structure-basedfeatures:Capture topological propertiesoflocalneighborhood aroundanode. Node degree: Countsthenumber ofneighboring nodes Clustering coefficient: Measures how connectedneighboring nodes are Graphlet degree vector: Counts theoccurrencesof differentgraphlets Usefulforpredictinga particularrole anode plays in a graph: Example: Predicting protein functionality in a protein-protein interaction network. 30

  31. Node Level Tasks ? ? ? ? Machine Learning ? Node classification 31

  32. Protein Folding Computationally predict the 3D structure of a protein based solely on its amino acid sequence: For each node predict its 3D coordinates 32 Imagecredit:DeepMind

  33. Imagecredit: DeepMind Imagecredit: SingularityHub 33

  34. Key idea: Spatial graph Nodes: Amino acids in a protein sequence Edges: Proximity between amino acids (residues) Spatial graph Imagecredit:DeepMind 34

  35. LINK PREDICTION 35

  36. Link Prediction Thetask is to predict newlinksbasedon the existinglinks. Two ways: (a) define a score for each pair of nodes, rank pairs, return top K ones, (b) build a classifier with input pair of nodes, output probability of existence ? B F A E D G C ? H 36

  37. Link Prediction Thekey is to design features for apair of nodes. (for computing the score, as input to the classifier First, score ? B F A E D G C ? H 37

  38. Link Prediction (1) Linksmissingat random: Missing/unknown, incomplete information Remove a random set of links and then aim to predict them 38

  39. Link Prediction (2) Temporal Links Prediction Given ?[?0,? ] a graph defined by edges up to time ? , output a ranked list L of edges (not in ?[?0,? ]) that are predicted to appear in time ?[?1,?1] 0 0 0 ?[? ,? ] ?[?1,? ] 0 0 1 1 39

  40. Score-based Link Prediction Methodology: For each pair of nodes (x,y) compute score c(x,y) Forexample, c(x,y) could be the# of commonneighbors of x and y Sort pairs (x,y) by the decreasing score c(x,y) Predict top n pairs as new links Evaluation: X n = |Enew|: # new edges thatappear during the testperiod [?1,? ] Taketopn elements of L and countcorrectedges 40

  41. Link Level Features Distance-based feature Local neighborhoodoverlap Global neighborhoodoverlap Link feature B F A E D G C H 41

  42. Distance-based Features Shortest-path distancebetweentwo nodes Example: B F ???= ???= ???= 2 ???= ???= 3 A E D G C H H However,thisdoesnotcapturethedegree of neighborhood overlap: Node pair (B, H) has 2 shared neighboring nodes, while pairs (B, E) and (A, B) only have 1 such node. 42

  43. Local Neighborhood Overlap Features Local Neighborhood Features: Captures # neighboring nodes shared between two nodes Example: ?(?,?) Common neighbors: |? ?1 ?2| B ?? A Jaccard coefficient: E D |? ?1 ?2| |? ?1 ?2| C ?? F 43

  44. Local Neighborhood Overlap Features Adamic-Adar index: 1 log(??) ? ? ?1 ?(?2) B ?? A E D C ?? F 44

  45. Global Neighborhood Overlap Features Limitationof local neighborhood features: Metric is always zero if the two nodes do not have any neighbors in common. B ?? A ?? ??= ? |?? ??| = 0 E D C ?? F However, the two nodes maystill potentially connect in the future. Global neighborhoodoverlapmetrics resolve the limitation byconsideringthe entire graph. 45

  46. Global Neighborhood Overlap Features Katz index: countsthe numberof walksof all lengths betweena given pairofnodes. How to compute #walksbetweentwo nodes? Usepowersof the adjacencymatrix! 46

  47. Global Neighborhood Overlap Features Computing#walksbetweentwo nodes Recall: ???= 1 if ? ?(?) Let ?(?)= #walks of length ? between ? and ? ?? We will show ?(?)= ?? ?(?)= #walks of length 1 (direct neighborhood) between ? and ? = ??? ?? ?(?)= ??? ?? 4 3 2 1 47

  48. Global Neighborhood Overlap Features How to compute Step 1: Compute #walks of length 1 between each of ? s neighbor and ? Step 2: Sum up these #walks across u s neighbors ? ?(?)= ? ?(?)= ? = ?? ? ?? ?? ?? ?? ?? ? ? ?? #walks of length 1 between Node 1 s neighbors and Node 2 ?(?)= ?2 Node 1 s neighbors 12 ?? Power of adjacency 48

  49. Global Neighborhood Overlap Features Howto compute#walksbetween twonodes? Use adjacencymatrix powers ???specifies #walks of length 1 (direct neighborhood) between ? and ?. ??specifies #walks of length 2 (neighbor of ?? neighbor) between ? and ?. And, ?? specifies #walks of length ?. ?? 49

  50. Global Neighborhood Overlap Features Katzindexbetween ?1and?2is calculatedas Sum over all walk lengths #walks of length ? between ?1and?2 0 < ? < 1: discount factor Katzindex matrixis computedin closed-form: = ???? by geometric series of matrices ?=0 5

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#