Theoretical Justification of Popular Link Prediction Heuristics

Theoretical Justification of Popular
Link Prediction Heuristics
Purnamrita Sarkar (UC Berkeley)
Deepayan Chakrabarti (Yahoo! Research)
Andrew W. Moore (Google, Inc.)
1
Link Prediction
  Which pair of nodes {i,j} 
should
 
be connected?
Alice
Bob
Goal: Recommend a movie
Link Prediction
  Which pair of nodes {i,j} 
should
 
be connected?
Goal: Suggest friends
Link Prediction Heuristics
 
Predict link between nodes
Connected by the shortest path
With the most 
common neighbors
 (length 2 paths)
More weight to low-degree common nbrs (
Adamic/Adar
)
Link Prediction Heuristics
 
Predict link between nodes
Connected by the shortest path
With the most 
common neighbors 
(length 2 paths)
More weight to low-degree common nbrs (
Adamic/Adar
)
With more 
short
 paths (e.g. length 3 paths )
exponentially decaying weights to longer paths (
Katz measure
)
 
Previous Empirical Studies
*
 
Random
Shortest
Path
Common
Neighbors
Adamic/Adar
Ensemble of
short paths
Link prediction accuracy*
*Liben-Nowell & Kleinberg, 2003; Brand, 2005;  Sarkar & Moore, 2007
How do we justify these
observations?
Especially if the
graph is sparse
Link Prediction – Generative Model
7
Model:
 
1.
Nodes are uniformly distributed points in a latent space
2.
This space has a distance metric
3.
Points close to each other are likely to be connected in the graph
Logistic distance function (Raftery+/2002)
Link Prediction – Generative Model
8
α
 determines the steepness
The problem of 
link prediction 
is to find the 
nearest neighbor 
who is not
currently linked to the node.
 Equivalent to 
inferring distances in the latent space
Model:
1.
Nodes are uniformly distributed points in a latent space
2.
This space has a distance metric
3.
Points close to each other are likely to be connected in the graph
Logistic distance function (Raftery+/2002)
Previous Empirical Studies
*
 
Random
Shortest
Path
Common
Neighbors
Adamic/Adar
Ensemble of
short paths
Link prediction
accuracy
*Liben-Nowell & Kleinberg, 2003; Brand, 2005;  Sarkar & Moore, 2007
Especially if the
graph is sparse
Common Neighbors
Pr
2
(i,j) = Pr(common neighbor|d
ij
)
As
 
α
  Logistic 
 Step function
Much easier to analyze!
i
j
Common Neighbors
11
Everyone has same radius 
r
i
j
# common nbrs
gives a bound on
distance
Unit volume universe
Common Neighbors
OPT =  node closest to i
MAX = node with max common neighbors with i
Theorem:
 
w.h.p
Link prediction by common neighbors is
asymptotically optimal
d
OPT 
 ≤ d
MAX 
≤ d
OPT 
 + 2[
ε
/V(1)]
1/D
Common Neighbors: Distinct  Radii
 
Node 
k
 has radius r
k
 .
 
 i
k  if  d
ik
 ≤ r
k
  (Directed graph)
r
k 
 captures popularity of node k
 
 
 
 
“Weighted” common neighbors:
Predict (i,j) pairs with highest  
Σ
 w(r)
η
(r)
 
 
 
 
13
k
 
j
r
k
 
Weight for nodes
of radius r
 
# common
neighbors of
radius r
Type 2 common neighbors
 
Adamic/Adar
 
1/r
Previous Empirical Studies
*
 
Random
Shortest
Path
Common
Neighbors
Adamic/Adar
Ensemble of
short paths
Link prediction
accuracy
*Liben-Nowell & Kleinberg, 2003; Brand, 2005;  Sarkar & Moore, 2007
Especially if the
graph is sparse
ℓ-
hop Paths
Common neighbors = 2 hop paths
For longer paths:
Bounds are weaker
For 
 
 
we need  
η
 
>> 
η
 
 
to obtain similar bounds
 justifies the exponentially decaying weight given to longer paths by the
Katz measure
Conclusion
Three key ingredients
1.
Closer points are likelier to be linked.
 
Small World Model- Watts,  Strogatz, 1998, Kleinberg 2001
2.
Triangle inequality holds
 
necessary to extend to 
ℓ-
hop paths
3.
Points are spread uniformly at random
 
 
Otherwise properties will depend on location as well as
distance
Summary
 
Random
Shortest
Path
Common
Neighbors
Adamic/Adar
Ensemble of
short paths
Link prediction accuracy*
*Liben-Nowell & Kleinberg, 2003; Brand, 2005;  Sarkar & Moore, 2007
The number of paths
matters, not the
length
For large dense graphs,
common neighbors are
enough
Differentiating between
different degrees is
important
In sparse graphs,
length 3 or more
paths help in
prediction.
Thanks!
Link Prediction – Generative Model
20
Two sources of randomness
  
Point positions: 
uniform in D dimensional space
  
Linkage probability: 
logistic with parameters 
α
, r
  
α
, r
 and D 
are known
α
 determines the steepness
The problem of 
link prediction 
is to find the 
nearest neighbor 
who is not
currently linked to the node.
 Equivalent to 
inferring distances in the latent space
Revisiting Raftery et al.’s model
Factor ¼ weak bound for
Logistic
 Can be made tighter, as logistic approaches the step function.
Problem Statement
22
Generative model
Link Prediction
Heuristics
node 
a
 
Most likely neighbor of
node 
i 
?
node 
b
 
Compare
A few
properties
 Can justify the
empirical observations
 We also offer some new
prediction algorithms
New Estimators
 
Combine bounds from different radii
But there might not be enough data to obtain individual bounds
from each radius
 
New sweep estimator
Q
r
 = Fraction of nodes w. radius ≤ r, which are common neighbors.
 
 
 
 
 
Higher Q
r
 smaller d
ij
 w.h.p
New Estimators
 
Q
r
 = Fraction of nodes w. radius ≤ r, which are common neighbors
larger Q
r
 
 smaller d
ij
 w.h.p
 
T
R
 : = Fraction of nodes w. radius ≥ R, which are common neighbors.
 
 
 
 
 
Smaller T
R
  large d
ij
 w.h.p
Sweep Estimators
Q
r 
 = Fraction of nodes
with radius ≤ r which are
common neighbors
T
R 
 = Fraction of nodes
with radius ≥ R which are
common neighbors
Number of
common neighbors
of a given radius
r
Link Prediction
  Which pair of nodes {i,j} 
should
 
be connected?
Variant:  node 
i
 is given
Link Prediction – Generative Model
27
Nodes are uniformly distributed in a latent space
The problem of link prediction is to find the nearest neighbor who is not
currently linked to the node.
 Equivalent to 
inferring distances in the latent space
Raftery et al.’s Model:
 
Points close in this space are more
likely to be connected.
Link Prediction – Generative Model
28
Two sources of randomness
  
Point positions: 
uniform in D dimensional space
  
Linkage probability: 
logistic with parameters 
α
, r
  
α
, r
 and D 
are known
Type 2 common neighbors
Example graph:
  N
1
 nodes of radius r
1
 and N
2
 nodes of radius r
2
  r
1
 << r
2
 
        Maximize Pr[
η
1
 , 
η
2
 | d
ij
] = product of two binomials
 
         w(r
1
)
 
E[
η
1
|d*] + 
w(r
2
) E[
η
2
|d*] = 
w(r
1
)
η
1
 + 
w(r
2
) 
η
2
 
 
RHS ↑  
 LHS 
 d* ↓
Type 2 common neighbors
{
Variance
Jacobian
 
Adamic/Adar
 
1/r
ℓ-
hop Paths
 
Common neighbors = 2 hop paths
Analysis of longer paths: two components
1.  Bounding E(
η
l
 | 
d
ij
).
 
[
η
l
 
= # 
l 
 
hop paths]
Bounds Pr
l 
(i,j) 
by using 
triangle inequality on a 
series of
common neighbor probabilities.
 
2.   
η
l
 
 
E(
η
l
 | 
d
ij
)
 
 
l 
hop Paths
Common neighbors = 2 hop paths
Analysis of longer paths: two components
1.  Bounding E(
η
l
 | 
d
ij
)
 
[
η
l
 
= # 
l 
hop paths]
Bounds Pr
l 
(i,j) 
by using 
triangle inequality on a 
series of
common neighbor probabilities.
2.   
η
l
 
 
E(
η
l
 | 
d
ij
)
Bounded dependence of 
η
l
 
on position of each node
 Can use McDiarmid’s inequality to bound
              
|
η
l
  
-
E(
η
l
|
d
ij
)|
Type 2 common neighbors
Example graph:
  N
1
 nodes of radius r
1
 and N
2
 nodes of radius r
2
 
 w(r
1
)
 
E[
η
1
|d*] + 
w(r
2
) E[
η
2
|d*] = 
w(r
1
)
η
1
 + 
w(r
2
) 
η
2         
(d
*
=MLE)
 
Decreasing function of d
*
 
“Weighted” common neighbors
 
Weights
RHS 
 d
* 
Link prediction by weighted common neighbors is justified
Common Neighbors: Distinct  Radii
Node 
k
 has radius r
k
 .
 i
k  if  d
ik
 ≤ r
k
  (Directed graph)
r
k 
 captures popularity of node k
34
k
 
j
r
k
Common Neighbors: Distinct  Radii
Node 
k
 has radius r
k
 .
 i
k  if  d
ik
 ≤ r
k
  (Directed graph)
r
k 
 captures popularity of node k
35
k
j
Type 1: i
 k 
 j
r
i
r
j
   
A(r
i 
, r
j 
,d
ij
)
Type 2 common neighbors
Example graph:
 N
1
 nodes of radius r
1
 and N
2
 nodes of radius r
2
 
η
1
 and 
η
2
 common neighbors with these radii
 
 w(r
1
)
 
E[
η
1
|d*] + 
w(r
2
) E[
η
2
|d*] = 
w(r
1
)
η
1
 + 
w(r
2
) 
η
2         
(d
*
=MLE)
 
Decreasing function of d
*
 
“Weighted” common neighbors
 
Weights
More “weighted” common neighbors
 
 points are closer
                                                   
 Useful for link prediction
ℓ-
hop Paths
Common neighbors = 2 hop paths
 
Analysis of longer paths:
1.
Triangulation: 
ℓ-hop path as a sequence of common
neighbors
2.
“Metric” property: 
intermediate distances linked to d
ij
ℓ-
hop Paths
Bound d
ij
 as a function of 
η
For 
 
 
we need  
η
 
>> 
η
 
 
to obtain similar bounds
 justifies the exponentially decaying weight given to longer paths by the
Katz measure
Also, we can obtain much tighter bounds for long paths if
shorter paths are known to exist.
39
Higher probability of
linking
α
 determines the steepness
 
Slide Note
Embed
Share

This content discusses the theoretical justification of popular link prediction heuristics such as predicting connections between nodes based on common neighbors, shortest paths, and weights assigned to low-degree common neighbors. It also explores link prediction generative models and previous empirical studies supporting these prediction techniques.

  • Link prediction
  • Heuristics
  • Theoretical justification
  • Common neighbors
  • Shortest paths

Uploaded on Sep 15, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Theoretical Justification of Popular Link Prediction Heuristics Purnamrita Sarkar (UC Berkeley) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.) 1

  2. Link Prediction Which pair of nodes {i,j} should be connected? Alice Bob Charlie Goal: Recommend a movie

  3. Link Prediction Which pair of nodes {i,j} should be connected? Goal: Suggest friends

  4. Link Prediction Heuristics Predict link between nodes Connected by the shortest path With the most common neighbors (length 2 paths) More weight to low-degree common nbrs (Adamic/Adar) Alice Prolific common friends Less evidence 1000 followers Bob Less prolific Much more evidence 8 followers Charlie

  5. Link Prediction Heuristics Predict link between nodes Connected by the shortest path With the most common neighbors (length 2 paths) More weight to low-degree common nbrs (Adamic/Adar) With more short paths (e.g. length 3 paths ) exponentially decaying weights to longer paths (Katz measure)

  6. Previous Empirical Studies* Especially if the graph is sparse How do we justify these observations? Link prediction accuracy* Random Shortest Path Common Neighbors Adamic/Adar Ensemble of short paths *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  7. Link Prediction Generative Model Unit volume universe Model: 1. Nodes are uniformly distributed points in a latent space 2. This space has a distance metric 3. Points close to each other are likely to be connected in the graph Logistic distance function (Raftery+/2002) 7

  8. Link Prediction Generative Model Higher probability of linking determines the steepness 1 radius r Model: 1. Nodes are uniformly distributed points in a latent space 2. This space has a distance metric 3. Points close to each other are likely to be connected in the graph Logistic distance function (Raftery+/2002) The problem of link prediction is to find the nearest neighbor who is not currently linked to the node. Equivalent to inferring distances in the latent space 8

  9. Previous Empirical Studies* Especially if the graph is sparse Link prediction accuracy Random Shortest Path Common Neighbors Adamic/Adar Ensemble of short paths *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  10. Common Neighbors j i Pr2(i,j) = Pr(common neighbor|dij) = Pr (i, j) Pr( ~ d | ) Pr( ~ d | ) d ( P d , d | ) d d i k j k 2 ik jk ik jk ij ik jk Product of two logistic probabilities, integrated over a volume determined by dij As Logistic Step function Much easier to analyze!

  11. Common Neighbors Everyone has same radius r Unit volume universe j i =Number of common neighbors = Pr (i, j) A(r, r, d ) 2 ij + = P A(r, r, d ) 1 2 ij N N V(r)=volume of radius r in D dims # common nbrs gives a bound on distance / 1 / 2 D D + /N /N 2 1 d 2 1 r r ij V(r) V(r) 11

  12. Common Neighbors OPT = node closest to i MAX = node with max common neighbors with i Theorem: w.h.p dOPT dMAX dOPT + 2[ /V(1)]1/D Link prediction by common neighbors is asymptotically optimal

  13. Common Neighbors: Distinct Radii Node k has radius rk . m i i k if dik rk (Directed graph) rk captures popularity of node k k rk j Weighted common neighbors: Predict (i,j) pairs with highest w(r) (r) # common neighbors of radius r Weight for nodes of radius r 13

  14. Type 2 common neighbors i k j Adamic/Adar Presence of common neighbor is very informative const const Absence is very informative w(r) 1 r deg D 1/r r is close to max radius Real world graphs generally fall in this range

  15. Previous Empirical Studies* Especially if the graph is sparse Link prediction accuracy Random Shortest Path Common Neighbors Adamic/Adar Ensemble of short paths *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  16. -hop Paths Common neighbors = 2 hop paths For longer paths: ( ) + dij r ( 1)r 1 - g , N, Bounds are weaker For we need >> to obtain similar bounds justifies the exponentially decaying weight given to longer paths by the Katz measure

  17. Conclusion Three key ingredients 1. Closer points are likelier to be linked. Small World Model- Watts, Strogatz, 1998, Kleinberg 2001 2. Triangle inequality holds necessary to extend to -hop paths 3. Points are spread uniformly at random Otherwise properties will depend on location as well as distance

  18. Summary In sparse graphs, length 3 or more paths help in prediction. Differentiating between different degrees is important For large dense graphs, common neighbors are enough Link prediction accuracy* The number of paths matters, not the length Random Shortest Path Common Neighbors Adamic/Adar Ensemble of short paths *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  19. Thanks!

  20. Link Prediction Generative Model Higher probability of linking determines the steepness 1 radius r Two sources of randomness Point positions: uniform in D dimensional space Linkage probability: logistic with parameters , r , r and D are known The problem of link prediction is to find the nearest neighbor who is not currently linked to the node. Equivalent to inferring distances in the latent space 20

  21. Revisiting Raftery et al.s model 1 Factor weak bound for Logistic d + + A e (V A) Pr (i, j) A c(r, D) , ij 1 1 4 2 2 Can be made tighter, as logistic approaches the step function.

  22. Problem Statement Link Prediction Heuristics Generative model A few properties Most likely neighbor of node i ? node b node a Compare Can justify the empirical observations We also offer some new prediction algorithms 22

  23. New Estimators Combine bounds from different radii But there might not be enough data to obtain individual bounds from each radius New sweep estimator Qr= Fraction of nodes w. radius r, which are common neighbors. 2/D c 1 E(Q ) A(r, r, d ) d 2r 1 Q V(r) N r ij ij r r Higher Qr smaller dij w.h.p

  24. New Estimators Qr= Fraction of nodes w. radius r, which are common neighbors larger Qr smaller dij w.h.p TR: = Fraction of nodes w. radius R, which are common neighbors. 1/D c 1 + T A(R, R, d ) d 2R 1 T V(R) N R ij ij R R Smaller TR large dij w.h.p

  25. Sweep Estimators Number of common neighbors of a given radius r TR = Fraction of nodes with radius R which are common neighbors Qr = Fraction of nodes with radius r which are common neighbors Small TR large dij Large Qr small dij

  26. Link Prediction Which pair of nodes {i,j} should be connected? Variant: node i is given Alice Bob Charlie Friend suggestion in Facebook Movie recommendation in Netflix

  27. Link Prediction Generative Model Raftery et al. s Model: Points close in this space are more likely to be connected. Unit volume universe Nodes are uniformly distributed in a latent space The problem of link prediction is to find the nearest neighbor who is not currently linked to the node. Equivalent to inferring distances in the latent space 27

  28. Link Prediction Generative Model Two sources of randomness Point positions: uniform in D dimensional space Linkage probability: logistic with parameters , r , r and D are known Higher probability of linking determines the steepness 1 radius r 28

  29. Type 2 common neighbors Example graph: N1 nodes of radius r1 and N2 nodes of radius r2 r1 << r2 2 ~ Bin[N2 , A(r2, r2, dij)] 1 ~ Bin[N1 , A(r1, r1, dij)] k j i Maximize Pr[ 1 , 2 | dij] = product of two binomials w(r1)E[ 1|d*] + w(r2) E[ 2|d*] = w(r1) 1 + w(r2) 2 RHS LHS d*

  30. Type 2 common neighbors Jacobian A d = w(r) ij A(1 { Variance const A) Adamic/Adar Small variance Presence is more surprising const w(r) Small variance Absence is more surprising 1 r deg D 1/r r is close to max radius Real world graphs generally fall in this range

  31. -hop Paths Common neighbors = 2 hop paths Analysis of longer paths: two components 1. Bounding E( l | dij). [ l= # l hop paths] Bounds Prl (i,j) by using triangle inequality on a series of common neighbor probabilities. 2. l E( l | dij) Triangulation

  32. l hop Paths Common neighbors = 2 hop paths Analysis of longer paths: two components 1. Bounding E( l | dij) [ l= # l hop paths] Bounds Prl (i,j) by using triangle inequality on a series of common neighbor probabilities. 2. l E( l | dij) Bounded dependence of lon position of each node Can use McDiarmid s inequality to bound| l - E( l|dij)|

  33. Type 2 common neighbors Example graph: N1 nodes of radius r1 and N2 nodes of radius r2 2 ~ Bin[N2 , A(r2, r2, dij)] 1 ~ Bin[N1 , A(r1, r1, dij)] k Weights j i w(r1)E[ 1|d*] + w(r2) E[ 2|d*] = w(r1) 1 + w(r2) 2 (d*=MLE) Decreasing function of d* Weighted common neighbors RHS d* Link prediction by weighted common neighbors is justified

  34. Common Neighbors: Distinct Radii Node k has radius rk . m i i k if dik rk (Directed graph) rk captures popularity of node k k rk j 34

  35. Common Neighbors: Distinct Radii Node k has radius rk . i k if dik rk (Directed graph) rk captures popularity of node k Type 2: i k j Type 1: i k j k k j j rj i rk i rk ri A(ri , rj ,dij) A(rk , rk ,dij) 35

  36. Type 2 common neighbors Example graph: N1 nodes of radius r1 and N2 nodes of radius r2 1 and 2 common neighbors with these radii k Weights j i w(r1)E[ 1|d*] + w(r2) E[ 2|d*] = w(r1) 1 + w(r2) 2 (d*=MLE) Decreasing function of d* Weighted common neighbors More weighted common neighbors points are closer Useful for link prediction

  37. -hop Paths Common neighbors = 2 hop paths Analysis of longer paths: 1. Triangulation: -hop path as a sequence of common neighbors 2. Metric property: intermediate distances linked to dij

  38. -hop Paths Bound dij as a function of ( ) + dij r ( 1)r 1 - g , N, For we need >> to obtain similar bounds justifies the exponentially decaying weight given to longer paths by the Katz measure Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist.

  39. Higher probability of linking determines the steepness 1 39

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#