Sublinear Algorithms and Graph Parameters in Centralized and Distributed Computing

On Centralized Sublinear
Algorithms and Some Relations to
Distributed Computing
D
a
n
a
 
R
o
n
T
e
l
-
A
v
i
v
 
U
n
i
v
e
r
s
i
t
y
A
D
G
A
,
 
O
c
t
o
b
e
r
 
2
0
1
5
Efficient (Centralized) Algorithms
 
Usually, when we say that an algorithm is 
efficient
we mean that it runs in time 
polynomial
 in the input
size 
n 
(e.g., size of an input string 
s
1
s
2
…s
n
,
or number of 
vertices
 in a graph ).
Naturally, we seek as 
small an exponent
 as
possible, so that 
O(n
2
)
 is 
good
, 
O(n
3/2
log
3
(n))
 is
better
, and linear time 
O(n)
 is 
really great!
But what if
 n 
is 
HUGE
 so that even linear time is
prohibitive? Are there tasks we can perform 
“super-
efficiently”
 in 
sub-linear
 time?
Supposedly, we need linear time just to
read
 the input without processing it at all.
But what if we don’t read the whole input but rather
sample 
from it?
 
s
1
s
2
…s
i
…s
j
…s
n
Sublinear Algorithms
Given 
query access 
to an object 
O
, perform
computation that is (
approximately
) 
correct
 with
high (constant) probability, 
after performing as
few queries 
as possible.
 
?
 
?
 
?
 
?
 
?
 
To define task 
precisely
, must specify:
object
, 
query
 
access, desired 
computation
 and notion
of 
approximation
Sublinearity
is in a sense
inherent 
in
distributed
computing
 
Examples
 
The object can be an 
array
 of numbers and would
like to decide whether it is 
sorted
The object can be a 
function
 and would like to
decide whether it is 
linear 
(
corresponds to the
Hadamard Code
)
The object can be 
an image 
and would like to
decide whether 
it is a cat
/
is convex.
The object can be 
a set of points 
and would like
to approximate the cost of 
clustering into k
clusters 
(according to some objective function)
The object can be a
 graph 
and would like to
approximate some 
graph parameter
.
Graph Parameters
 
A 
Graph Parameter: 
a function 
 
that is defined
on a graph 
G
 (undirected / directed, unweighted /
weighted).
For example:
 Average 
degree
 Number of 
subgraphs 
H 
in
 G
 Number of 
connected components
 Minimum size of a 
vertex cover
 Maximum size of a 
matching
 
Number of edges that should be added to make
graph 
k
-connected 
(
distance 
to
 k-connectivity
)
 Minimum weight of a 
spanning tree
Computing/Approximating
Graph Parameters Efficiently
 
For 
all 
parameters 
described in the previous slide,
have 
efficient
, i.e., 
polynomial-time
algorithms for 
computing
 the parameter
(possibly
 approximately
). For some
even 
linear-time
.
 
However, in some cases, when inputs are 
very
large
, we might want 
even more efficient
algorithms: 
sublinear-time
 algorithms.
 
Such algorithms do 
not even read the entire input
,
are 
randomized
, and provide an 
approximate
answer (with high success probability).
Sublinear Approximation on Graphs
 
Algorithm is given 
query access
 to 
G.
Types of
 queries
 that consider:
 
Neighbor
 queries – “who is 
i
th 
neighbor of 
v
?”
 
Degree 
queries – “what is 
deg(v
)?”
 
Vertex-pair
 queries – “is there an edge
btwn 
u 
and 
v
?”
?
 
After performing number of queries that is 
sublinear
in size of 
G
, should output 
good approximation
 
  of
(G), 
with 
high constant probability 
(
w.h.c.p.
).
Types of
 approximation 
that consider:
 
(G) 
 
 
≤ (1+
)
(G) 
(for given
 
 
: 
a
 (1+
)-
approx
.
)
 
(G) 
 
 

(G) 
(for fixed
 
 : 
an
 
-
approx
.
)
 
(G) 
 
 

(G) + 
n  
(for fixed
 
 
and
 
given
 
where 
n 
is size of range of
 
(G) 
: an
 (
, 
)-
approx
.
)
 
+ 
weight
 of edge
Survey Results in 3 Parts
 
I.
Average 
degree 
and number of
 subgraphs
II.
 Size of minimum 
vertex cover 
and maximum
matching
III.
 Minimum weight of 
spanning tree
Part I: Average Degree
 
Let 
d
avg 
=
 
d
avg
(G)
 denote average degree in 
G, d
avg
1
(can compute exactly in linear time)
Observe:
 
approximating average of
 
general function
 
with
range
 {0,..,n-1} 
(degrees range) requires
 
(n)
 queries, so
must exploit 
non-generality
 of degrees
Can obtain 
(2+
)-
approximation of 
d
avg
 
by performing
O(n
1/2
/
)
 
degree
 queries 
[Feige]
.
Going below 
2
: 
(n)
 queries 
[Feige]
.
With 
degree
 and 
neighbor
 queries, can obtain 
(1+
)-
approximation by performing 
Õ(n
1/2
 poly(1/
))
 queries
[Goldreich,R]
.
Comment1:
 In both cases, can replace 
n
1/2
 with 
(n/d
avg
)
1/2
Comment2:
 In both cases, results are 
tight
 (in terms of
dependence on 
n/d
avg
)
.
Average Degree (cont)
Ingredient 1:
 
Consider partition of all graph vertices
into 
r=
O((log n)/
)
 
buckets:
 In bucket
 B
i
 
vertices 
v
  s.t
.
     
(1+
)
i-1 
< deg(v) ≤ 
(1+
)
i
       
(
 
= 
/
8
 )
 )
 
Claim:
    
(1/n)
large
 
i  
b
i
(1+
)
i
 
 
d
avg
/(2+
)
        
(**)
Ingredient 2:
 
Ignore 
small 
B
i
’s. Take sum in 
(*) 
only
over 
large
 buckets (
|B
i
| > (
n)
1/2
/2r
).
 
Suppose can obtain for each 
i
 estimate 
b
i
=|B
i
|
(1

)
          (1/n)
i 
b
i
(1+
)
i
 
=
 
(1

)
d
avg                   
(*)
How to obtain
 b
i
? By 
sampling 
(and applying 
[Chernoff]
)
.
Difficulty: 
if 
B
i
 is small 
(<< n
1/2
) then necessary sample
is too large 
((|B
i
|/n)
-1
 >> n
1/2
).
Average Degree (cont)
Claim: 
(1/n)
large
 
i  
b
i
(1+
)
i
 
 
d
avg
/(2+
)       (**)
 
Sum of degrees = 
2
 
 num of edges
 
small 
buckets
 
large 
buckets
 
Using 
(**)
 get 
(2+
)
-approximation with 
Õ(n
1/2
/
2
)
degree
 queries
 
(
small: 
|B
i
| 
 (
n)
1/2
/2r
,
     
r 
: num of buckets)
Ingredient 3:
 
Estimate num of edges counted 
once
 and
compensate
 for them.
Average Degree (cont)
 
small 
buckets
 
large 
buckets
Ingredient 3:
 
Estimate num of edges counted 
once
 and
compensate
 for them.
 
For each 
large
 
B
i
 estimate num of edges between
 B
i
 
and
small 
buckets by 
sampling neighbors
 of (random)
vertices in 
B
i
.
By adding this estimate 
e
i
 
to 
(**)
 get 
(1+
)-
approx.
 
(1/n)
large
 
i  
b
i
(1+
)
i
         
(**)
(
1
/
n
)
l
a
r
g
e
 
i
 
(
b
i
(
1
+
)
i
 
+
 
e
i
)
Part I(b): Number of subgraphs - Stars
Approximating avg. degree same as approximating num
of 
edges
. What about other 
subgraphs? 
(Also known as
counting 
network motifs.
)
 
[Gonen,R,Shavitt]
 considered 
length-
2
 paths
, and more
generally, 
k
-stars
.
(avg deg + 
2
-stars gives 
variance
, larger 
k
– higher 
moments
)
 
Let  
s
k
 = s
k
(G)
 denote num of 
k
-stars. Give 
(1+
)-
approx
algorithm with 
query 
complexity (degree+neighbors):
 
Show that this upper bound is 
tight
.
Part I(c): Number of subgraphs - Triangles
 
Counting number of 
triangles
 
t=t(G)
exactly/approximately studied quite extensively in the
past. All previous algorithms 
read entire graph
.
[Gonen,R,Shavitt]
 showed that using only 
degree and
neighbor queries
 there is 
no sublinear algorithm 
for
approximately counting num of triangles.
Natural question: What if also allow 
vertex-pair queries
?
[Eden,Levi,R,Seshadhri]
 
Give 
(1+
)-
approx algorithm with
query 
complexity (
degree
+
neighbors
+
vertex-pairs
):
             
O(n/t
1/3
 + m
3/2
/t) poly(log n,1/
)
and give 
matching lower bound
.
Part II: 
The Minimum Vertex Cover (VC)
Recall:
 For a graph 
G = (V,E)
, a 
vertex cover
 of 
G 
is a
subset 
C 
 V
 s.t. for every 
{u,v} 
 E
, 
{u,v} 
 C 
 
 
Computing the size of a minimum vertex cover
is 
NP-hard
, but a 
factor-2 approx
 can be
found in 
linear time 
[Gavril], [Yanakakis]
 
Can we get approx even 
more
 efficiently, i.e., in
sublinear-time?
Min VC – (Imaginary) Oracle
 
Initially considered in 
[Parnas,R].
First basic idea
: Suppose had 
oracle 
that for
given vertex 
v 
answers if 
v
C
 for some 
fixed
vertex cover 
C
 that is at most factor 
larger than 
min size 
of vertex cover, 
vc(G)
.
 
Consider uniformly and independently
sampling 
s=
(1/
2
)  
vertices, and 
querying
oracle 
on each sampled vertex.
 
Let 
s
vc
 be number of sampled vertices that answers: 
in
 
C
.
By 
Chernoff
, w.h.c.p   
|C|/n-
/2 
 s
vc
/s 
 
|C|/n+
/2
Since 
vc(G) 
 |C| 
 
 vc(G), 
if define 
vc’=
 (s
vc
/s + 
/2)n
then (w.h.c.p) 
vc(G) 
 vc’ 
 
 vc(G) + 
n
That is, get 
(
,
)-
approximation
 
v
?
 
v
C !
 
v
C !
Min VC: Distributed connection
 
Second idea:
 Can use 
distributed algorithm
 to implement
oracle.
Suppose have dist. alg. (in message passing model) that
works in 
k 
rounds of communication
.
Then oracle, when called on vertex 
v
, will 
emulate 
dist.
alg. on 
k
-distance-neighborhood
 of 
v
 
Query complexity
 of each oracle call:
 O(d
k
) 
where 
d
 is
max degree in the graph.
 
v
?
 
v
 
v
C !
Min VC - Results
 
By applying dist. alg. of 
[Kuhn,Moscibroda,Wattenhofer]
get 
(c,
)-
approx. (
c>2
) with complexity 
d
O(log d)
/
2
, and
(2,
)
-approx. with complexity 
d
O(d)poly(1/
)
.
Comment 1:
 
Can replace max deg
 d 
with
 d
avg
/
 
[PR]
Comment 2:
 Going below 
2
 : 
(n
1/2
)
 queries 
(Trevisan)
7/6
: 
(n)
 
[Bogdanov,Obata,Trevisan]
Comment 3:
 Any 
(c,
)
-approximation:  
(d
avg
)
 queries 
[PR]
 
Sequence of improvements for 
(2,
)
-approx
[Marko,R]
:  
d
O(log(d/
)) 
- using dist. alg. similar to
max ind. set
 alg of 
[Luby]
[Nguyen,Onak]
: 
2
O(d)
/
2
 – emulate classic 
greedy algorithm
(maximal matching) 
[Gavril],[Yanakakis]
[Yoshida,Yamamoto,Ito]
: 
O(d
4
/
2
) 
– sophisticated analysis
of better emulation
[Onak,R,Rosen,Rubinfeld]
: 
Õ(d
avg 
poly(1/
))
Min VC:  
[NO]
 Algorithm
 
Factor-2
 approx 
alg
 of
 
[Gavril],[Yanakakis]
 (
not
 sublin)
 
C 
 
,
 
U 
 E
  
(
C
: vertex cover, 
U
: uncovered edges, 
)
 
while 
U 
 
  -
 
select (arbitrarily)
 e = {u,v} 
in
 U
  
-
 C 
 
C 
 {u,v}   
(
add both endpoints of 
e 
to cover
)
  
-
 U 
 
U \ ({ {u,w}
 E } 
 { {v,w}
 E } )
                  
(
remove from 
U 
all edges covered by 
u
 or 
v
)
 
Slightly different description of alg:
 C 
 
, 
U 
 E
, 
 
 
arbitrary
 permutation
 of
 E
 For 
j = 1 to |E|
  
-
 
If 
j
 = {u,v} 
in
 U:
  
    
C 
 
C 
 {u,v}
      U 
 
U \ ({ {u,w}
 E } 
 { {v,w}
 E } )
 
,
M

 
, 
M 
 M 
 { {u,v} }
 
2
 
1
 
7
 
5
 
4
 
3
 
6
 
8
Min VC: 
[NO]
 Algorithm 
(cont)
 
Let 
M
(G)
 be 
maximal matching
 resulting from alg when
using permutation 
 (so that 
vc(G)/2 
 |M
(G)| 
 vc(G)
 )
Can estimate 
vc(G)
 by estimating 
|M
(G)|: 
sample
 
(1/
2
)
edges
 uniformly; for each check if in 
M
(G)
 by calling
maximal matching
 oracle.
Basic observation:
 For an edge 
e = {u,v}
, if for some
e’ = {u,w}
 (or 
{v,w}
), 
(e’) < 
(e) 
and 
e’ 
in
 M
(G) 
then
 e
not in
 M
(G). 
Otherwise,
 e 
in
 M
(G) .
 
MO
 
(input: edge
 e = {u,v}
, output:
 
is
 e 
in
 M
(G)
)
 For each edge 
e’ 
 e
 incident to 
u 
or 
v
  - if 
(e’) < 
(e)
       
if 
MO
(e’) = 
TRUE
 
then return
 
FALSE
 return 
TRUE
 
e
 
e’
 
5
 
8
Min VC: 
[NO]
 Algorithm 
(cont)
 
Main claim
 of 
[NO]:
 For a 
randomly 
selected 
, the
expected 
query complexity
 of oracle is 
2
O(d)
.
Analysis 
based on bounding (in expectation) the total
number of edges 
reached in recursive calls
 (sequence
of recursive calls corresponds to sequence of
decreasing
 
 
values).
Note:
 can select 
 
“on the fly”, 
by assigning each new
edge considered a 
random 
(discretizied)
 
value in 
[0,1].
MO
 
(input: edge
 e = {u,v}
, output:
 
is
 e 
in
 M
(G)
)
 For each edge 
e’ 
 e
 incident to 
u 
or 
v
  - if 
(e’) < 
(e)
       
if 
MO
(e’) = 
TRUE
 
then return
 
FALSE
 return 
TRUE
From exp(d) to poly(d)
 
[YYI]
 analyzed a 
variant
 suggested by 
[NO] 
and proved
that expected number of oracle calls for variant is
O(d
2
), 
resulting in 
poly(d,1/
) 
complexity
 
(and 
[ORRR]
further modified alg to get almost optimal complexity).
 
Same algs give (roughly) 
½
-
approx for 
maximum
matching.
[NO] 
and 
[YYI]
 build on alg and 
[Hopcroft&Karp] 
to get
(1-
)
-
approx 
using exp(d
O(1/
)
)  
(
d
O(1/
)
, resp.) queries
 
and approximate Maximum Matching
Distributed Connection Revisited
 
Previously observed that can implement 
VC-oracle
based on known 
distributed algorithm(s)
Now observe that oracle implementation of 
[NO] 
for
approx-max-match 
gives distributed algorithm 
that
performs 
d
O(1/
)
 
rounds and has high 
constant
 success
probability.
Compare to 
O(log(n)/
3
)
-
round
 alg of 
[Lotker, Patt-
Shamir,Pettie] 
that has success prob 
1-1/poly(n)
A different distributed algorithm, using ideas from
[NO] 
as well as distributed coloring alg 
[Linial],
performs 
d
O(1/
)
 + log*(n)/
2  
rounds, and succeeds
with prob 
1
 
[Even,Medina,R]
(Studied in context of 
Centralized Local Algorithms
)
Part III: Min Weight Spanning Tree
 
Consider graphs with degree bound
 d 
and weights in
{1,..,W}. 
Goal is to approx weight of 
MST
[Chazelle,Rubinfeld,Trevisan]
 give 
(1+
)-
approximation
alg using 
Õ(d
W/
2
)
 
neighbor queries
.
Result is 
tight 
and extends to 
d=d
avg
 
and weights in
[1,W].
 
Suppose first:
 W=2 
(i.e., weights either
 1 
or 
2
)
E
1
 
=
 
edges with weight
 1
,
 G
1
=(V,E
1
), c
1
 
= num of
connected components
 in
 G
1
.
 
Weight of 
MST
: 
2
(c
1
-1) + 1
(n-1-(c
1
-1)) = n-2+c
1
Estimate 
MST
 
weight by estimating
 c
1
MST (cont)
 
More generally (weights in 
{1,..,W}
)
E
i
 
=
 
edges with weight 
≤ i
,
 G
i
=(V,E
i
), c
i
 
= num of
connected components 
(
cc’s
) in
 G
i
.
 
Weight of 
MST
: 
n - W + 
i=1..W-1
 c
i
Estimate 
MST
 
weight by estimating
 c
1
,…,c
W-1
.
Idea
 
for estimating num of 
cc’s
 in graph
 H 
(
c(H)
):
For vertex 
v
, 
n
v
 = num of vertices in 
cc
 of 
v
.
Then:
     c(H) = 
v
(1/n
v
)
 
2
(1/2)
 
3
(1/3)
 
4
(1/4)
MST (cont)
 
c(H) = 
v
(1/n
v
)   (n
v
 = num of vertices in 
cc
 of 
v
)
Can estimate
 c(H) 
by: selecting sample 
R
 of vertices,
for each 
v 
in
 R, 
finding
 n
v
 
(using
 BFS
)
 
and taking
(normalized) sum over sample:
        
(n/|R|)
 
v
R
 
(1/n
v
)
Difficulty:
 
if 
n
v
 
is 
large
, then 
“expensive”
Let 
S = {v : n
v 
 B}. 
(
S 
for 
“small”
)
      
v
S
(1/n
v
) 
>  
c(H) – n/B
Alg for estimating
 c(H) 
selects
 
sample
 R 
of
 
vertices,
runs 
BFS
 
on each selected
 v 
in
 R 
until finds
 n
v
 
or
determines that
 n
v
 > B
 
(i.e.
 v 
S
).
Estimate 
c(H)
 by 
(n/|R|)
 
v
R
S
 
(1/n
v
)
Complexity:
 O(|R|
B
d)
MST (cont)
 
For any 
 < 1, if set B=2/
 and |R| = 
(1/
2
) 
get
estimate of 
C(H) 
to within 
n 
with complexity
 
O(d/
3
)
Alg for estimating
 
MST 
weight
 
can run above alg on
each
 G
i
  
with 
=
/W, 
so that when sum estimates of 
c
i
for
 
i=1,…,W 
get desired 
(1
)
 approximation.
 
(
G
i
=(V,E
i
), E
i
 
=
 
edges with weight 
≤ i)
 
(
c
i
 = num of connected components
 
in
 G
i
)
 
(Weight of 
MST
: 
n - W + 
i=1..W-1
 c
i
)
 
Comment:
 
[Chazelle,Rubinfeld,Trevisan]
 get better
complexity (total of 
Õ(d
W/
2
)
 ) by more refined alg
Summary
 
Talked about 
sublinear 
approximation algorithms for
various graph parameters:
 
I.
Average 
degree 
and number of
 stars 
and
 triangles
II.
Size of minimum 
vertex cover 
(
maximum matching
)
III.
 Weight of minimum weight 
spanning tree
 
There is a 
high-level 
connection to 
distributed
computing
 through 
sublinearity
, and there some are
concrete 
algorithmic connections
Thanks
From 
[NO]
 to 
[YYI]
 
1
0
 
7
 
6
 
2
 
3
 
4
 
e
 
1
 
[NO]
 suggested following 
“greedy”
 
variant
:
 
MO
 
(input: edge
 e = {u,v}
, output:
 
is
 e 
in
 M
(G)
)
 Query 
G 
on all edges incident to 
u 
and 
v
. Let 
e
1
,…,e
t
be ordering of edges according to 
 
(i.e.,
 
(e
j
)<
(e
j+1
) 
)
 j 
 1
 
while 
(e
j
)<
(e)
   
-
 
if 
MO
(e
j
) = 
TRUE
 
then return
 
FALSE
  
-
 
else
 j 
 j+1
 return 
TRUE
 
[YYI]
 analyzed variant and proved that expected number
of queries for variant is 
O(d
2
), 
resulting in algorithm
with 
query complexity
 O(d
4
/
2
) 
(extra factors of 
d
 due to
prob of selecting 
e
 in 
M
(G)
 being 
|M
(G)| / (d n)
 )
Part III(b): Maximum Matching and more
 
[Nguyen,Onak]
 give 
(1,
)-
approx for 
max match
 with
complexity 
2
d*O(1/
)
,
 improved 
[Yoshida,Yamamoto,Ito]
to 
d
6/
*2
(1/
)
O(1/
)
Recursive 
application of oracles using 
augmenting paths
.
[Hassidim,Kelner,Nguyen,Onak],[Elek] 
give 
(1,
)-
approx
algs on 
restricted 
graphs (e.g., 
planar
)
Can get 
(O(log d),
)-
approx for 
min dominating set
 with
complexity 
d
O(log d)
/
2
 using
[Kuhn,Moscibroda,Wattenhofer]
Part I(b): Number of stars subgraphs
 
N
s 
n
1+1/s
 
:      
O(n/(N
s
)
1/(1+s)
)
n
1+1/s 
N
s 
n
s
 :   
O(n
1-1/s
)
N
s
 > n
s
 :         
O(n
s-1/s
/(N
s
)
1-1/s
) = O((n
s+1
/N
s
)
1-1/s
)
 
Example:
 
s=3
. 
(a)
 
N
s
 = n
 : 
O(n
3/4
)
; 
(b)
 
N
s
 = n
2
 : 
O(n
2/3
);
(c)
 
N
s
 = n
4
 
: 
O(1)
 
Idea 
of algorithm for
 
s=2
: Also partition into
 buckets
.
 
Can estimate num of 
2
-stars with 
centers 
in 
large 
buckets.
Also estimate num of 
2
-stars with 
centers
 in
(“significant”) 
small
 buckets and at least one
endpoint
 in 
large
 bucket by estimating num of
edges between pairs of buckets.
Part IV: Approximating Distance to P
 
For graph property 
P
, estimate 
fraction of edges
 that
should be 
added/removed
 to obtain 
P
 (fraction with
respect to (ub on) num of edges 
m
).  Assume 
m=
(n).
Study of 
distance approximation
 first explicitly
introduced in
 
[Parnas,R,Rubinfeld]
.
Note:
 Already discussed alg for distance to
connectivity 
in sparse graphs
 
(estimate num of
 
cc
’s)
 
For 
dense 
graphs where 
m=
(n
2
)
 and perform 
vertex-
pair 
queries, some known 
testing
 results directly give
dist. approx.
 results: e.g., 
-
cut
 (having a cut of size
at least 
n
2
): (1,
)
-
approx using
 poly(1/
) 
queries
(
exp(poly(1/
)) 
time) – equiv to approx 
Max-Cut
.
[Fischer,Newman]:
 
all
 testable properties
 
(comp.
independent of 
n
) have dist. approx. algs. Direct
Analysis for 
monotone 
properties 
[Alon,Shapira,Sudakov]
Part IV: Approximating Distance to P
 
Dist. app. for 
sparse
 graphs studied in 
[Marko,R]
 
Property
 
Model
 
 
Complexity
 
distance w.r.t, 
d
n
 
cannot get sublin with 
=1
 
(n
1/2
)
for
sparse
model
 
Extends to 
subgraph-free
 
[Hassidim,Kelner,Nguyen,Onak] 
give 
(1,
)-
approx for
restricted graphs: e.g. dist. to
 
3
-col in
 planar
 graphs
.
Slide Note
Embed
Share

Centralized sublinear algorithms and their relation to distributed computing are explored, emphasizing the efficiency of algorithms in processing large inputs in sublinear time. Examples of sublinear algorithms for various objects are provided, along with the computation and approximation of graph parameters efficiently in centralized settings. The need for sublinear-time algorithms in handling very large inputs is discussed in the context of achieving approximate solutions with high success probability.

  • Sublinear Algorithms
  • Centralized Computing
  • Distributed Computing
  • Graph Parameters
  • Efficiency

Uploaded on Sep 26, 2024 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. On Centralized Sublinear Algorithms and Some Relations to Distributed Computing Dana Ron Tel-Aviv University ADGA, October 2015

  2. Efficient (Centralized) Algorithms Usually, when we say that an algorithm is efficient we mean that it runs in time polynomial in the input size n (e.g., size of an input string s1s2 sn, or number of vertices in a graph ). Naturally, we seek as small an exponent as possible, so that O(n2) is good, O(n3/2log3(n)) is better, and linear time O(n) is really great! But what if n is HUGE so that even linear time is prohibitive? Are there tasks we can perform super- efficiently in sub-linear time? Supposedly, we need linear time just to read the input without processing it at all. But what if we don t read the whole input but rather sample from it? s1s2 si sj sn

  3. Sublinear Algorithms Given query access to an object O, perform computation that is (approximately) correct with high (constant) probability, after performing as few queries as possible. Sublinearity is in a sense inherent in distributed computing ? ? ? ? ? To define task precisely, must specify: object, query access, desired computation and notion of approximation

  4. Examples The object can be an array of numbers and would like to decide whether it is sorted The object can be a function and would like to decide whether it is linear (corresponds to the Hadamard Code) The object can be an image and would like to decide whether it is a cat/is convex. The object can be a set of points and would like to approximate the cost of clustering into k clusters (according to some objective function) The object can be a graph and would like to approximate some graph parameter.

  5. Graph Parameters A Graph Parameter: a function that is defined on a graph G (undirected / directed, unweighted / weighted). For example: Average degree Number of subgraphs H in G Number of connected components Minimum size of a vertex cover Maximum size of a matching Number of edges that should be added to make graph k-connected (distance to k-connectivity) Minimum weight of a spanning tree

  6. Computing/Approximating Graph Parameters Efficiently For all parameters described in the previous slide, have efficient, i.e., polynomial-time algorithms for computing the parameter (possibly approximately). For some even linear-time. However, in some cases, when inputs are very large, we might want even more efficient algorithms: sublinear-time algorithms. Such algorithms do not even read the entire input, are randomized, and provide an approximate answer (with high success probability).

  7. Sublinear Approximation on Graphs Algorithm is given query access to G. Types of queries that consider: Neighbor queries who is ithneighbor of v? Degree queries what is deg(v)? Vertex-pair queries is there an edge btwn u and v? After performing number of queries that is sublinear in size of G, should output good approximation of (G), with high constant probability (w.h.c.p.). Types of approximation that consider: (G) (1+ ) (G) (for given : a (1+ )-approx.) (G) (G) (for fixed : an -approx.) (G) (G) + n (for fixed and given where n is size of range of (G) : an ( , )-approx.) ? + weight of edge

  8. Survey Results in 3 Parts I. Average degree and number of subgraphs II. Size of minimum vertex cover and maximum matching III. Minimum weight of spanning tree

  9. Part I: Average Degree Let davg= davg(G) denote average degree in G, davg 1 (can compute exactly in linear time) Observe: approximating average of general function with range {0,..,n-1} (degrees range) requires (n) queries, so must exploit non-generality of degrees Can obtain (2+ )-approximation of davgby performing O(n1/2/ ) degree queries [Feige]. Going below 2: (n) queries [Feige]. With degree and neighbor queries, can obtain (1+ )- approximation by performing (n1/2poly(1/ )) queries [Goldreich,R]. Comment1: In both cases, can replace n1/2with (n/davg)1/2 Comment2: In both cases, results are tight (in terms of dependence on n/davg).

  10. Average Degree (cont) Ingredient 1: Consider partition of all graph vertices into r=O((log n)/ ) buckets: In bucket Bivertices v s.t. (1+ )i-1 < deg(v) (1+ )i Suppose can obtain for each i estimate bi=|Bi| (1 (1/n) i bi (1+ )i= (1 How to obtain bi? By sampling (and applying [Chernoff]). Difficulty: if Biis small (<< n1/2) then necessary sample is too large ((|Bi|/n)-1>> n1/2). ( = /8 ) ) ) ) davg (*) Ingredient 2: Ignore small Bi s. Take sum in (*) only over large buckets (|Bi| > ( n)1/2/2r). (1/n) large i bi (1+ )i davg/(2+ ) Claim: (**)

  11. Average Degree (cont) Claim: (1/n) large i bi (1+ )i davg/(2+ ) (**) Sum of degrees = 2 num of edges (small: |Bi| ( n)1/2/2r, r : num of buckets) small buckets large buckets counted twice not counted counted once Using (**) get (2+ )-approximation with (n1/2/ 2) degree queries Ingredient 3: Estimate num of edges counted once and compensate for them.

  12. Average Degree (cont) Ingredient 3: Estimate num of edges counted once and compensate for them. large buckets small buckets Bi For each large Biestimate num of edges between Biand small buckets by sampling neighbors of (random) vertices in Bi. By adding this estimate eito (**) get (1+ )-approx. (1/n) large i bi (1+ )i (1/n) large i (bi (1+ )i + ei) (**)

  13. Part I(b): Number of subgraphs - Stars Approximating avg. degree same as approximating num of edges. What about other subgraphs? (Also known as counting network motifs.) [Gonen,R,Shavitt] considered length-2 paths, and more generally, k-stars. (avg deg + 2-stars gives variance, larger k higher moments) Let sk= sk(G) denote num of k-stars. Give (1+ )-approx algorithm with query complexity (degree+neighbors): / 1 k k n k n / 1 + 1 k min , (log / 1 , n ) O n poly ) 1 + / 1 1 k /( 1 k k s s Show that this upper bound is tight.

  14. Part I(c): Number of subgraphs - Triangles Counting number of triangles t=t(G) exactly/approximately studied quite extensively in the past. All previous algorithms read entire graph. [Gonen,R,Shavitt] showed that using only degree and neighbor queries there is no sublinear algorithm for approximately counting num of triangles. Natural question: What if also allow vertex-pair queries? [Eden,Levi,R,Seshadhri] Give (1+ )-approx algorithm with query complexity (degree+neighbors+vertex-pairs): O(n/t1/3+ m3/2/t) poly(log n,1/ ) and give matching lower bound.

  15. Part II: The Minimum Vertex Cover (VC) Recall: For a graph G = (V,E), a vertex cover of G is a subset C V s.t. for every {u,v} E, {u,v} C Computing the size of a minimum vertex cover is NP-hard, but a factor-2 approx can be found in linear time [Gavril], [Yanakakis] Can we get approx even more efficiently, i.e., in sublinear-time?

  16. Min VC (Imaginary) Oracle Initially considered in [Parnas,R]. First basic idea: Suppose had oracle that for given vertex v answers if v C for some fixed vertex cover C that is at most factor larger than min size of vertex cover, vc(G). v? https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRg3OlsBXkJV7KxcM7lkx_KA6s98HW2aRfDi7bC2f4iQHQIjwNt Consider uniformly and independently sampling s= (1/ 2) vertices, and querying oracle on each sampled vertex. v C ! v C ! Let svcbe number of sampled vertices that answers: in C. By Chernoff, w.h.c.p |C|/n- /2 svc/s |C|/n+ /2 Since vc(G) |C| vc(G), if define vc = (svc/s + /2)n then (w.h.c.p) vc(G) vc vc(G) + n That is, get ( , )-approximation

  17. Min VC: Distributed connection Second idea: Can use distributed algorithm to implement oracle. Suppose have dist. alg. (in message passing model) that works in k rounds of communication. Then oracle, when called on vertex v, will emulate dist. alg. on k-distance-neighborhood of v https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRg3OlsBXkJV7KxcM7lkx_KA6s98HW2aRfDi7bC2f4iQHQIjwNt v v C ! v? Query complexity of each oracle call: O(dk) where d is max degree in the graph.

  18. Min VC - Results By applying dist. alg. of [Kuhn,Moscibroda,Wattenhofer] get (c, )-approx. (c>2) with complexity dO(log d)/ 2, and (2, )-approx. with complexity dO(d)poly(1/ ). Comment 1: Can replace max deg d with davg/ [PR] Comment 2: Going below 2 : (n1/2) queries (Trevisan) 7/6: (n) [Bogdanov,Obata,Trevisan] Comment 3: Any (c, )-approximation: (davg) queries [PR] Sequence of improvements for (2, )-approx [Marko,R]: dO(log(d/ )) - using dist. alg. similar to max ind. set alg of [Luby] [Nguyen,Onak]: 2O(d)/ 2 emulate classic greedy algorithm (maximal matching) [Gavril],[Yanakakis] [Yoshida,Yamamoto,Ito]: O(d4/ 2) sophisticated analysis of better emulation [Onak,R,Rosen,Rubinfeld]: (davgpoly(1/ ))

  19. Min VC: [NO] Algorithm Factor-2 approx alg of [Gavril],[Yanakakis] (not sublin) C , U E (C: vertex cover, U: uncovered edges, ) while U - select (arbitrarily) e = {u,v} in U - C C {u,v} (add both endpoints of e to cover) - U U \ ({ {u,w} E } { {v,w} E } ) (remove from U all edges covered by u or v) Slightly different description of alg: C , U E, For j = 1 to |E| - If j= {u,v} in U: C C {u,v} U U \ ({ {u,w} E } { {v,w} E } ) arbitrary permutation of E ,M 5 2 7 3 8 4 1 6 , M M { {u,v} }

  20. Min VC: [NO] Algorithm (cont) Let M (G) be maximal matching resulting from alg when using permutation (so that vc(G)/2 |M (G)| vc(G) ) Can estimate vc(G) by estimating |M (G)|: sample (1/ 2) edges uniformly; for each check if in M (G) by calling maximal matching oracle. Basic observation: For an edge e = {u,v}, if for some e = {u,w} (or {v,w}), (e ) < (e) and e in M (G) then e not in M (G). Otherwise, e in M (G) . MO (input: edge e = {u,v}, output: is e in M (G)) For each edge e e incident to u or v - if (e ) < (e) if MO (e ) = TRUE then return FALSE return TRUE 5 8 e e

  21. Min VC: [NO] Algorithm (cont) MO (input: edge e = {u,v}, output: is e in M (G)) For each edge e e incident to u or v - if (e ) < (e) if MO (e ) = TRUE then return FALSE return TRUE Main claim of [NO]: For a randomly selected , the expected query complexity of oracle is 2O(d). Analysis based on bounding (in expectation) the total number of edges reached in recursive calls (sequence of recursive calls corresponds to sequence of decreasing values). Note: can select on the fly , by assigning each new edge considered a random (discretizied) value in [0,1].

  22. From exp(d) to poly(d) and approximate Maximum Matching [YYI] analyzed a variant suggested by [NO] and proved that expected number of oracle calls for variant is O(d2), resulting in poly(d,1/ ) complexity (and [ORRR] further modified alg to get almost optimal complexity). Same algs give (roughly) -approx for maximum matching. [NO] and [YYI] build on alg and [Hopcroft&Karp] to get (1- )-approx using exp(dO(1/ )) (dO(1/ ), resp.) queries

  23. Distributed Connection Revisited Previously observed that can implement VC-oracle based on known distributed algorithm(s) Now observe that oracle implementation of [NO] for approx-max-match gives distributed algorithm that performs dO(1/ )rounds and has high constant success probability. Compare to O(log(n)/ 3)-round alg of [Lotker, Patt- Shamir,Pettie] that has success prob 1-1/poly(n) A different distributed algorithm, using ideas from [NO] as well as distributed coloring alg [Linial], performs dO(1/ )+ log*(n)/ 2 rounds, and succeeds with prob 1 [Even,Medina,R] (Studied in context of Centralized Local Algorithms)

  24. Part III: Min Weight Spanning Tree Consider graphs with degree bound d and weights in {1,..,W}. Goal is to approx weight of MST [Chazelle,Rubinfeld,Trevisan] give (1+ )-approximation alg using (d W/ 2) neighbor queries. Result is tight and extends to d=davgand weights in [1,W]. Suppose first: W=2 (i.e., weights either 1 or 2) E1= edges with weight 1, G1=(V,E1), c1= num of connected components in G1. Weight of MST: 2 (c1-1) + 1 (n-1-(c1-1)) = n-2+c1 Estimate MST weight by estimating c1

  25. MST (cont) More generally (weights in {1,..,W}) Ei= edges with weight i, Gi=(V,Ei), ci= num of connected components (cc s) in Gi. Weight of MST: n - W + i=1..W-1ci Estimate MST weight by estimating c1, ,cW-1. Idea for estimating num of cc s in graph H (c(H)): For vertex v, nv= num of vertices in cc of v. Then: c(H) = v(1/nv) 3 (1/3) 4 (1/4) 2 (1/2)

  26. MST (cont) c(H) = v(1/nv) (nv= num of vertices in cc of v) Can estimate c(H) by: selecting sample R of vertices, for each v in R, finding nv(using BFS) and taking (normalized) sum over sample: (n/|R|) v R(1/nv) Difficulty: if nvis large, then expensive Let S = {v : nv B}. (S for small ) v S(1/nv) > c(H) n/B Alg for estimating c(H) selects sample R of vertices, runs BFS on each selected v in R until finds nvor determines that nv> B (i.e. v S). Estimate c(H) by (n/|R|) v R S(1/nv) Complexity: O(|R| B d) v

  27. MST (cont) For any < 1, if set B=2/ and |R| = (1/ 2) get estimate of C(H) to within n with complexity O(d/ 3) Alg for estimating MST weight can run above alg on each Gi with = /W, so that when sum estimates of ci for i=1, ,W get desired (1 ) approximation. (Gi=(V,Ei), Ei= edges with weight i) (ci= num of connected components in Gi) (Weight of MST: n - W + i=1..W-1ci) Comment: [Chazelle,Rubinfeld,Trevisan] get better complexity (total of (d W/ 2) ) by more refined alg

  28. Summary Talked about sublinear approximation algorithms for various graph parameters: I. Average degree and number of stars and triangles II. Size of minimum vertex cover (maximum matching) III. Weight of minimum weight spanning tree There is a high-level connection to distributed computing through sublinearity, and there some are concrete algorithmic connections

  29. Thanks

  30. From [NO] to [YYI] [NO] suggested following greedy variant: MO (input: edge e = {u,v}, output: is e in M (G)) Query G on all edges incident to u and v. Let e1, ,et be ordering of edges according to (i.e., (ej)< (ej+1) ) j 1 while (ej)< (e) - if MO (ej) = TRUE then return FALSE - else j j+1 return TRUE 3 4 10 7 e 2 1 6 [YYI] analyzed variant and proved that expected number of queries for variant is O(d2), resulting in algorithm with query complexity O(d4/ 2) (extra factors of d due to prob of selecting e in M (G) being |M (G)| / (d n) )

  31. Part III(b): Maximum Matching and more [Nguyen,Onak] give (1, )-approx for max match with complexity 2d*O(1/ ), improved [Yoshida,Yamamoto,Ito] to d6/ *2(1/ )O(1/ ) Recursive application of oracles using augmenting paths. [Hassidim,Kelner,Nguyen,Onak],[Elek] give (1, )-approx algs on restricted graphs (e.g., planar) Can get (O(log d), )-approx for min dominating set with complexity dO(log d)/ 2using [Kuhn,Moscibroda,Wattenhofer]

  32. Part I(b): Number of stars subgraphs n n N s Ns n1+1/s: O(n/(Ns)1/(1+s)) n1+1/s Ns ns: O(n1-1/s) Ns> ns: O(ns-1/s/(Ns)1-1/s) = O((ns+1/Ns)1-1/s) Example: s=3. (a) Ns= n : O(n3/4); (b) Ns= n2: O(n2/3); (c) Ns= n4: O(1) Idea of algorithm for s=2: Also partition into buckets. Can estimate num of 2-stars with centers in large buckets. Also estimate num of 2-stars with centers in ( significant ) small buckets and at least one endpoint in large bucket by estimating num of edges between pairs of buckets. 1 / s s n + 1 1 / s min , (log 1 , n / ) O poly + 1 /( 1 ) 1 s 1 / s s N + i 1 ( ) ib 2

  33. Part IV: Approximating Distance to P For graph property P, estimate fraction of edges that should be added/removed to obtain P (fraction with respect to (ub on) num of edges m). Assume m= (n). Study of distance approximation first explicitly introduced in [Parnas,R,Rubinfeld]. Note: Already discussed alg for distance to connectivity in sparse graphs (estimate num of cc s) For dense graphs where m= (n2) and perform vertex- pair queries, some known testing results directly give dist. approx. results: e.g., -cut (having a cut of size at least n2): (1, )-approx using poly(1/ ) queries (exp(poly(1/ )) time) equiv to approx Max-Cut. [Fischer,Newman]: all testable properties (comp. independent of n) have dist. approx. algs. Direct Analysis for monotone properties [Alon,Shapira,Sudakov]

  34. Part IV: Approximating Distance to P Dist. app. for sparse graphs studied in [Marko,R] distance w.r.t, d n Property Model Complexity k-Edge- Connectivity Triangle- Freeness Eulerian Cycle- Freeness sparse 1 poly(k/( davg)) bounded- degree sparse bounded- degree 3 dO(log(d/ )) (n1/2) for sparse model 1 1 O(1/( davg)4) O(1/ 3) Extends to subgraph-free [Hassidim,Kelner,Nguyen,Onak] give (1, )-approx for restricted graphs: e.g. dist. to 3-col in planar graphs. cannot get sublin with =1

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#