Sublinear Algorithms and Graph Parameters in Centralized and Distributed Computing

On Centralized Sublinear

Algorithms and Some Relations to

Distributed Computing

Efficient (Centralized) Algorithms

Usually, when we say that an algorithm is

efficient

we mean that it runs in time

polynomial

 in the input

size

(e.g., size of an input string

…s

or number of

vertices

 in a graph ).

Naturally, we seek as

small an exponent

as

possible, so that

O(n

is

good

O(n

3/2

log

(n))

is

better

, and linear time

O(n)

is

really great!

But what if

is

HUGE

 so that even linear time is

prohibitive? Are there tasks we can perform

“super-

efficiently”

in

sub-linear

 time?

Supposedly, we need linear time just to

read

 the input without processing it at all.

But what if we don’t read the whole input but rather

sample

from it?

…s

…s

…s

Sublinear Algorithms

Given

query access

to an object

, perform

computation that is (

approximately

correct

 with

high (constant) probability,

after performing as

few queries

as possible.

To define task

precisely

, must specify:

object

query

access, desired

computation

 and notion

of

approximation

Sublinearity

is in a sense

inherent

in

distributed

computing

Examples

•

The object can be an

array

 of numbers and would

like to decide whether it is

sorted

•

The object can be a

function

 and would like to

decide whether it is

linear

corresponds to the

Hadamard Code

•

The object can be

an image

and would like to

decide whether

it is a cat

is convex.

•

The object can be

a set of points

and would like

to approximate the cost of

clustering into k

clusters

(according to some objective function)

•

The object can be a

 graph

and would like to

approximate some

graph parameter

Graph Parameters

Graph Parameter:

a function



that is defined

on a graph

 (undirected / directed, unweighted /

weighted).

For example:

•

 Average

degree

•

 Number of

subgraphs

in

•

 Number of

connected components

•

 Minimum size of a

vertex cover

•

 Maximum size of a

matching

•

Number of edges that should be added to make

graph

-connected

distance

to

 k-connectivity

•

 Minimum weight of a

spanning tree

Computing/Approximating

Graph Parameters Efficiently

For

all

parameters

described in the previous slide,

have

efficient

, i.e.,

polynomial-time

algorithms for

computing

 the parameter

(possibly

 approximately

). For some

even

linear-time

However, in some cases, when inputs are

very

large

, we might want

even more efficient

algorithms:

sublinear-time

 algorithms.

Such algorithms do

not even read the entire input

are

randomized

, and provide an

approximate

answer (with high success probability).

Sublinear Approximation on Graphs

Algorithm is given

query access

to

G.

Types of

 queries

 that consider:

•

Neighbor

 queries – “who is

th

neighbor of

?”

•

Degree

queries – “what is

deg(v

)?”

•

Vertex-pair

 queries – “is there an edge

btwn

and

?”

After performing number of queries that is

sublinear

in size of

, should output

good approximation



’

of



(G),

with

high constant probability

w.h.c.p.

).

Types of

 approximation

that consider:

•



(G)

≤



’

≤ (1+







(G)

(for given



(1+



)-

approx

•



(G)

≤



’

≤





(G)

(for fixed



an



approx

•



(G)

≤



’

≤





(G) +



(for fixed



and

given



where

is size of range of



(G)

: an





)-

approx

weight

 of edge

Survey Results in 3 Parts

I.

Average

degree

and number of

 subgraphs

II.

 Size of minimum

vertex cover

and maximum

matching

III.

 Minimum weight of

spanning tree

Part I: Average Degree

Let

avg

avg

(G)

 denote average degree in

G, d

avg



(can compute exactly in linear time)

Observe:

approximating average of

general function

with

range

 {0,..,n-1}

(degrees range) requires



(n)

 queries, so

must exploit

non-generality

 of degrees

Can obtain

(2+



)-

approximation of

avg

by performing

O(n

1/2



degree

 queries

[Feige]

Going below



(n)

 queries

[Feige]

With

degree

and

neighbor

 queries, can obtain

(1+



)-

approximation by performing

Õ(n

1/2

 poly(1/



))

 queries

[Goldreich,R]

Comment1:

 In both cases, can replace

1/2

 with

(n/d

avg

1/2

Comment2:

 In both cases, results are

tight

 (in terms of

dependence on

n/d

avg

Average Degree (cont)

Ingredient 1:

Consider partition of all graph vertices

into

r=

O((log n)/



buckets:

 In bucket

vertices

s.t

(1+



i-1

< deg(v) ≤

(1+







Claim:

(1/n)





large



(1+





avg

/(2+



(**)

Ingredient 2:

Ignore

small

’s. Take sum in

(*)

only

over

large

 buckets (

|B

| > (



n)

1/2

/2r

).

Suppose can obtain for each

 estimate

=|B



(1



          (1/n)







(1+



(1





avg

(*)

How to obtain

? By

sampling

(and applying

[Chernoff]

Difficulty:

if

 is small

(<< n

1/2

) then necessary sample

is too large

((|B

|/n)

-1

 >> n

1/2

).

Average Degree (cont)

Claim:

(1/n)





large



(1+





avg

/(2+



)       (**)

Sum of degrees =



 num of edges

small

buckets

large

buckets

Using

(**)

get

(2+



-approximation with

Õ(n

1/2



degree

 queries

small:

|B

≤



n)

1/2

/2r

: num of buckets)

Ingredient 3:

Estimate num of edges counted

once

and

compensate

 for them.

Average Degree (cont)

small

buckets

large

buckets

Ingredient 3:

Estimate num of edges counted

once

and

compensate

 for them.

For each

large

 estimate num of edges between

and

small

buckets by

sampling neighbors

 of (random)

vertices in

By adding this estimate

to

(**)

get

(1+



)-

approx.

(1/n)





large



(1+



(**)









Part I(b): Number of subgraphs - Stars

Approximating avg. degree same as approximating num

of

edges

. What about other

subgraphs?

(Also known as

counting

network motifs.

[Gonen,R,Shavitt]

 considered

length-

 paths

, and more

generally,

-stars

(avg deg +

-stars gives

variance

, larger

– higher

moments

Let

= s

(G)

 denote num of

-stars. Give

(1+



)-

approx

algorithm with

query

complexity (degree+neighbors):

Show that this upper bound is

tight

Part I(c): Number of subgraphs - Triangles

Counting number of

triangles

t=t(G)

exactly/approximately studied quite extensively in the

past. All previous algorithms

read entire graph

[Gonen,R,Shavitt]

 showed that using only

degree and

neighbor queries

 there is

no sublinear algorithm

for

approximately counting num of triangles.

Natural question: What if also allow

vertex-pair queries

[Eden,Levi,R,Seshadhri]

Give

(1+



)-

approx algorithm with

query

complexity (

degree

neighbors

vertex-pairs

):

O(n/t

1/3

+ m

3/2

/t) poly(log n,1/



and give

matching lower bound

Part II:

The Minimum Vertex Cover (VC)

Recall:

 For a graph

G = (V,E)

, a

vertex cover

of

is a

subset



 s.t. for every

{u,v}



{u,v}







Computing the size of a minimum vertex cover

is

NP-hard

, but a

factor-2 approx

 can be

found in

linear time

[Gavril], [Yanakakis]

Can we get approx even

more

 efficiently, i.e., in

sublinear-time?

Min VC – (Imaginary) Oracle

Initially considered in

[Parnas,R].

First basic idea

: Suppose had

oracle

that for

given vertex

answers if



 for some

fixed

vertex cover

 that is at most factor



larger than

min size

of vertex cover,

vc(G)

Consider uniformly and independently

sampling

s=



(1/



vertices, and

querying

oracle

on each sampled vertex.

Let

vc

 be number of sampled vertices that answers:

in

By

Chernoff

, w.h.c.p

|C|/n-



/2



vc

/s



|C|/n+



/2

Since

vc(G)



|C|





 vc(G),

if define

vc’=

(s

vc

/s +



/2)n

then (w.h.c.p)

vc(G)



 vc’





 vc(G) +



That is, get





)-

approximation



C !



C !

Min VC: Distributed connection

Second idea:

 Can use

distributed algorithm

 to implement

oracle.

Suppose have dist. alg. (in message passing model) that

works in

rounds of communication

Then oracle, when called on vertex

, will

emulate

dist.

alg. on

-distance-neighborhood

of

Query complexity

 of each oracle call:

O(d

where

is

max degree in the graph.



C !

Min VC - Results

By applying dist. alg. of

[Kuhn,Moscibroda,Wattenhofer]

get

(c,



)-

approx. (

c>2

) with complexity

O(log d)



, and

(2,



-approx. with complexity

O(d)poly(1/



Comment 1:

Can replace max deg

with

avg



[PR]

Comment 2:

 Going below



(n

1/2

 queries

(Trevisan)

7/6



(n)

[Bogdanov,Obata,Trevisan]

Comment 3:

Any

(c,



-approximation:



(d

avg

 queries

[PR]

Sequence of improvements for

(2,



-approx

[Marko,R]

O(log(d/



))

- using dist. alg. similar to

max ind. set

 alg of

[Luby]

[Nguyen,Onak]

O(d)



 – emulate classic

greedy algorithm

(maximal matching)

[Gavril],[Yanakakis]

[Yoshida,Yamamoto,Ito]

O(d



– sophisticated analysis

of better emulation

[Onak,R,Rosen,Rubinfeld]

Õ(d

avg

poly(1/



))

Min VC:

[NO]

 Algorithm

Factor-2

 approx

alg

of

[Gavril],[Yanakakis]

not

 sublin)

•







: vertex cover,

: uncovered edges,

•

while





select (arbitrarily)

 e = {u,v}

in





 {u,v}

add both endpoints of

to cover



U \ ({ {u,w}



E }



 { {v,w}



 E } )

remove from

all edges covered by

or

Slightly different description of alg:

•











arbitrary

 permutation

of

•

For

j = 1 to |E|

If



 = {u,v}

in

U:





 {u,v}



U \ ({ {u,w}



E }



 { {v,w}



 E } )







 { {u,v} }

Min VC:

[NO]

 Algorithm

(cont)

Let



(G)

be

maximal matching

 resulting from alg when

using permutation



 (so that

vc(G)/2



|M



(G)|



 vc(G)

Can estimate

vc(G)

 by estimating

|M



(G)|:

sample



(1/



edges

 uniformly; for each check if in



(G)

 by calling

maximal matching

 oracle.

Basic observation:

 For an edge

e = {u,v}

, if for some

e’ = {u,w}

(or

{v,w}

),



(e’) <



(e)

and

e’

in



(G)

then

not in



(G).

Otherwise,

in



(G) .

MO



(input: edge

 e = {u,v}

, output:

is

in



(G)

•

 For each edge

e’



 incident to

or

  - if



(e’) <



(e)

if

MO



(e’) =

TRUE

then return

FALSE

•

 return

TRUE

e’

Min VC:

[NO]

 Algorithm

(cont)

Main claim

of

[NO]:

 For a

randomly

selected



, the

expected

query complexity

 of oracle is

O(d)

Analysis

based on bounding (in expectation) the total

number of edges

reached in recursive calls

 (sequence

of recursive calls corresponds to sequence of

decreasing



values).

Note:

 can select



“on the fly”,

by assigning each new

edge considered a

random

(discretizied)

value in

[0,1].

MO



(input: edge

 e = {u,v}

, output:

is

in



(G)

•

 For each edge

e’



 incident to

or

  - if



(e’) <



(e)

if

MO



(e’) =

TRUE

then return

FALSE

•

 return

TRUE

From exp(d) to poly(d)

[YYI]

 analyzed a

variant

 suggested by

[NO]

and proved

that expected number of oracle calls for variant is

O(d

),

resulting in

poly(d,1/



complexity

(and

[ORRR]

further modified alg to get almost optimal complexity).

Same algs give (roughly)

½

approx for

maximum

matching.

[NO]

and

[YYI]

 build on alg and

[Hopcroft&Karp]

to get

(1-



approx

using exp(d

O(1/



O(1/



, resp.) queries

and approximate Maximum Matching

Distributed Connection Revisited

Previously observed that can implement

VC-oracle

based on known

distributed algorithm(s)

Now observe that oracle implementation of

[NO]

for

approx-max-match

gives distributed algorithm

that

performs

O(1/



rounds and has high

constant

 success

probability.

Compare to

O(log(n)/



round

 alg of

[Lotker, Patt-

Shamir,Pettie]

that has success prob

1-1/poly(n)

A different distributed algorithm, using ideas from

[NO]

as well as distributed coloring alg

[Linial],

performs

O(1/



 + log*(n)/



rounds, and succeeds

with prob

[Even,Medina,R]

(Studied in context of

Centralized Local Algorithms

Part III: Min Weight Spanning Tree

Consider graphs with degree bound

and weights in

{1,..,W}.

Goal is to approx weight of

MST

[Chazelle,Rubinfeld,Trevisan]

 give

(1+



)-

approximation

alg using

Õ(d



W/



neighbor queries

Result is

tight

and extends to

d=d

avg

and weights in

[1,W].

Suppose first:

W=2

(i.e., weights either

or

edges with weight

=(V,E

), c

= num of

connected components

in

Weight of

MST



(c

-1) + 1



(n-1-(c

-1)) = n-2+c

Estimate

MST

weight by estimating

MST (cont)

More generally (weights in

{1,..,W}

edges with weight

≤ i

=(V,E

), c

= num of

connected components

cc’s

) in

Weight of

MST

n - W +



i=1..W-1

Estimate

MST

weight by estimating

,…,c

W-1

Idea

for estimating num of

cc’s

 in graph

c(H)

):

For vertex

 = num of vertices in

cc

of

Then:

     c(H) =



(1/n



(1/2)



(1/3)



(1/4)

MST (cont)

c(H) =



(1/n

)   (n

 = num of vertices in

cc

of

Can estimate

 c(H)

by: selecting sample

 of vertices,

for each

in

R,

finding

(using

BFS

and taking

(normalized) sum over sample:

(n/|R|)





(1/n

Difficulty:

if

is

large

, then

“expensive”

Let

S = {v : n

≤

B}.

for

“small”





(1/n

c(H) – n/B

Alg for estimating

 c(H)

selects

sample

of

vertices,

runs

BFS

on each selected

in

until finds

or

determines that

> B

(i.e.



).

Estimate

c(H)

by

(n/|R|)







(1/n

Complexity:

 O(|R|





d)

MST (cont)

For any



 < 1, if set B=2/



 and |R| =



(1/



get

estimate of

C(H)

to within



with complexity

O(d/



Alg for estimating

MST

weight

can run above alg on

each

with





/W,

so that when sum estimates of

for

i=1,…,W

get desired

(1





 approximation.

=(V,E

), E

edges with weight

≤ i)

 = num of connected components

in

(Weight of

MST

n - W +



i=1..W-1

Comment:

[Chazelle,Rubinfeld,Trevisan]

 get better

complexity (total of

Õ(d



W/



 ) by more refined alg

Summary

Talked about

sublinear

approximation algorithms for

various graph parameters:

I.

Average

degree

and number of

 stars

and

 triangles

II.

Size of minimum

vertex cover

maximum matching

III.

 Weight of minimum weight

spanning tree

There is a

high-level

connection to

distributed

computing

 through

sublinearity

, and there some are

concrete

algorithmic connections

Thanks

From

[NO]

to

[YYI]

[NO]

 suggested following

“greedy”

variant

MO



(input: edge

 e = {u,v}

, output:

is

in



(G)

•

 Query

on all edges incident to

and

. Let

,…,e

be ordering of edges according to



(i.e.,



(e

)<



(e

j+1

•



•

while



(e

)<



(e)

if

MO



(e

) =

TRUE

then return

FALSE

else



j+1

•

 return

TRUE

[YYI]

 analyzed variant and proved that expected number

of queries for variant is

O(d

),

resulting in algorithm

with

query complexity

O(d



(extra factors of

 due to

prob of selecting

in



(G)

 being

|M



(G)| / (d n)

Part III(b): Maximum Matching and more

[Nguyen,Onak]

 give

(1,



)-

approx for

max match

 with

complexity

d*O(1/



 improved

[Yoshida,Yamamoto,Ito]

to

6/



*2

(1/



O(1/



Recursive

application of oracles using

augmenting paths

[Hassidim,Kelner,Nguyen,Onak],[Elek]

give

(1,



)-

approx

algs on

restricted

graphs (e.g.,

planar

Can get

(O(log d),



)-

approx for

min dominating set

 with

complexity

O(log d)



 using

[Kuhn,Moscibroda,Wattenhofer]

Part I(b): Number of stars subgraphs

≤

1+1/s

O(n/(N

1/(1+s)

1+1/s

≤

≤

O(n

1-1/s

> n

O(n

s-1/s

/(N

1-1/s

) = O((n

s+1

/N

1-1/s

Example:

s=3

(a)

= n

O(n

3/4

(b)

= n

O(n

2/3

);

(c)

= n

O(1)

Idea

of algorithm for

s=2

: Also partition into

 buckets

Can estimate num of

-stars with

centers

in

large

buckets.

Also estimate num of

-stars with

centers

in

(“significant”)

small

 buckets and at least one

endpoint

in

large

 bucket by estimating num of

edges between pairs of buckets.

Part IV: Approximating Distance to P

For graph property

, estimate

fraction of edges

 that

should be

added/removed

 to obtain

 (fraction with

respect to (ub on) num of edges

).  Assume

m=



(n).

Study of

distance approximation

 first explicitly

introduced in

[Parnas,R,Rubinfeld]

Note:

 Already discussed alg for distance to

connectivity

in sparse graphs

(estimate num of

cc

’s)

For

dense

graphs where

m=



(n

 and perform

vertex-

pair

queries, some known

testing

 results directly give

dist. approx.

 results: e.g.,



cut

 (having a cut of size

at least



): (1,



approx using

 poly(1/



queries

exp(poly(1/



))

time) – equiv to approx

Max-Cut

[Fischer,Newman]:

all

 testable properties

(comp.

independent of

) have dist. approx. algs. Direct

Analysis for

monotone

properties

[Alon,Shapira,Sudakov]

Part IV: Approximating Distance to P

Dist. app. for

sparse

 graphs studied in

[Marko,R]

Property

Model



Complexity

distance w.r.t,



cannot get sublin with



=1



(n

1/2

for

sparse

model

Extends to

subgraph-free

[Hassidim,Kelner,Nguyen,Onak]

give

(1,



)-

approx for

restricted graphs: e.g. dist. to

-col in

 planar

 graphs

Slide Note

Embed Share

Download

Centralized sublinear algorithms and their relation to distributed computing are explored, emphasizing the efficiency of algorithms in processing large inputs in sublinear time. Examples of sublinear algorithms for various objects are provided, along with the computation and approximation of graph parameters efficiently in centralized settings. The need for sublinear-time algorithms in handling very large inputs is discussed in the context of achieving approximate solutions with high success probability.

rone376 Follow

Uploaded on Sep 26, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

On Centralized Sublinear Algorithms and Some Relations to Distributed Computing Dana Ron Tel-Aviv University ADGA, October 2015

Efficient (Centralized) Algorithms Usually, when we say that an algorithm is efficient we mean that it runs in time polynomial in the input size n (e.g., size of an input string s1s2 sn, or number of vertices in a graph ). Naturally, we seek as small an exponent as possible, so that O(n2) is good, O(n3/2log3(n)) is better, and linear time O(n) is really great! But what if n is HUGE so that even linear time is prohibitive? Are there tasks we can perform super- efficiently in sub-linear time? Supposedly, we need linear time just to read the input without processing it at all. But what if we don t read the whole input but rather sample from it? s1s2 si sj sn

Sublinear Algorithms Given query access to an object O, perform computation that is (approximately) correct with high (constant) probability, after performing as few queries as possible. Sublinearity is in a sense inherent in distributed computing ? ? ? ? ? To define task precisely, must specify: object, query access, desired computation and notion of approximation

Examples The object can be an array of numbers and would like to decide whether it is sorted The object can be a function and would like to decide whether it is linear (corresponds to the Hadamard Code) The object can be an image and would like to decide whether it is a cat/is convex. The object can be a set of points and would like to approximate the cost of clustering into k clusters (according to some objective function) The object can be a graph and would like to approximate some graph parameter.

Graph Parameters A Graph Parameter: a function that is defined on a graph G (undirected / directed, unweighted / weighted). For example: Average degree Number of subgraphs H in G Number of connected components Minimum size of a vertex cover Maximum size of a matching Number of edges that should be added to make graph k-connected (distance to k-connectivity) Minimum weight of a spanning tree

Computing/Approximating Graph Parameters Efficiently For all parameters described in the previous slide, have efficient, i.e., polynomial-time algorithms for computing the parameter (possibly approximately). For some even linear-time. However, in some cases, when inputs are very large, we might want even more efficient algorithms: sublinear-time algorithms. Such algorithms do not even read the entire input, are randomized, and provide an approximate answer (with high success probability).

Sublinear Approximation on Graphs Algorithm is given query access to G. Types of queries that consider: Neighbor queries who is ithneighbor of v? Degree queries what is deg(v)? Vertex-pair queries is there an edge btwn u and v? After performing number of queries that is sublinear in size of G, should output good approximation of (G), with high constant probability (w.h.c.p.). Types of approximation that consider: (G) (1+ ) (G) (for given : a (1+ )-approx.) (G) (G) (for fixed : an -approx.) (G) (G) + n (for fixed and given where n is size of range of (G) : an ( , )-approx.) ? + weight of edge

Survey Results in 3 Parts I. Average degree and number of subgraphs II. Size of minimum vertex cover and maximum matching III. Minimum weight of spanning tree

Part I: Average Degree Let davg= davg(G) denote average degree in G, davg 1 (can compute exactly in linear time) Observe: approximating average of general function with range {0,..,n-1} (degrees range) requires (n) queries, so must exploit non-generality of degrees Can obtain (2+ )-approximation of davgby performing O(n1/2/ ) degree queries [Feige]. Going below 2: (n) queries [Feige]. With degree and neighbor queries, can obtain (1+ )- approximation by performing (n1/2poly(1/ )) queries [Goldreich,R]. Comment1: In both cases, can replace n1/2with (n/davg)1/2 Comment2: In both cases, results are tight (in terms of dependence on n/davg).

Average Degree (cont) Ingredient 1: Consider partition of all graph vertices into r=O((log n)/ ) buckets: In bucket Bivertices v s.t. (1+ )i-1 < deg(v) (1+ )i Suppose can obtain for each i estimate bi=|Bi| (1 (1/n) i bi (1+ )i= (1 How to obtain bi? By sampling (and applying [Chernoff]). Difficulty: if Biis small (<< n1/2) then necessary sample is too large ((|Bi|/n)-1>> n1/2). ( = /8 ) ) ) ) davg (*) Ingredient 2: Ignore small Bi s. Take sum in (*) only over large buckets (|Bi| > ( n)1/2/2r). (1/n) large i bi (1+ )i davg/(2+ ) Claim: (**)

Average Degree (cont) Claim: (1/n) large i bi (1+ )i davg/(2+ ) (**) Sum of degrees = 2 num of edges (small: |Bi| ( n)1/2/2r, r : num of buckets) small buckets large buckets counted twice not counted counted once Using (**) get (2+ )-approximation with (n1/2/ 2) degree queries Ingredient 3: Estimate num of edges counted once and compensate for them.

Average Degree (cont) Ingredient 3: Estimate num of edges counted once and compensate for them. large buckets small buckets Bi For each large Biestimate num of edges between Biand small buckets by sampling neighbors of (random) vertices in Bi. By adding this estimate eito (**) get (1+ )-approx. (1/n) large i bi (1+ )i (1/n) large i (bi (1+ )i + ei) (**)

Part I(b): Number of subgraphs - Stars Approximating avg. degree same as approximating num of edges. What about other subgraphs? (Also known as counting network motifs.) [Gonen,R,Shavitt] considered length-2 paths, and more generally, k-stars. (avg deg + 2-stars gives variance, larger k higher moments) Let sk= sk(G) denote num of k-stars. Give (1+ )-approx algorithm with query complexity (degree+neighbors): / 1 k k n k n / 1 + 1 k min , (log / 1 , n ) O n poly ) 1 + / 1 1 k /( 1 k k s s Show that this upper bound is tight.

Part I(c): Number of subgraphs - Triangles Counting number of triangles t=t(G) exactly/approximately studied quite extensively in the past. All previous algorithms read entire graph. [Gonen,R,Shavitt] showed that using only degree and neighbor queries there is no sublinear algorithm for approximately counting num of triangles. Natural question: What if also allow vertex-pair queries? [Eden,Levi,R,Seshadhri] Give (1+ )-approx algorithm with query complexity (degree+neighbors+vertex-pairs): O(n/t1/3+ m3/2/t) poly(log n,1/ ) and give matching lower bound.

Part II: The Minimum Vertex Cover (VC) Recall: For a graph G = (V,E), a vertex cover of G is a subset C V s.t. for every {u,v} E, {u,v} C Computing the size of a minimum vertex cover is NP-hard, but a factor-2 approx can be found in linear time [Gavril], [Yanakakis] Can we get approx even more efficiently, i.e., in sublinear-time?

Min VC (Imaginary) Oracle Initially considered in [Parnas,R]. First basic idea: Suppose had oracle that for given vertex v answers if v C for some fixed vertex cover C that is at most factor larger than min size of vertex cover, vc(G). v? https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRg3OlsBXkJV7KxcM7lkx_KA6s98HW2aRfDi7bC2f4iQHQIjwNt Consider uniformly and independently sampling s= (1/ 2) vertices, and querying oracle on each sampled vertex. v C ! v C ! Let svcbe number of sampled vertices that answers: in C. By Chernoff, w.h.c.p |C|/n- /2 svc/s |C|/n+ /2 Since vc(G) |C| vc(G), if define vc = (svc/s + /2)n then (w.h.c.p) vc(G) vc vc(G) + n That is, get ( , )-approximation

Min VC: Distributed connection Second idea: Can use distributed algorithm to implement oracle. Suppose have dist. alg. (in message passing model) that works in k rounds of communication. Then oracle, when called on vertex v, will emulate dist. alg. on k-distance-neighborhood of v https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRg3OlsBXkJV7KxcM7lkx_KA6s98HW2aRfDi7bC2f4iQHQIjwNt v v C ! v? Query complexity of each oracle call: O(dk) where d is max degree in the graph.

Min VC - Results By applying dist. alg. of [Kuhn,Moscibroda,Wattenhofer] get (c, )-approx. (c>2) with complexity dO(log d)/ 2, and (2, )-approx. with complexity dO(d)poly(1/ ). Comment 1: Can replace max deg d with davg/ [PR] Comment 2: Going below 2 : (n1/2) queries (Trevisan) 7/6: (n) [Bogdanov,Obata,Trevisan] Comment 3: Any (c, )-approximation: (davg) queries [PR] Sequence of improvements for (2, )-approx [Marko,R]: dO(log(d/ )) - using dist. alg. similar to max ind. set alg of [Luby] [Nguyen,Onak]: 2O(d)/ 2 emulate classic greedy algorithm (maximal matching) [Gavril],[Yanakakis] [Yoshida,Yamamoto,Ito]: O(d4/ 2) sophisticated analysis of better emulation [Onak,R,Rosen,Rubinfeld]: (davgpoly(1/ ))

Min VC: [NO] Algorithm Factor-2 approx alg of [Gavril],[Yanakakis] (not sublin) C , U E (C: vertex cover, U: uncovered edges, ) while U - select (arbitrarily) e = {u,v} in U - C C {u,v} (add both endpoints of e to cover) - U U \ ({ {u,w} E } { {v,w} E } ) (remove from U all edges covered by u or v) Slightly different description of alg: C , U E, For j = 1 to |E| - If j= {u,v} in U: C C {u,v} U U \ ({ {u,w} E } { {v,w} E } ) arbitrary permutation of E ,M 5 2 7 3 8 4 1 6 , M M { {u,v} }

Min VC: [NO] Algorithm (cont) Let M (G) be maximal matching resulting from alg when using permutation (so that vc(G)/2 |M (G)| vc(G) ) Can estimate vc(G) by estimating |M (G)|: sample (1/ 2) edges uniformly; for each check if in M (G) by calling maximal matching oracle. Basic observation: For an edge e = {u,v}, if for some e = {u,w} (or {v,w}), (e ) < (e) and e in M (G) then e not in M (G). Otherwise, e in M (G) . MO (input: edge e = {u,v}, output: is e in M (G)) For each edge e e incident to u or v - if (e ) < (e) if MO (e ) = TRUE then return FALSE return TRUE 5 8 e e

Min VC: [NO] Algorithm (cont) MO (input: edge e = {u,v}, output: is e in M (G)) For each edge e e incident to u or v - if (e ) < (e) if MO (e ) = TRUE then return FALSE return TRUE Main claim of [NO]: For a randomly selected , the expected query complexity of oracle is 2O(d). Analysis based on bounding (in expectation) the total number of edges reached in recursive calls (sequence of recursive calls corresponds to sequence of decreasing values). Note: can select on the fly , by assigning each new edge considered a random (discretizied) value in [0,1].

From exp(d) to poly(d) and approximate Maximum Matching [YYI] analyzed a variant suggested by [NO] and proved that expected number of oracle calls for variant is O(d2), resulting in poly(d,1/ ) complexity (and [ORRR] further modified alg to get almost optimal complexity). Same algs give (roughly) -approx for maximum matching. [NO] and [YYI] build on alg and [Hopcroft&Karp] to get (1- )-approx using exp(dO(1/ )) (dO(1/ ), resp.) queries

Distributed Connection Revisited Previously observed that can implement VC-oracle based on known distributed algorithm(s) Now observe that oracle implementation of [NO] for approx-max-match gives distributed algorithm that performs dO(1/ )rounds and has high constant success probability. Compare to O(log(n)/ 3)-round alg of [Lotker, Patt- Shamir,Pettie] that has success prob 1-1/poly(n) A different distributed algorithm, using ideas from [NO] as well as distributed coloring alg [Linial], performs dO(1/ )+ log*(n)/ 2 rounds, and succeeds with prob 1 [Even,Medina,R] (Studied in context of Centralized Local Algorithms)

Part III: Min Weight Spanning Tree Consider graphs with degree bound d and weights in {1,..,W}. Goal is to approx weight of MST [Chazelle,Rubinfeld,Trevisan] give (1+ )-approximation alg using (d W/ 2) neighbor queries. Result is tight and extends to d=davgand weights in [1,W]. Suppose first: W=2 (i.e., weights either 1 or 2) E1= edges with weight 1, G1=(V,E1), c1= num of connected components in G1. Weight of MST: 2 (c1-1) + 1 (n-1-(c1-1)) = n-2+c1 Estimate MST weight by estimating c1

MST (cont) More generally (weights in {1,..,W}) Ei= edges with weight i, Gi=(V,Ei), ci= num of connected components (cc s) in Gi. Weight of MST: n - W + i=1..W-1ci Estimate MST weight by estimating c1, ,cW-1. Idea for estimating num of cc s in graph H (c(H)): For vertex v, nv= num of vertices in cc of v. Then: c(H) = v(1/nv) 3 (1/3) 4 (1/4) 2 (1/2)

MST (cont) c(H) = v(1/nv) (nv= num of vertices in cc of v) Can estimate c(H) by: selecting sample R of vertices, for each v in R, finding nv(using BFS) and taking (normalized) sum over sample: (n/|R|) v R(1/nv) Difficulty: if nvis large, then expensive Let S = {v : nv B}. (S for small ) v S(1/nv) > c(H) n/B Alg for estimating c(H) selects sample R of vertices, runs BFS on each selected v in R until finds nvor determines that nv> B (i.e. v S). Estimate c(H) by (n/|R|) v R S(1/nv) Complexity: O(|R| B d) v

MST (cont) For any < 1, if set B=2/ and |R| = (1/ 2) get estimate of C(H) to within n with complexity O(d/ 3) Alg for estimating MST weight can run above alg on each Gi with = /W, so that when sum estimates of ci for i=1, ,W get desired (1 ) approximation. (Gi=(V,Ei), Ei= edges with weight i) (ci= num of connected components in Gi) (Weight of MST: n - W + i=1..W-1ci) Comment: [Chazelle,Rubinfeld,Trevisan] get better complexity (total of (d W/ 2) ) by more refined alg

Summary Talked about sublinear approximation algorithms for various graph parameters: I. Average degree and number of stars and triangles II. Size of minimum vertex cover (maximum matching) III. Weight of minimum weight spanning tree There is a high-level connection to distributed computing through sublinearity, and there some are concrete algorithmic connections

Thanks

From [NO] to [YYI] [NO] suggested following greedy variant: MO (input: edge e = {u,v}, output: is e in M (G)) Query G on all edges incident to u and v. Let e1, ,et be ordering of edges according to (i.e., (ej)< (ej+1) ) j 1 while (ej)< (e) - if MO (ej) = TRUE then return FALSE - else j j+1 return TRUE 3 4 10 7 e 2 1 6 [YYI] analyzed variant and proved that expected number of queries for variant is O(d2), resulting in algorithm with query complexity O(d4/ 2) (extra factors of d due to prob of selecting e in M (G) being |M (G)| / (d n) )

Part III(b): Maximum Matching and more [Nguyen,Onak] give (1, )-approx for max match with complexity 2d*O(1/ ), improved [Yoshida,Yamamoto,Ito] to d6/ *2(1/ )O(1/ ) Recursive application of oracles using augmenting paths. [Hassidim,Kelner,Nguyen,Onak],[Elek] give (1, )-approx algs on restricted graphs (e.g., planar) Can get (O(log d), )-approx for min dominating set with complexity dO(log d)/ 2using [Kuhn,Moscibroda,Wattenhofer]

Part I(b): Number of stars subgraphs n n N s Ns n1+1/s: O(n/(Ns)1/(1+s)) n1+1/s Ns ns: O(n1-1/s) Ns> ns: O(ns-1/s/(Ns)1-1/s) = O((ns+1/Ns)1-1/s) Example: s=3. (a) Ns= n : O(n3/4); (b) Ns= n2: O(n2/3); (c) Ns= n4: O(1) Idea of algorithm for s=2: Also partition into buckets. Can estimate num of 2-stars with centers in large buckets. Also estimate num of 2-stars with centers in ( significant ) small buckets and at least one endpoint in large bucket by estimating num of edges between pairs of buckets. 1 / s s n + 1 1 / s min , (log 1 , n / ) O poly + 1 /( 1 ) 1 s 1 / s s N + i 1 ( ) ib 2

Part IV: Approximating Distance to P For graph property P, estimate fraction of edges that should be added/removed to obtain P (fraction with respect to (ub on) num of edges m). Assume m= (n). Study of distance approximation first explicitly introduced in [Parnas,R,Rubinfeld]. Note: Already discussed alg for distance to connectivity in sparse graphs (estimate num of cc s) For dense graphs where m= (n2) and perform vertex- pair queries, some known testing results directly give dist. approx. results: e.g., -cut (having a cut of size at least n2): (1, )-approx using poly(1/ ) queries (exp(poly(1/ )) time) equiv to approx Max-Cut. [Fischer,Newman]: all testable properties (comp. independent of n) have dist. approx. algs. Direct Analysis for monotone properties [Alon,Shapira,Sudakov]

Part IV: Approximating Distance to P Dist. app. for sparse graphs studied in [Marko,R] distance w.r.t, d n Property Model Complexity k-Edge- Connectivity Triangle- Freeness Eulerian Cycle- Freeness sparse 1 poly(k/( davg)) bounded- degree sparse bounded- degree 3 dO(log(d/ )) (n1/2) for sparse model 1 1 O(1/( davg)4) O(1/ 3) Extends to subgraph-free [Hassidim,Kelner,Nguyen,Onak] give (1, )-approx for restricted graphs: e.g. dist. to 3-col in planar graphs. cannot get sublin with =1

Sublinear Algorithms and Graph Parameters in Centralized and Distributed Computing

Download Presentation

Presentation Transcript

Related

More Related Content