Dichotomy on Complexity of Consistent Query Answering

A D
ICHOTOMY
 
ON
 T
HE
 C
OMPLEXITY
 
OF
C
ONSISTENT
 Q
UERY
 A
NSWERING
 
FOR
A
TOMS
 W
ITH
 S
IMPLE
 K
EYS
Paris Koutris
Dan Suciu
University of Washington
R
EPAIRS
 
An 
uncertain
 instance 
I
 for a schema with 
key constraints
A 
repair
 r of 
I
 is a subinstance of 
I
 that satisfies the key
constraints and is 
maximal
2
 
 
The 4 possible repairs
C
ONSISTENT
 Q
UERY
 A
NSWERING
If 
Q
 is 
boolean
, we say that 
I
 is 
certain
 for 
Q
, 
I
 |= 
Q
,
 
if for
every repair r of 
I
, 
Q
(r) is true
3
 
 
 
Q
() = 
R
(
x
, y), 
S
(
y
, z)
I
 |= 
Q
 
P
ROBLEM
 S
TATEMENT
 
CERTAINTY(
Q
): 
Given as input an instance I, does I |= 
Q
 when
Q
 is a boolean CQ?
 
In general, CERTAINTY(
Q
) is in coNP
Q
1
 = R(
x
, y), S(
y
, z) : expressible as a 
first-order query
Q
2
 = R(
x
, y), S(
z
, y) : 
coNP-complete
Q
3
 = R(
x
, y), S(
y
, x) : 
PTIME
 but not first-order expressible
4
Conjecture
 For every boolean conjunctive query 
Q
,
CERTAINTY(
Q
) is either in PTIME or coNP-complete
P
ROGRESS
 
SO
 F
AR
 
[Wijsen, 2010]
Syntactic characterization of FO-expressible acyclic CQs w/o self-
joins
[Kolaitis and Pema, 2012]
A trichotomy for CQs with 2 atoms and no self-joins
[Wijsen, 2010 & 2013]
PTIME algorithm for 
cyclic queries
: C
k
 = R
1
(
x
1
,x
2
), …, R
k
(
x
k
, x
1
)
Further classification of acyclic CQs w/o self-joins
5
O
UR
 C
ONTRIBUTION
A 
dichotomy
 for CQs w/o self-joins where atoms have either
Simple keys : R(
x
, y, z)
Keys that consist of all attributes: S(
x, y, z
)
6
Theorem
 
For every boolean CQ 
Q
 w/o self-joins where
for each atom the key consists of either 
one attribute
  or
all attributes
, there exists a 
dichotomy
 of CERTAINTY(
Q
)
into PTIME and coNP-complete
O
UTLINE
1.
The Dichotomy Condition
2.
Frugal Repairs & Representable Answers
3.
Strongly Connected Graphs
7
 
 
T
HE
 Q
UERY
 G
RAPH
We equivalently study boolean CQs consisting only of
binary relations 
where one attribute is the key: R(
x
, y)
Relations can be 
consistent
 (R
c
) or 
inconsistent
 (R
i
)
Query Graph
: a directed edge (u, v) for each atom R(
u
,v)
8
Q 
= R
i
(
x
, y), S
i
(
z
, w), T
c
(
y
, w)
 
y
 
w
 
x
 
S
 
T
 
R
 
z
 
G[
Q
]
 
source node u
R
 
end node v
R
D
EFINITIONS
 
x
+,R
 
: set of nodes reachable from node 
x
 once we remove
the edge R (through a 
directed
 path)
R ~ S 
[
source-equivalent
]: source nodes u
R
, u
S
 are in the
same SCC
[R]
: the equivalence class of R w.r.t ~
9
y
R
z
x
T
S
v
w
u
x
+,R
 
= {x, v, w}
R ~ T and 
[R] 
= {R, T}
V
U
C
OUPLED
 E
DGES
coupled
+
(R) 
= edges in [R] 
+ 
any inconsistent edge S s.t. the
source node 
u
S
 is connected to the end node 
v
R
 through a
(undirected) path that does not intersect with 
u
R
+,R
10
y = 
v
R
 
R
z
x = 
u
R
T
S
v
w
u = 
u
V
 
coupled
+
(R):
contains R,T: [R] = {R, T}
contains V: path from y (= v
R
 )
to u (= u
V
)
does not contain U
V
U
 
The set 
u
R
+,R
S
PLITTABLE
 G
RAPHS
 
Two inconsistent edges R, S are 
coupled 
if
S in coupled
+
(R) & R in coupled
+
(S)
A graph G[
Q
] is:
 
unsplittable 
if it contains a pair of coupled edges that are 
not
source-equivalent.
splittable 
otherwise
11
y
R
z
x
T
S
v
w
u
V
U
coupled
+
(R) = {R, T, V}
coupled
+
(T) = {R, T, V}
coupled
+
(V) = {V}
coupled
+
(U) = {U,V,R,T}
Only R,T are coupled
 
SPLITTABLE!
T
HE
 D
ICHOTOMY
 C
ONDITION
12
y
R
z
x
T
S
v
w
u
V
U
Dichotomy Theorem
If G[
Q
] is 
splittable
, CERTAINTY(
Q
) is in PTIME
If G[
Q
] is 
unsplittable
, CERTAINTY(
Q
) is coNP-
complete
Splittable, so in PTIME
E
XAMPLES
13
PTIME
R(
x
, y), S(
y
, z)
 
coNP-complete
 
R(
x
, y), S(
y
, z), T
c
(
x
, z)
 
x
 
y
 
z
x
y
z
 
PTIME
 
R(
x
, y), S(
y
, z), U
c
(
z
, y)
 
x
 
y
 
z
 
coNP-complete
 
R(
x
, y), S(
z
, y), U
c
(
y
, z)
 
x
 
y
 
z
O
UTLINE
1.
The Dichotomy Condition
2.
Frugal Repairs & Representable Answers
3.
Strongly Connected Graphs
14
 
 
F
RUGAL
 R
EPAIRS
 (1)
15
Definition
 A repair r of an instance 
I
 is 
frugal
 for a
boolean query 
Q
 if for any other repair r’ of
 I
, 
Q
f
(r’) is 
not
strictly contained 
in 
Q
f
(r)
 
 
repair 
r
1
 = { 
R(
a
1
, b
1
)
, R(
a
2
, b
3
), R(
a
3
, b
4
), R(
a
4
, b
4
)
                        S(
b
1
, a
1
), S(
b
3
, a
2
),  S(
b
4
, a
3
) }
Q
f
(
r
1
)       = {    (a
1
, b
1
),   (a
2
, b
3
),    (a
3
, b
4
) }
repair 
r
2
 = { 
R(
a
1
, b
2
)
, R(
a
2
, b
3
), R(
a
3
, b
4
), R(
a
4
, b
4
)
                        S(
b
1
, a
1
), S(
b
3
, a
2
),  S(
b
4
, a
3
) }
Q
f
(
r
2
)       = {                     (a
2
, b
3
),    (a
3
, b
4
) }
not frugal
frugal
Q
f
 = all body variables to the head (
full query
)
 
 
F
RUGAL
 R
EPAIRS
 (2)
16
I
 |= 
Q
 if and only if every 
frugal repair 
satisfies 
Q
We lose no generality if we study only frugal repairs!
Only two frugal repairs:
Q
f
(r
2
) = {(a
2
, b
3
), (a
3
, b
4
)}
Q
f
(r
3
) = {(a
2
, b
3
), (a
4
, b
4
)}
O
R
-S
ETS
17
 
Efficiently represent all answer sets of frugal repairs
We use 
or-sets
:
 
<1, 2, 3> means 1 or 2 or 3
A = < {1, 3}, {1, 4}, {2, 3}, {2, 4} >
We can “
compress
” A as B = {<1, 2>, <3, 4>}
[Libkin and Wong, ‘93] 
decompression
α
 operator: 
α
(B) = A
 
The 
or-set of answer sets 
for frugal repairs of I for Q:
M
Q
(I) 
= <
 
{(a
2
, b
3
), (a
3
, b
4
)}
,   
{(a
2
, b
3
), (a
4
, b
4
)} >
Compressed form (
set of or-sets
):
A
Q
(I) 
= { <
 
(a
2
, b
3
) >, < (a
3
, b
4
)
, 
(a
4
, b
4
) > }
R
EPRESENTABILITY
 (1)
18
 
An or-set-of-sets 
S
 is 
representable
 if there exists a set-of-
or-sets 
S
0
 (
compression
) such that:
α
(
S
0
) = 
S
For any distinct or-sets A, B in 
S
0
, the tuples in A and B use distinct
constants in all coordinates
The compression of a representable set with active domain
of size 
n
 has size 
polynomial
 in 
n
<
 
{(a
2
, b
3
), (a
3
, b
4
)}
, 
{(a
2
, b
3
), (a
4
, b
4
)} > 
{<
 
(a
2
, b
3
) >, <(a
3
, b
4
)
, 
(a
4
, b
4
) >}
<
 
{(a
2
, b
3
), (a
3
, b
4
)}
, 
{(a
2
, b
2
), (a
4
, b
4
)} > 
compression
not representable
R
EPRESENTABILITY
 (2)
19
I
 |= 
Q
 
iff
 the compression A
Q
(
I
) is not empty
If we can compute A
Q
(
I
) in polynomial time, deciding
whether 
I
 |= 
Q
 is in PTIME
Theorem
 
If G[
Q
] is a 
strongly connected 
graph, M
Q
(
I
) is
representable and its compression can be computed in
polynomial time in the size of 
I
O
UTLINE
1.
The Dichotomy Condition
2.
Frugal Repairs & Representable Answers
3.
Strongly Connected Graphs
20
 
 
C
YCLES
21
 
C
k
= R
1
(
x
1
, x
2
), R
2
(
x
2
, x
3
)…, R
k
(
x
k
, x
1
)
The 
purified
 instance contains a
collection of disjoint SCCs
ALGORITHM
 
FrugalC
Find the SCCs that contain no directed
cycle of length > k
For each such SCC i, create an or-set A
i
that contains all cycles of length k
Output A
Ck
(I) = {A
1
, A
2
, …}
 
 
 
a
1
b
1
c
1
a
2
b
2
c
2
b
3
 
A
C3
(I) 
= {<(a
1
, b
1
, c
1
)>, <(a
2
, b
2
, c
2
), (a
2
, b
3
, c
2
)>}
G
ENERAL
 C
ASE
: SCC
S
 (1)
22
Recursively split 
a SCC 
G
 into a SCC 
G’ 
and a directed
path 
P
 that intersects 
G’ 
only at its start and end node
The set A
G’
(I) can be recursively computed
x
y
R
S
T
t
U
V
Graph G’
The path P 
= y -- > t 
-- > z
A
G’
(I) 
= {<(a
1
, b
1
, c
1
)>, <(a
2
, b
2
, c
2
), (a
2
, b
3
, c
2
)>}
A
1
A
2
z
G
ENERAL
 C
ASE
: SCC
S
 (2)
23
A
G’
(I) 
= {<(a
1
, b
1
, c
1
)>, <(a
2
, b
2
, c
2
), (a
2
, b
3
, c
2
)>}
A
1
A
2
 
 
 
 
Any value belongs in 
a unique or-set
a
y
t
U
V
b
B
B
1
c
z
B
2
c
B
0
c
Replacement
of G’
A cycle 
C = a -> b -> y -> t -> z -> a 
+
 a
chord 
B
2
 that is a consistent relation
R
EST
 O
F
 
THE
 P
ROOF
24
PTIME
 algorithm for splittable graphs
Find a 
separator
 in G[
Q
] (always exists if a graph is splittable)
The separator splits G[
Q
] into cases with 
fewer 
inconsistent edges,
which are solved recursively
Base case
: all edges are consistent (check whether 
Q
(
I
) is true)
coNP-hardness
Reduction from the 
Monotone-3SAT 
problem
C
ONLUSIONS
25
Significant progress towards proving the dichotomy for the
complexity of 
Certain Query Answering for 
Conjunctive
Queries
Settle the 
dichotomy 
(or trichotomy) even for queries with
self-joins!
 
Thank you !
26
Slide Note
Embed
Share

The research paper presents a dichotomy on the complexity of consistent query answering for atoms with simple keys. It discusses repairs for uncertain instances in a schema with key constraints, as well as the concept of consistent query answering. The document addresses the problem statement of certainty, progress made in the field so far, and outlines the contributions towards understanding the certainty of boolean conjunctive queries. The study offers a theorem highlighting a dichotomy for queries without self-joins, where atoms have simple keys, leading to a classification into PTIME and coNP-complete categories.

  • Query Answering
  • Complexity
  • Consistency
  • Research
  • Certainty

Uploaded on Oct 10, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. A DICHOTOMY ON THE COMPLEXITY OF CONSISTENT QUERY ANSWERING FOR ATOMS WITH SIMPLE KEYS Paris Koutris Dan Suciu University of Washington

  2. REPAIRS An uncertain instance I for a schema with key constraints A repair r of I is a subinstance of I that satisfies the key constraints and is maximal The 4 possible repairs R(x, y) (a1, b1) (a1, b1) (a1, b1) (a1, b2) (a1, b2) (a1, b2) (a2, b2) (a2, b2) (a2, b2) (a2, b2) (a2, b2) (a3, b3) (a3, b4) (a3, b3) (a3, b4) (a3, b3) (a4, b4) (a4, b4) (a4, b4) (a4, b4) (a3, b4) (a4, b4) 2

  3. CONSISTENT QUERY ANSWERING If Q is boolean, we say that I is certain for Q, I |= Q,if for every repair r of I, Q(r) is true R(x, y) S(y, z) (a1, b1) (b1, c1) Q() = R(x, y), S(y, z) I |= Q (a1, b2) (b2, c1) (a2, b2) (b2, c2) (a3, b3) (b3, c3) (a3, b4) (a4, b4) 3

  4. PROBLEM STATEMENT CERTAINTY(Q): Given as input an instance I, does I |= Q when Q is a boolean CQ? In general, CERTAINTY(Q) is in coNP Q1 = R(x, y), S(y, z) : expressible as a first-order query Q2 = R(x, y), S(z, y) : coNP-complete Q3 = R(x, y), S(y, x) : PTIME but not first-order expressible Conjecture For every boolean conjunctive query Q, CERTAINTY(Q) is either in PTIME or coNP-complete 4

  5. PROGRESSSO FAR [Wijsen, 2010] Syntactic characterization of FO-expressible acyclic CQs w/o self- joins [Kolaitis and Pema, 2012] A trichotomy for CQs with 2 atoms and no self-joins [Wijsen, 2010 & 2013] PTIME algorithm for cyclic queries: Ck = R1(x1,x2), , Rk(xk, x1) Further classification of acyclic CQs w/o self-joins 5

  6. OUR CONTRIBUTION A dichotomy for CQs w/o self-joins where atoms have either Simple keys : R(x, y, z) Keys that consist of all attributes: S(x, y, z) Theorem For every boolean CQ Q w/o self-joins where for each atom the key consists of either one attribute or all attributes, there exists a dichotomy of CERTAINTY(Q) into PTIME and coNP-complete 6

  7. OUTLINE 1. The Dichotomy Condition 2. Frugal Repairs & Representable Answers 3. Strongly Connected Graphs 7

  8. THE QUERY GRAPH We equivalently study boolean CQs consisting only of binary relations where one attribute is the key: R(x, y) Relations can be consistent (Rc) or inconsistent (Ri) Query Graph: a directed edge (u, v) for each atom R(u,v) source node uR G[Q] x z Q = Ri(x, y), Si(z, w), Tc(y, w) R S y w T end node vR 8

  9. DEFINITIONS x+,R : set of nodes reachable from node x once we remove the edge R (through a directed path) R ~ S [source-equivalent]: source nodes uR, uS are in the same SCC [R]: the equivalence class of R w.r.t ~ u x+,R = {x, v, w} R ~ T and [R] = {R, T} y R V S v x T U z w 9

  10. COUPLED EDGES coupled+(R) = edges in [R] + any inconsistent edge S s.t. the source node uS is connected to the end node vR through a (undirected) path that does not intersect with uR+,R u = uV y = vR coupled+(R): contains R,T: [R] = {R, T} contains V: path from y (= vR ) to u (= uV) does not contain U R V x = uR S v T U z w The set uR+,R 10

  11. SPLITTABLE GRAPHS Two inconsistent edges R, S are coupled if S in coupled+(R) & R in coupled+(S) A graph G[Q] is: unsplittable if it contains a pair of coupled edges that are not source-equivalent. splittable otherwise coupled+(R) = {R, T, V} coupled+(T) = {R, T, V} coupled+(V) = {V} coupled+(U) = {U,V,R,T} u y R V S v x Only R,T are coupled T U z w SPLITTABLE! 11

  12. THE DICHOTOMY CONDITION Dichotomy Theorem If G[Q] is splittable, CERTAINTY(Q) is in PTIME If G[Q] is unsplittable, CERTAINTY(Q) is coNP- complete u y R V Splittable, so in PTIME S v x T U z w 12

  13. EXAMPLES R(x, y), S(y, z) R(x, y), S(y, z), Tc(x, z) x z z x PTIME coNP-complete y y R(x, y), S(z, y), Uc(y, z) R(x, y), S(y, z), Uc(z, y) x z x z coNP-complete PTIME y y 13

  14. OUTLINE 1. The Dichotomy Condition 2. Frugal Repairs & Representable Answers 3. Strongly Connected Graphs 14

  15. FRUGAL REPAIRS (1) Definition A repair r of an instance I is frugal for a boolean query Qif for any other repair r of I, Qf(r ) is not strictly contained in Qf(r) Qf = all body variables to the head (full query) not frugal repair r1 = { R(a1, b1), R(a2, b3), R(a3, b4), R(a4, b4) S(b1, a1), S(b3, a2), S(b4, a3) } Qf(r1) = { (a1, b1), (a2, b3), (a3, b4) } R(x, y) S(y, x) (a1, b1) (b1, a1) frugal (a1, b2) repair r2 = { R(a1, b2), R(a2, b3), R(a3, b4), R(a4, b4) S(b1, a1), S(b3, a2), S(b4, a3) } Qf(r2) = { (a2, b3), (a3, b4) } (a2, b3) (b3, a2) (a3, b4) (b4, a3) (a4, b4) (b4, a4) 15

  16. FRUGAL REPAIRS (2) I |= Q if and only if every frugal repair satisfies Q We lose no generality if we study only frugal repairs! Only two frugal repairs: Qf(r2) = {(a2, b3), (a3, b4)} Qf(r3) = {(a2, b3), (a4, b4)} R(x, y) S(y, x) (a1, b1) (b1, a1) (a1, b2) (a2, b3) (b3, a2) (a3, b4) (b4, a3) (a4, b4) (b4, a4) 16

  17. OR-SETS Efficiently represent all answer sets of frugal repairs We use or-sets: <1, 2, 3> means 1 or 2 or 3 A = < {1, 3}, {1, 4}, {2, 3}, {2, 4} > We can compress A as B = {<1, 2>, <3, 4>} [Libkin and Wong, 93] decompression operator: (B) = A The or-set of answer sets for frugal repairs of I for Q: MQ(I) = <{(a2, b3), (a3, b4)}, {(a2, b3), (a4, b4)} > Compressed form (set of or-sets): AQ(I) = { <(a2, b3) >, < (a3, b4), (a4, b4) > } 17

  18. REPRESENTABILITY (1) An or-set-of-sets S is representable if there exists a set-of- or-sets S0 (compression) such that: (S0) = S For any distinct or-sets A, B in S0, the tuples in A and B use distinct constants in all coordinates The compression of a representable set with active domain of size n has size polynomial in n <{(a2, b3), (a3, b4)}, {(a2, b3), (a4, b4)} > <{(a2, b3), (a3, b4)}, {(a2, b2), (a4, b4)} > compression not representable {<(a2, b3) >, <(a3, b4), (a4, b4) >} 18

  19. REPRESENTABILITY (2) I |= Qiff the compression AQ(I) is not empty If we can compute AQ(I) in polynomial time, deciding whether I |= Q is in PTIME Theorem If G[Q] is a strongly connected graph, MQ(I) is representable and its compression can be computed in polynomial time in the size of I 19

  20. OUTLINE 1. The Dichotomy Condition 2. Frugal Repairs & Representable Answers 3. Strongly Connected Graphs 20

  21. CYCLES S(y, z) T(z, x) R(x, y) Ck= R1(x1, x2), R2(x2, x3) , Rk(xk, x1) The purified instance contains a collection of disjoint SCCs ALGORITHM FrugalC Find the SCCs that contain no directed cycle of length > k For each such SCC i, create an or-set Ai that contains all cycles of length k Output ACk(I) = {A1, A2, } (b1, c1) (c1, a1) (a1, b1) (b2, c2) (c2, a2) (a2, b2) (b3, c2) (a2, b3) a1 a2 b3 b1 b2 c1 c2 AC3(I) = {<(a1, b1, c1)>, <(a2, b2, c2), (a2, b3, c2)>} 21

  22. GENERAL CASE: SCCS (1) Recursively split a SCC G into a SCC G and a directed path P that intersects G only at its start and end node The set AG (I) can be recursively computed z x T V S R The path P = y -- > t -- > z Graph G t y U AG (I) = {<(a1, b1, c1)>, <(a2, b2, c2), (a2, b3, c2)>} A1 A2 22

  23. GENERAL CASE: SCCS (2) Any value belongs in a unique or-set AG (I) = {<(a1, b1, c1)>, <(a2, b2, c2), (a2, b3, c2)>} A1 A2 B2c(b, z) B1c(b, y) B(a, b) B0c(z, b) ([a1b1c1], c1) ([a1b1c1], b1) (A1, [a1b1c1]) (c1, A1) ([a2b2c2], c2) ([a2b2c2], b2) (A2, [a2b2c2]) (c2, A2) ([a2b3c2], c2) ([a2b3c2], b3) (A2, [a2b3c2]) a B0c A cycle C = a -> b -> y -> t -> z -> a + a chord B2 that is a consistent relation B z B2c Replacement of G V b B1c U t y 23

  24. REST OFTHE PROOF PTIME algorithm for splittable graphs Find a separatorin G[Q] (always exists if a graph is splittable) The separator splits G[Q] into cases with fewer inconsistent edges, which are solved recursively Base case: all edges are consistent(check whether Q(I) is true) coNP-hardness Reductionfrom the Monotone-3SAT problem 24

  25. CONLUSIONS Significant progress towards proving the dichotomy for the complexity of Certain Query Answering for Conjunctive Queries Settle the dichotomy (or trichotomy) even for queries with self-joins! 25

  26. Thank you ! 26

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#