Datalog Revival and Limitation of Relational Calculus

Datalog Revival
Serge Abiteboul
INRIA Saclay, Collège de France, ENS Cachan
9/25/2024
1
9/25/2024
1
9/25/2024
1
Datalog history
Started in 77: logic and database workshop
Simple idea: 
add recursion to positive FO queries
Blooming in the 80
th
Logic programming was hot
Industry was not interested:
“No practical applications of recursive query theory … have been
found to date.” Hellerstein and Stonebraker (Readings in DB Systems)
Quasi dead except local resistance [e.g., A.,Gottlob]
Revival  in this century
9/25/2024
2
FO
+
FO
datalog
datalog
¬
¬
loop
Organization
Datalog
Datalog evaluation
Datalog with negation
Datalog revival
Conclusion
9/25/2024
3
Datalog
 
9/25/2024
4
Limitation of relational calculus
9/25/2024
5
G a graph: G(0,1), G(1,2), G(2,3), …
 
G(10,11)
Is there a path from 
0
 to 
11 
 in the graph?
k-path 
x
1
… x
k
 ( G(0,x
1
)
G(x
1
,x
2
)
G(x
k-1
,x
k
) 
G(x
k
,11) )
Path of unbounded length: 
infinite
 formula
k=1 to 
  
k-path
2
3
6
7
5
10
11
0
1
4
8
9
Datalog
G(2,3)
       
             
fact
T(x, y) ← G(x, z), T(z, y)
     
rule
datalog rule : R
1
(u
1
) 
 R
2
(u
2
), . . . ,R
n
(u
n
) for n 
 1
Each u
i
 is a vector of 
terms
Safe
: each variable occurring in head must occur in body
Intentional
 relation: occurs in the head
Extensional
 relation: does not
9/25/2024
6
 
head
 
     
body
Datalog 
program
 = set
of datalog rules
Term = constant or
variable
Datalog program
1.
G(0,1), G(1,2), G(2,3), …
 
G(10,11)
 
edb
(P) = {G} 
2.
T(x, y) ← G(x, y) 
    
idb
(P)  = {T,Ok}
3.
T(x, y) ← G(x, z), T(z, y) 
 
 
  
program 
P
4.
Ok()    ← T(0, 11)
9/25/2024
7
Datalog program
 
1.
G(0,1), G(1,2), G(2,3), …
 
G(10,11)
2.
T(x, y) ← G(x, y)
3.
T(x, y) ← G(x, z), T(z, y
)
4.
Ok()    ← T(0, 11)
Rule 2: v(x)=10 & v(y) = 11 
  
 T(10,11)
Rule 3: v(x)=9, v(z)=10 & v(y)=11 
 
 T(9,11)
Rule 3: v(x)=0, v(z)=1 & v(y)=11 
 
 T(0,11)
Rule 4: v(x)=0, v(y)=11 
  
 
Ok()
9/25/2024
8
 
T(10, 11) ← G(10, 11)
 
T(9, 11) ← G(9,10), T(10, 11)
 
T(0, 11) ← G(0,1), T(1, 11)
 
Ok()    ← T(0, 11)
Model semantics
View 
P
 as a first-order sentence 
P 
describing the answer
Associate a formula to each rule
 
R
1
(u
1
) ← R
2
(u
2
), . . . ,R
n
(u
n
) :
 
x
1
, . . . , x
m
(
 
R
2
(u
2
) 
 . . . 
 R
n
(u
n
) 
R
1
(u
1
) )
 
where x
1
, . . . , x
m
 are the variables occurring in the rule
P
 = {r
1
, ..., r
n
}, 
P
 = r
1
 
 ... 
 r
n
The 
semantics
 of 
P
 for a database 
I
,
 denoted 
P(I)
, is the
minimum model of
 
P
 
containing I
Does it always exist?
How can it be computed?
9/25/2024
9
Example: Transitive closure
G(0,1), G(1,2), G(2,3)
T(x,y) ← G(x,y)
T(x,y) ← G(x,z), T(z,y)
   
9/25/2024
10
G       P
----    ----
0 1    0 1
1 2    1 2
2 3    2 3
          0 2
           1 3
           0 3
G       P
----    ----
0 1    0 1
1 2    1 2
2 3    2 3
          0 2
           
1 3
G        P
----    ----
0 1    0 1
1 2    1 2
2 3    2 3
          0 2
           1 3
           0 3
           
6 3
Model  but not
minimal
Not a model
of the formula
Minimum
model
containing I
G       P
----    ----
0 1    0 1
1 2    1 2
          0 2
           
Does not
contain I
Existence of P(I)
There exists at least one such model: the largest instance
one can build with the constants occurring in I and P is
a model of P that includes I
 
– B(I,P)
P(I) always exists: it is the intersection of all models of P
that include I over the constants occurring in I and P
How can it be computed?
9/25/2024
11
Fixpoint semantics
A fact A is an 
immediate consequence 
for K and P if
1.
A is an extensional fact in K, or
2.
for some instantiation 
 
A ← A
1
, . . . , A
n
 of a rule in P,
each A
i
 is in K
Immediate consequence operator:
T
P
(K) = { immediate consequences for K and P }
Note: T
P 
is monotone
9/25/2024
12
Fixpoint semantics – continued
P(I) is a fixpoint of 
T
P 
– That is: T
P
(P(I))
 P(I)
Indeed, P(I) is the least fixpoint of 
T
P 
containing I
Yields a means of computing P(I)
I 
 T
P
(I)
 T
P
2
(I)
 ... 
 T
p
i
(I) = T
p
i+1
(I)
 
= P(I)   
  
 B(I,P)
9/25/2024
13
Proof theory
Proof technique: SLD resolution
A fact A is in P(I) iff there exists a proof o
f A
9/25/2024
14
Static analysis
Hard 
Deciding 
containment
 (P 
 P’) is undecidable
Deciding 
equivalence
 is undecidable
Deciding 
boundedness
 
is undecidable
There exists k such that for any I, the fixpoint converges in less than k
stages
So, 
optimization is hard
9/25/2024
15
Datalog evaluation by example
 
9/25/2024
16
More complicated example:
Reverse same generation
  
   up 
   
flat 
   
down 
  
 
a 
 
e 
  
g 
 
f 
  
l 
 
f 
 
 
a 
 
f 
  
m 
 
n 
  
m 
 
f 
 
 
f 
 
m 
  
m 
 
o 
  
g 
 
b 
 
 
g 
 
n 
  
p 
 
m 
  
h 
 
c 
 
 
h 
 
n 
     
i 
 
d 
 
 
i 
 
o 
     
p 
 
k 
 
 
j 
 
o 
       
rsg(x,y) ← flat(x,y) 
 
rsg(x,y) ← up(x,x1),rsg(y1,x1),down(y1,y) 
 
9/25/2024
17
rsg(x,y) ← flat(x,y) 
 
rsg(x,y) ← up(x,x1),rsg(y1,x1),down(y1,y)
 
9/25/2024
18
e
f
g
h
i
j
k
b
c
d
l
m
n
o
p
a
u
u
u
u
u
u
u
d
d
d
d
d
d
f
f
 
f
 
rsg
 
rsg
 
 
g    f
 m   n
 m   o
 p    m
 a    b
 h    f
 i     f
 j     f
 f     k
 a    c
 a    d
f
Naive algorithm
Fixpoint
rsg
0
 = 
rsg
i+1
 = flat 
 rsg
i  
 
 16
(
2=4
(
3=5
(up × rsgi
i 
 × down)))
Program
rsg := 
 ;
repeat
 
rsg := flat 
 rsg 
 
16
(
2=4
(
3=5
(up × rsg × down)))
until fixpoint
9/25/2024
19
Semi-naive
1
(x, y)   
flat(x, y)
i+1
(x, y) ← up(x, x1), 
i
(y1, x1), down(y1, y)
Compute 
I
Program
Converges to the answer
Not recursive & not a datalog program
Still redundant – to avoid it:
i+1
(x, y) ← up(x, x1), 
i
(y1, x1), down(y1, y), 

i
(
x, y)
9/25/2024
20
rsg(x,y) ← flat(x,y) 
 
rsg(x,y) ← up(x,x1),rsg(y1,x1),down(y1,y)
 
9/25/2024
21
e
f
g
h
i
j
k
b
c
d
l
m
n
o
p
a
u
u
u
u
u
u
u
d
d
d
d
d
d
f
f
f
 
 
g    f
 m   n
 m   o
 p    m
 a    b
 h    f
 i     f
 j     f
 f     k
 a    c
 a    d
f
 
1
 
2
 
3
Semi-naïve (end)
More complicated if the rules are not linear
T(
x, y) 
← G(
x, y)
T(
x, y) ← 
T
(x, z), 
T
(z, y)
1
(x, y) 
← G(
x, y)
anc
1
 := 
1
temp
i+1
(x, y) ← 
i
(x, z), anc
i
(z, y)
temp
i+1
(x, y) 
anc
i
(x, z),
 
i
(z, y)
i+1  
:= temp
i+1 
 
anc
i
a
nc
i
+1 
:= anc
i
 
 
i+1
9/25/2024
22
And beyond
Start from a program and a query
rsg(x,y) ← flat(x,y) 
 
rsg(x,y) ← up(x,x1),rsg(y1,x1),down(y1,y)
query(y) ← rsg(a, y)
Optimize to avoid deriving useless facts
Two competing techniques that are roughly equivalent
Query-Sub-Query
Magic Sets
9/25/2024
23
Magic Set
rsg
bf
(x, y) 
input
_
rsg
bf
(x), flat(x, y)
rsgfb(x, y) 
input
_
rsg
fb
(y), flat(x, y)
sup
31
(x, x
1
) 
input
_
rsg
bf
(x), up(x, x
1
)
sup
32
(x, y
1
) 
sup
31
(x, x
1
), rsg
fb
(y
1
, x
1
)
rsg
bf
(x, y) 
sup
32
(x, y
1
), down(y
1
, y)
sup
41
(y, y
1
) 
input
_
rsg
fb
(y), down(y
1
, y)
sup
42
(y, x
1
) 
sup
41
(y, y
1
), rsg
bf
(y
1
, x
1
)
rsg
fb
(x, y) 
sup
42
(y, x
1
), up(x, x
1
)
input
_
rsg
bf
(x
1
) 
sup
31
(x, x
1
)
input
_
rsg
fb
(y
1
)
sup
41
(y, y
1
)
Seed
 
input
_
rsg
bf
(a) 
Query
 
query(y) 
rsg
bf
(a, y)
9/25/2024
24
QSQ at work
9/25/2024
25
rsg
bf
(x,y)
up(x,x
1
),  rsg
fb
(y
1
,x
1
), down(y
1
,y) 
sup
0
(x)   
sup
1
(x,x
1
)  
sup
2
(x,y
1
)   
sup
3
(x,y)
 
rsg
fb
(x,y)
down(y
1
,y), rsg
bf
(y
1
,x
1
), up(x,x
1
)
sup
0
(y)   
sup
1
(y,y
1
)  
sup
2
(y,x
1
)   
sup
3
(x,y)
rsg
bf
(x,y)
flat(x,y) 
sup
0
(x)   
sup
1
(x,y)
 
rsg
fb
(x,y)
flat(x,y)
sup
0
(y)  
 sup
1
(x,y)
 
input-rsg
bf 
                   input-rsg
fb
               ans-rsg
bf
                 ans-rsg
fb
a
a
a
a  e
a  f
e
f
e
f
e
f
g   f
g   f
a   g
a   b
a   b
Subqueries
rsg
fb
(y
1
,
e
)
rsg
fb
(y
1
,
f
)
SLD-resolution by example
9/25/2024
26
rsg(a,y)
  
 
            
rsg(x,y)
 ← up(x,x1), rsg(y1,x1), down(y1,y)
rup(a,x1)
, rsg(y1,x1), down(y1,y)
   
         
up(a,f)
rsg(y1,f)
, down(y1,y)
   
  
rsg(x,y)
 ← flat(x,y) 
flat(y1,f)
, down(y1,y)
   
                      
flat(g,f)
 
 down(g,y)
   
                                            
down(g,b)
query(y) 
    
            
query(y)
 ← rsg(a, y)
y=b is an answer
Datalog
¬
 by example
Accept negative literal in body
Complement of transitive closure
CompG(x,y) ← 
G(
x,y)
9/25/2024
27
More complicated
Some T
P
 are not monotone
Some T
P
 
have no fixpoint
containing I
P
1
 
= {p ←  ¬p}
 → {p} → 
 → {p} → …
Some T
P
 
have several minimal
fixpoints containing I
P
2
 
= {p ←  ¬q, q ←  ¬p}
Two minimal fixpoints:
 
{p} and {q}.
Some T
P
 
have a least fixpoint but
sequence diverges
P
3
 
= {p 
 ¬r ; r 
 ¬p; p 
 ¬p, r}
alternates between 
 and {p, r}
But {p} is a least fixpoint
Model semantics
Some programs have no model
containing I
Some program have several
minimal models containing
9/25/2024
28
First fix: stratification
 
Impose condition on the syntax
Stratified programs
 
 
 
 
 
 
Consider more complex semantics
Many such proposals
Well-founded semantics based on 3-valued logic
9/25/2024
29
datalog
datalog
datalog
datalog
Well-founded by example:
2-player game
move graph:
(relation 
K
)
There is a pebble in a node
2 players alternate playing
A player moves the pebble following an edge
A player who cannot move loses
9/25/2024
30
c
a
b
d
f
e
g
e, g are
loosing
Winning position
move graph:
(relation 
K
)
There is a pebble in a node
2 players alternate playing
A player moves the pebble following an edge
A player who cannot move loses
9/25/2024
31
c
a
b
d
f
e
g
d,f are
winning
No winner no looser
move graph:
(relation 
K
)
There is a pebble in a node
2 players alternate playing
A player moves the pebble following an edge
A player who cannot move loses
9/25/2024
32
c
a
b
d
f
e
g
1
1
2
2
a, b, c
unknown
Program to specify
the winning/loosing positions
win(x) ←  move(x, y),¬win(y)
Well-founded semantics: find the instance 
J
 that
 
agrees with K on move and
 
satisfies the formula corresponding to the rule
Instance J 
   
– three-valued instance
 
win(d),win(f )
  
are true
 
win(e),win(g)
  
are false
 
win(a),win(b),win(c)
 
are unknown
Fixpoint semantics  based on 3-valued logic
9/25/2024
33
Fixpoint computation
 
win(x) ←  move(x, y),¬win(y)
 
 
 
 
 
I
0
: win is always false
I
1
: win: a, b, c, d, f
I
2
: win: d, f
I
3
: win: a, b, c, d, f
I
4
: win: d, f
9/25/2024
34
c
a
b
d
f
e
g
Complexity and expressivity
Datalog and 
Datalog
¬ 
evaluations are  easy
Datalog
 Ptime
In the data
Inclusion in ptime: polynomial number of stages; each stage in ptime
Strict: Expresses only monotone queries;
But does not even express all PTIME monotone queries
Datalog
¬ 
 with well-founded semantics = fixpoint 
 Ptime 
In the data
On ordered databases, it is exactly PTIME
9/25/2024
35
Datalog revival
 
9/25/2024
36
Datalog revival
Datalog needs to be extended to be useful
 
Updates, value creation, nondeterminism
 
    
[e.g. A., Vianu]
 
Skolem 
  
[e.g. Gottlob]
 
Constraints
  
[e.g. Revesz]
 
Time 
  
[e.g. Chomicki]
 
Distribution 
 
[e.g. 
ActiveXML
]
 
Trees   
  
[e.g. 
ActiveXML
]
 
 
Aggregations 
 
[e.g. Consens and Mendelzon]
 
Delegation
  
[e.g.  
Webdamlog
]
9/25/2024
37
Datalog revival: different domains
Declarative networking 
  
[e.g.  Lou et al]
Data integration and exchange 
 
[e.g.  Clio, Orchestra]
Program verification 
   
[e.g.  Semmle]
Data extraction from the Web 
 
[e.g.  Gottlob, Lixto]
Knowledge representation
  
[e.g.  Gottlob]
Artifact and workflows 
  
[e.g.  ActiveXML] 
 
   
Web data management 
  
[e.g.  Webdamlog]
 
   
9/25/2024
38
Declarative networking
Traditional vs. declarative
Network state
  
Distributed database
Network protocol
  
Datalog program
Messages
   
Messages
Series of languages/systems from Hellerstein groups in Berkeley
Overlog, bloom, dedalus, bud…
Performance: scalability
Many systems have been developed
Internet routing
Overlay networks
Sensor networks
9/25/2024
39
Data integration
Eid, Name, Addr
  
( employee(Eid, Name, Addr) 
  
 Ssn ( name(Ssn, Name) 
 address(Ssn, Addr) ) )
Use “inverse” rules with Skolem
name(
ssn(Name, Addr
), Name) 
 
← employee(X, Name, Addr)
address(
ssn(Name, Addr)
, Addr) 
 
← employee(X, Name, Addr)
Possibly infinite chase and  issues with termination
9/25/2024
40
Program analysis
Analyze the possible runs of a program
Recursion
Lots of possible runs – lots of data
Optimization techniques are essential
Semi-naïve, Magic Sets, Typed-based optimization
9/25/2024
41
Data extraction
Georg’s talk next
9/25/2024
42
Conclusion
 
9/25/2024
43
Issues
Give precise semantics to the extensions
Some challenges for the Web
Scaling to large volumes
  
Berkeley’s works
Datalog with distribution
  
Datalog with uncertainty
Datalog with inconsistencies
9/25/2024
44
Webdamlog  
9/25/2024
45
9/25/2024
45
9/25/2024
45
Georg Gottlob,
Professor at 
Oxford
 University & 
TU Wien
Research:  database, AI, logic and complexity
Fellow of St John's and 
Ste Anne
’s College,
Oxford
Fellow:  ACM, ECCAI, Royal Society
Academy: Austrian, German, Europaea
Program chair: IJCAI, PODS…
Founder of Lixto, a company on web data
extraction
ERC Advanced Investigator's Grant (DIADEM)
9/25/2024
46
Slide Note
Embed
Share

Datalog, a logic-based programming language, saw a revival in the 21st century with the addition of recursion to positive first-order queries. The history of Datalog traces back to the 1970s with the idea of adding recursion to FO queries. Despite the industry's initial lack of interest, Datalog found a resurgence in recent years. The limitations of relational calculus and the concepts of terms, constants, and variables are also explored in this informative content.

  • Datalog Revival
  • Logic Programming
  • Recursion
  • Relational Calculus
  • Database Systems

Uploaded on Sep 25, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Datalog Revival Serge Abiteboul INRIA Saclay, Coll ge de France, ENS Cachan 9/25/2024 9/25/2024 9/25/2024 1 1 1

  2. FO+ Datalog history loop FO datalog Started in 77: logic and database workshop Simple idea: add recursion to positive FO queries Blooming in the 80th Logic programming was hot Industry was not interested: No practical applications of recursive query theory have been found to date. Hellerstein and Stonebraker (Readings in DB Systems) Quasi dead except local resistance [e.g., A.,Gottlob] datalog Revival in this century 9/25/2024 2

  3. Organization Datalog Datalog evaluation Datalog with negation Datalog revival Conclusion 9/25/2024 3

  4. Datalog 9/25/2024 4

  5. Limitation of relational calculus G a graph: G(0,1), G(1,2), G(2,3), Is there a path from 0 to 11 in the graph? G(10,11) 0 1 4 5 8 9 2 3 6 7 10 11 k-path x1 xk( G(0,x1) G(x1,x2) G(xk-1,xk) G(xk,11) ) Path of unbounded length: infinite formula k=1 to k-path 9/25/2024 5

  6. Term = constant or variable Datalog program = set of datalog rules Datalog G(2,3) T(x, y) G(x, z), T(z, y) fact rule datalog rule : R1(u1) R2(u2), . . . ,Rn(un) for n 1 head body Each uiis a vector of terms Safe: each variable occurring in head must occur in body Intentional relation: occurs in the head Extensional relation: does not 9/25/2024 6

  7. Datalog program 1. 2. 3. 4. G(0,1), G(1,2), G(2,3), G(10,11) T(x, y) G(x, y) T(x, y) G(x, z), T(z, y) Ok() T(0, 11) edb(P) = {G} idb(P) = {T,Ok} program P 9/25/2024 7

  8. Datalog program 1. 2. 3. 4. G(0,1), G(1,2), G(2,3), G(10,11) T(x, y) G(x, y) T(x, y) G(x, z), T(z, y) Ok() T(0, 11) T(10, 11) G(10, 11) T(9, 11) G(9,10), T(10, 11) T(0, 11) G(0,1), T(1, 11) Ok() T(0, 11) T(10,11) T(9,11) Rule 2: v(x)=10 & v(y) = 11 Rule 3: v(x)=9, v(z)=10 & v(y)=11 Rule 3: v(x)=0, v(z)=1 & v(y)=11 Rule 4: v(x)=0, v(y)=11 T(0,11) Ok() 9/25/2024 8

  9. Model semantics View P as a first-order sentence P describing the answer Associate a formula to each rule R1(u1) R2(u2), . . . ,Rn(un) : x1, . . . , xm( R2(u2) . . . Rn(un) R1(u1) ) where x1, . . . , xmare the variables occurring in the rule P = {r1, ..., rn}, P= r1 ... rn The semantics of P for a database I, denoted P(I), is the minimum model of Pcontaining I Does it always exist? How can it be computed? 9/25/2024 9

  10. Example: Transitive closure G(0,1), G(1,2), G(2,3) T(x,y) G(x,y) T(x,y) G(x,z), T(z,y) G ---- 0 1 0 1 1 2 1 2 2 3 2 3 P G ---- 0 1 0 1 1 2 1 2 2 3 2 3 P G ---- 0 1 0 1 1 2 1 2 2 3 2 3 P ---- G ---- 0 1 0 1 1 2 1 2 P ---- ---- ---- 0 2 1 3 0 3 6 3 0 2 0 2 1 3 0 2 1 3 0 3 Minimum model containing I Does not contain I Not a model of the formula Model but not minimal 9/25/2024 10

  11. Existence of P(I) There exists at least one such model: the largest instance one can build with the constants occurring in I and P is a model of P that includes I B(I,P) P(I) always exists: it is the intersection of all models of P that include I over the constants occurring in I and P How can it be computed? 9/25/2024 11

  12. Fixpoint semantics A fact A is an immediate consequence for K and P if 1. A is an extensional fact in K, or 2. for some instantiation A A1, . . . , Anof a rule in P, each Aiis in K Immediate consequence operator: TP(K) = { immediate consequences for K and P } Note: TP is monotone 9/25/2024 12

  13. Fixpoint semantics continued P(I) is a fixpoint of TP That is: TP(P(I)) P(I) Indeed, P(I) is the least fixpoint of TP containing I Yields a means of computing P(I) I TP(I) TP2(I) ... Tpi(I) = Tpi+1(I)= P(I) B(I,P) 9/25/2024 13

  14. Proof theory Proof technique: SLD resolution A fact A is in P(I) iff there exists a proof of A 9/25/2024 14

  15. Static analysis Hard Deciding containment (P P ) is undecidable Deciding equivalence is undecidable Deciding boundedness is undecidable There exists k such that for any I, the fixpoint converges in less than k stages So, optimization is hard 9/25/2024 15

  16. Datalog evaluation by example 9/25/2024 16

  17. More complicated example: Reverse same generation up a a f g h i j flat g m m p down l m g h i p e f m n n o o f n o m f f b c d k rsg(x,y) flat(x,y) rsg(x,y) up(x,x1),rsg(y1,x1),down(y1,y) 9/25/2024 17

  18. rsg(x,y) flat(x,y) rsg(x,y) up(x,x1),rsg(y1,x1),down(y1,y) f g f m n m o p m a b h f i j f f k a c a d f f l m n o p d u d u u u u d f f e f g h i j k rsg d u u d d a b c d rsg 9/25/2024 18

  19. Naive algorithm Fixpoint rsg0= rsgi+1= flat rsgi 16( 2=4( 3=5(up rsgii down))) Program rsg := ; repeat rsg := flat rsg 16( 2=4( 3=5(up rsg down))) until fixpoint 9/25/2024 19

  20. Semi-naive 1(x, y) flat(x, y) i+1(x, y) up(x, x1), i(y1, x1), down(y1, y) Compute I Program Converges to the answer Not recursive & not a datalog program Still redundant to avoid it: i+1(x, y) up(x, x1), i(y1, x1), down(y1, y), i(x, y) 9/25/2024 20

  21. rsg(x,y) flat(x,y) rsg(x,y) up(x,x1),rsg(y1,x1),down(y1,y) f g f m n m o p m a b h f i j f f k a c a d f 1 f l m n o p d u d u u u u d f f e f g h i j k 2 d u u d d a b c d 3 9/25/2024 21

  22. Semi-nave (end) More complicated if the rules are not linear T(x, y) G(x, y) T(x, y) T(x, z), T(z, y) 1(x, y) G(x, y) anc1:= 1 tempi+1(x, y) i(x, z), anci(z, y) tempi+1(x, y) anci(x, z), i(z, y) i+1 := tempi+1 anci anci+1 := anci i+1 9/25/2024 22

  23. And beyond Start from a program and a query rsg(x,y) flat(x,y) rsg(x,y) up(x,x1),rsg(y1,x1),down(y1,y) query(y) rsg(a, y) Optimize to avoid deriving useless facts Two competing techniques that are roughly equivalent Query-Sub-Query Magic Sets 9/25/2024 23

  24. Magic Set rsgbf(x, y) input_rsgbf(x), flat(x, y) rsgfb(x, y) input_rsgfb(y), flat(x, y) sup31(x, x1) input_rsgbf(x), up(x, x1) sup32(x, y1) sup31(x, x1), rsgfb(y1, x1) rsgbf(x, y) sup32(x, y1), down(y1, y) sup41(y, y1) input_rsgfb(y), down(y1, y) sup42(y, x1) sup41(y, y1), rsgbf(y1, x1) rsgfb(x, y) sup42(y, x1), up(x, x1) input_rsgbf(x1) sup31(x, x1) input_rsgfb(y1) sup41(y, y1) Seed input_rsgbf(a) Query query(y) rsgbf(a, y) 9/25/2024 24

  25. QSQ at work Subqueries rsgfb(y1,e) rsgfb(y1,f) rsgbf(x,y) rsgbf(x,y) up(x,x1), rsgfb(y1,x1), down(y1,y) flat(x,y) sup0(x) sup1(x,y) sup0(x) sup1(x,x1) sup2(x,y1) sup3(x,y) a a e a f a g a b a rsgfb(x,y) rsgfb(x,y) flat(x,y) down(y1,y), rsgbf(y1,x1), up(x,x1) sup0(y) sup1(x,y) e f sup0(y) sup1(y,y1) sup2(y,x1) sup3(x,y) g f e f input-rsgbf input-rsgfb ans-rsgbf ans-rsgfb a e f a b g f 9/25/2024 25

  26. Datalogby example Accept negative literal in body Complement of transitive closure CompG(x,y) G(x,y) 9/25/2024 27

  27. More complicated Some TPhave a least fixpoint but sequence diverges P3= {p r ; r p; p p, r} alternates between and {p, r} But {p} is a least fixpoint Some TPare not monotone Some TPhave no fixpoint containing I P1= {p p} {p} {p} Model semantics Some programs have no model containing I Some program have several minimal models containing Some TPhave several minimal fixpoints containing I P2= {p q, q p} Two minimal fixpoints: {p} and {q}. 9/25/2024 28

  28. First fix: stratification Impose condition on the syntax Stratified programs datalog datalog datalog datalog Consider more complex semantics Many such proposals Well-founded semantics based on 3-valued logic 9/25/2024 29

  29. e, g are loosing Well-founded by example: 2-player game c b e move graph: (relation K) a d f g There is a pebble in a node 2 players alternate playing A player moves the pebble following an edge A player who cannot move loses 9/25/2024 30

  30. d,f are winning Winning position c b e move graph: (relation K) a d f g There is a pebble in a node 2 players alternate playing A player moves the pebble following an edge A player who cannot move loses 9/25/2024 31

  31. a, b, c unknown No winner no looser c 2 b 1 e move graph: (relation K) a 1 2 d f g There is a pebble in a node 2 players alternate playing A player moves the pebble following an edge A player who cannot move loses 9/25/2024 32

  32. Program to specify the winning/loosing positions win(x) move(x, y), win(y) Well-founded semantics: find the instance J that agrees with K on move and satisfies the formula corresponding to the rule Instance J win(d),win(f ) win(e),win(g) win(a),win(b),win(c) three-valued instance are true are false are unknown Fixpoint semantics based on 3-valued logic 9/25/2024 33

  33. Fixpoint computation win(x) move(x, y), win(y) e c b No maybe g a d f yes I0: win is always false I1: win: a, b, c, d, f I2: win: d, f I3: win: a, b, c, d, f I4: win: d, f 9/25/2024 34

  34. Complexity and expressivity Datalog and Datalog evaluations are easy Datalog Ptime In the data Inclusion in ptime: polynomial number of stages; each stage in ptime Strict: Expresses only monotone queries; But does not even express all PTIME monotone queries Datalog with well-founded semantics = fixpoint Ptime In the data On ordered databases, it is exactly PTIME 9/25/2024 35

  35. Datalog revival 9/25/2024 36

  36. Datalog revival Datalog needs to be extended to be useful Updates, value creation, nondeterminism [e.g. A., Vianu] [e.g. Gottlob] [e.g. Revesz] [e.g. Chomicki] [e.g. ActiveXML] [e.g. ActiveXML] [e.g. Consens and Mendelzon] [e.g. Webdamlog] Skolem Constraints Time Distribution Trees Aggregations Delegation 9/25/2024 37

  37. Datalog revival: different domains Declarative networking Data integration and exchange Program verification Data extraction from the Web Knowledge representation [e.g. Lou et al] [e.g. Clio, Orchestra] [e.g. Semmle] [e.g. Gottlob, Lixto] [e.g. Gottlob] Artifact and workflows Web data management [e.g. ActiveXML] [e.g. Webdamlog] 9/25/2024 38

  38. Declarative networking Traditional vs. declarative Network state Network protocol Messages Series of languages/systems from Hellerstein groups in Berkeley Overlog, bloom, dedalus, bud Performance: scalability Many systems have been developed Internet routing Overlay networks Sensor networks Distributed database Datalog program Messages 9/25/2024 39

  39. Data integration Eid, Name, Addr ( employee(Eid, Name, Addr) Ssn ( name(Ssn, Name) address(Ssn, Addr) ) ) Use inverse rules with Skolem name(ssn(Name, Addr), Name) address(ssn(Name, Addr), Addr) employee(X, Name, Addr) employee(X, Name, Addr) Possibly infinite chase and issues with termination 9/25/2024 40

  40. Program analysis Analyze the possible runs of a program Recursion Lots of possible runs lots of data Optimization techniques are essential Semi-na ve, Magic Sets, Typed-based optimization 9/25/2024 41

  41. Data extraction Georg s talk next 9/25/2024 42

  42. Conclusion 9/25/2024 43

  43. Issues Give precise semantics to the extensions Some challenges for the Web Scaling to large volumes Datalog with distribution Datalog with uncertainty Datalog with inconsistencies Berkeley s works Webdamlog 9/25/2024 44

  44. Merci ! 9/25/2024 9/25/2024 9/25/2024 45 45 45

  45. Georg Gottlob, Professor at Oxford University & TU Wien Research: database, AI, logic and complexity Fellow of St John's and Ste Anne s College, Oxford Fellow: ACM, ECCAI, Royal Society Academy: Austrian, German, Europaea Program chair: IJCAI, PODS Founder of Lixto, a company on web data extraction ERC Advanced Investigator's Grant (DIADEM) 9/25/2024 46

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#