Active Documents and Active XML: Modeling Data-Intensive Distributed Systems

Active documents and Active XML
Serge Abiteboul
INRIA Saclay, Collège de France, ENS Cachan
9/30/2024
1
9/30/2024
1
Organization
Introduction
Modeling data intensive distributed systems
Query optimization in distributed systems
Monitoring in distributed systems
Task sequencing in distributed systems
Conclusion
2
Introduction
 
Context: Web data management
Scale: lots of servers, large volume of data
Servers are autonomous (heterogeneous also)
Data may be very dynamic, heavy update rates
Peers are possibly moving
 
4
 
 The focus in this class
The lesson from the past
The success of the relational model with  
2D-tables on local
servers
A logic for defining tables
An algebra for describing query plans over tables
We should do similarly for 
trees in a distributed environment
A logic for defining distributed trees and data services
An algebra for 
optimizing queries
 
over trees/services
5
Roadmap
1.
Modeling: the AXML model of active documents
Views
: to capture intentional data
Streams
: to capture exchanges of data and evolution
1.
Optimization: an algebra for AXML
2.
Monitoring: based on AXML documents
3.
Task sequencing: A workflow based on AXML documents
In the spirit of business artifacts
6
Key concept for
Data management
Key concept for
distribution and
evolution
Modeling data intensive distributed systems
Active XML
Active XML
 
Based on Web standards:
  
XML + Web services + Xpath/Xquery
Idea: E
xchange XML documents with embedded function calls
 
             
XML: Unordered, unranked, labeled                    trees
Internal nodes are labeled by tags
Leaves are labeled by tags, data
Set semantics:  No isomorphic sibling sub-trees
 
The functions are interpreted as calls to external services
Embedding calls in data is an old idea in databases
 
 
8
a
b
c
d
b
d
 
Active                                                           
        ,
  evolving
 
, or 
function symbols
Example
9
t
t2     m2     
root@p1
!songs@p2
!songs@p3
t
t1      m1    
t
t3      m3     !f3
songs
 
Leads to evolving trees
Intentional data: get the data only when desired
Dynamic data: If data sources change, the document changes
Flexible data: adapt to the needs
Function in push & pull mode
 
!songs@p1
t
 
t4    m4
t
 
t5    m5     !f5
Query root/songs/t
10
t
t2     m2     
root@p1
!songs@p2
!songs@p3
t
t1      m1    
t
t3      m3     !f3
songs
 
!songs@p1
t
 
t4    m4
t
 
t5    m5     !f5
 
Recursive calls
root//t[//singer/“Brel”]
11
t
t2     m2     
root@p1
!songs@p2
!songs@p3
t
t1      m1    
t
t3      m3     !f3
songs
 
!songs@p1
t
 
t4    m4
Push queries to data sources
!songs@p3: root//t[//singer/“Brel”]
!songs@p2 root//t[//singer/“Brel”]
!songs@p1: root//t[//singer/“Brel”]
Distributed query/subquery (or Magic Set)
t
This is 
distributed 
datalog 
over trees
songs@p1(x,y) 
 
:- t@p1(x,y)
songs@p1(x,y)
 
:- songs@p2(x,y)
songs@p1(x,y)
 
:- songs@p3(x,y)
 
songs@p2x,y) 
 
:- t@p1(x,y)
songs@p2(x,y)
 
:- songs@p1(x,y)
songs@p2(x,y)
 
:- songs@p3(x,y)
 
songs@p3(x,y) 
 
:- t@p1(x,y)
songs@p3(x,y)
 
:- songs@p1(x,y)
songs@p3(x,y)
 
:- songs@p2(x,y)
 
     
12
 
:- songs@p1(x, y), P(x)
 
:- songs@p2(x, y), P(x)
 
:- songs@p3(x, y), P(x)
 
:- songs@p1(x, y), P(x)
 
:- songs@p2(x, y), P(x)
 
:- songs@p1(x, y), P(x)
Fun issues: The semantics of calls
When to activate the call?
Explicit pull mode: active databases
Implicit pull mode: deductive databases
Push mode: query subscription
What to do with its result?
How long is the returned data valid?
Sending an AXML documents: evaluate the service
calls before sending or not?
13
Exchanging AXML data
Web services exchange intentional documents
Materialization can be performed
by the sender, before sending a document or
by the receiver, after receiving it.
14
 
Le Monde
06/10/2003
 
Transfer
 
Matisse...
Matisse...
 
Matisse...
Matisse...
 
Matisse...
Matisse...
 
Transfer
Exchanging AXML data
Web services exchange intentional documents
Materialization can be performed
by the sender, before sending a document or
by the receiver, after receiving it.
 
Le Monde
06/10/2003
 
Matisse...
Matisse...
 
Matisse...
Matisse...
15
Some reasons for 
not
 materializing data
before sending the document
Freshness
The receiver will get up-to-date information when needed
S
ecurity
Only the receiver has the credential to call the service
One needs to record who is actually using the data
Performance
To save on the bandwidth of the sender
To delegate work to someone else
How to specify it: casting based on types 
 jewel section
16
Complex issues
Brings to a unique setting
 
distributed db
 
deductive db
 
active db
    
 
stream data
 
warehousing & mediation
This seems to us necessary for capturing all the facets of data
management in distributed systems
This is unreasonable? Yes!
17
Query optimization in distributed systems
Active XML Algebra optimization
AXML 
system
A 
system
 = a set of peers
Each peer provides storage and query
processing
Each peer hosts active documents
Extensional data
Intentional data (query calls in the document)
Problem:
 
Given a query 
q
 at some peer
 
evaluate the answer to q with
 
optimal response time
Query
processor
Optimizer
P
e
e
r
Communication
AXML 
docs
Stats
Workspace
19
Local and global query processing
Local processing
Input/output streams
Local query optimization
Global processing
Streams for communications
Global query optimization
Delegate work to other peers
20
Example 1: Local and global optimization
P
e
e
r
 
1
rcv
1
rcv
2
σ
snd
1
R
P
e
e
r
 
3
P
e
e
r
 
2
snd
2
S
p3 asks for 
 ( R@p1 
 S@p2 )
21
Example 2: MapReduce
P
e
e
r
 
1
rcv
1
rcv
2
snd
1
R
P
e
e
r
 
3
P
e
e
r
 
2
snd
2
S
22
The Active XML algebra
Passive nodes
Annotated with labels 
 
q
root
a
b
Query nodes
 
Annotated with queries
 
For instance Tree-Pattern-Queries
 
Send/Receive 
nodes
 
Annotated with channel 
ids
 
snd
2
rcv
2
rcv
1
channel
snd
2
snd
2
rcv
2
rcv
2
channel
rcv
1
rcv
1
Input
Internal channel 
Input channel (no snd) 
23
Evolution of a system
A system evolves by activating:
a query node
a send/receive node on an internal channel
a receive node on an input channel
24
Equivalence problem for AXML systems
Complexity increases with:
richer query language
the presence of input
Axiomatization of equivalence in absence of queries
25
Optimization
As usual
Use algebraic rewriting rules
Use simplistic estimators for query plans
Use heuristics to prune the search space
26
Examples of performance optimization
techniques
Externalize
 data in devices with limited capabilities
Cell phone, tablets, home appliances…
Limited storage space, computational power, network bandwidth
Replicate
 
documents and services
To allow for 
local
 computation
To increase parallelism
27
27
 
Externalize
and replication
28
Monitoring in distributed systems
The Axlog system
Monitoring distributed systems
Distributed applications are often very dynamic
Content change rapidly
Intense communications
Peers sometimes come and leave
Complex and hard to control such systems
Many peers
Peers are distributed & autonomous
Peers are sometimes unreliable and selfish
Goal: 
monitor
 
such systems
30
          Architecture
31
publishers
Alerters
Streams
Stream
processors
 
actions
 
RSS
 
Axlog
processor
Axlog principle = active document & query
Incoming streams of updates
The outgoing stream is defined
by a query Q (e.g. TPQ)
Each time an incoming
message arrives, it modifies
the document so possibly
the query result
The output stream specifies
how the view is modified
Incremental view maintenance
Query
AXML document
 
Updates
32
Axlog engine
Datalog is used to evaluate queries with benefit from
Incremental view maintenance in datalog 
  
Δ
 
technique
Query optimization in datalog 
   
MagicSet
Constraint query languages 
   
CQL
Specific techniques
Push queries to the sources to avoid loading irrelevant data
Use of FSA on XML inputs: YFilter
33
Task sequencing in distributed systems
 
Task sequencing and verification
Task sequencing is a major difficulty for
distributed systems
Difficulty to integrate 
workflow
 and 
database
systems
Verification of temporal properties is hard
Typically verification is harder than evaluation
Evaluating an FO query is ptime data complexity
Verifying that Q 
 Q’ is undecidable
Verification will be the topic of the seminar by
 
Victor Vianu
35
 DBMSs exchanging data
Workflow systems
sequencing tasks
Example:
Dell Supply Chain
Customer
Web Store
Bank
Plant
Warehouse
Shipping
Supplier
36
AXML as 
business artifacts
Concept introduced by IBM
[Nigam & Caswell 03, Hull & Su 07]
Data-centric workflows
A process is described by a document
(possibly moving in the enterprise)
The behavior of an artifact is specified
by some constraints on its evolution
Vs. state-transition-based
workflows
Based on some form of state transition
diagrams (BPEL, Petri,…)
Mostly ignore data
w
e
b
O
r
d
e
r
 
i
d
=
7
7
8
7
7
8
0
Customer
 
Name: John Doe
 
Address: Sèvres
Product: 
committed
 
 Ref: PC 456
Factory: Milano
Parts: 
waiting
orderDate: 2009/07/24
Site: http:// d555.com
Payment: 
done
 
Bank-account …
Delivery: 
not-active
 
37
Axml Artifacts move between peers
w
e
b
O
r
d
e
r
 
i
d
=
7
7
8
7
7
8
0
Customer
 
Name: John Doe
 
Address: Sèvres
O
r
d
e
r
 
s
e
l
e
c
t
i
o
n
:
 
o
n
-
g
o
i
n
g
 
Ref: PC 456
Factory: 
undecided
Parts: 
not-active
orderDate: 2009/07/24
Site: http://d555.com
Payment: 
pending
Delivery: 
not-active
 
w
e
b
O
r
d
e
r
 
i
d
=
7
7
8
7
7
8
0
Customer
 
Name: John Doe
 
Address: Sèvres
Order selection : 
committed
 
 Ref: PC 456
Factory: Milano
P
a
r
t
s
:
 
o
n
-
g
o
i
n
g
orderDate: 2009/07/24
Site: http:// d555.com
Payment: 
done
 
Bank-account …
Delivery: 
not-active
 
w
e
b
O
r
d
e
r
 
i
d
=
7
7
8
7
7
8
0
Customer
 
Name: John Doe
 
Address: Sèvres
Order selection : 
committed
 
 Ref: PC 456
Factory: Milano
Parts: 
done
orderDate: 2009/07/24
Site: http:// d555.com
Payment: 
done
      
Bank-account: CEIF-4457889
D
e
l
i
v
e
r
y
:
 
o
n
-
g
o
i
n
g
      A
ddress: Orsay
In webStore
 
     
  
In plant
  
          In delivery
38
catalogue
WEBSTORE
PLANT
DELIVERY
CREDIT APPROVAL
WAREHOUSE
ARCHIVE
39
Sequencing of operations
Different ways of expressing sequencing of tasks
Guards: preconditions for function calls
Transition-based diagrams
Formulas in temporal logic
Study how they can simulate each other using some “scratch
paper”
40
A jewel of active documents
Casting document to a target type
The casting problem
Given
An active document I
The signature of the functions
And a target type T
Which functions to call to be sure to reach T?
2-player game
Juliet chooses which function to call
Romeo chooses a value within the domain of the
function
Juliet wins if she can reach a document in T
42
An abstraction: active context-free games
On words instead of trees
Game (𝚺,R,T)
𝚺 is a finite alphabet
R set of CF rules
T is a regular target language
w is the start word
Output: true if Juliet has a winning strategy
Alternation of
 
 states (Juliet pick next function to call) and
 
 states (the adversary Romeo picks the answer)
43
Examples
Winning
Start word aba
Strategy
Call the second a
Call all the c’s
Obtain a word in Target
Losing
Start word ab
No strategy
Initially 
 
#(a) – #(b) = 0
If I call a or b, #(a) – #(b) < 0
44
a→abc*; b→(ba)*b; c→ab
Target abab(ab)*
Fun rewriting game
The problem is undecidable in general
Interesting decidable subcases
MuschollSchwentickSegoufin
Juliet has to traverse the string from left to right
No recursion among function calls
Function call are “linear”
Also in practice, very efficient casting based on unambiguous
grammars
45
Conclusion
 
Some works around Axml
The Axml  system – open-source  (on server, on smartphone)
The useful: Replication and query optimization
 
How to evaluate a query efficiently by taking advantage of replication
The useful: Lazy query evaluation
How to evaluate a query without calling all embedded services
 
The fun: Casting problem
Which functions to call to “match” a target type
Active context-free games
The exotic
Diagnosis of communication systems based on datalog optimization
Access control
Distributed design
Probabilistic generation of documents
47
We will come back to distribution
Lesson 6: datalog
 
- recursion is essential
Lesson 7: distributed data management in general
Lesson 8: distributed knowledge bases
48
Acknowledgements
With many colleagues, in particular:
Tova Milo (Tel Aviv) 
  
Victor Vianu (UCSD)
Luc Segoufin (INRIA)
  
Ioana Manolescu (INRIA)
Georg Gottlob (Oxford) 
 
Alkis Polyzotis (UCSC)
Angela Bonifati (Lille)
 
Marie-Christine Rousset (Grenoble)
Balder ten Catte (UCSC)
 
Yannis Katsis (UCSD)
And PhD students
Omar Benjelloun (Google)
 
Bogdan Marinoiu (SAP)
Pierre Bourhis (INRIA)
 
Alban Galland (INRIA)
Marco Manna (Calabria)
 
Nicoleta Preda (Versailles)
Zoe Abrams (Google)
 
Emmanuel Taropa (Google)
Bogdan Cautis (Telecom) 
 
Spyros Zoupanos (Max-Planck-Institut)
And others
49
9/30/2024
50
 
Static Analysis and Verification 
Victor Vianu, U.C. San Diego
PhD from USC 1983
Sabbaticals INRIA, ENS Cachan, Ulm,
Telecom
Interests:  database theory,
computational logic, Web data
Co-author of 
Foundations of databases
Aka the Alice book
Vianu has served as
General Chair of SIGMOD, PODS,
Program Chair of PODS, ICDT
Editor-in-Chief of the J. ACM
ACM Fellow
51
Slide Note
Embed
Share

Explore the world of active documents and Active XML in managing data-intensive distributed systems. Dive into topics such as query optimization, monitoring, task sequencing, and more. Discover the importance of modeling, optimization, and monitoring in this evolving landscape. Uncover key concepts, roadmaps, and lessons from the past to enhance your understanding of web data management and distribution.

  • Active Documents
  • Active XML
  • Data Management
  • Distributed Systems
  • Query Optimization

Uploaded on Sep 30, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Active documents and Active XML Serge Abiteboul INRIA Saclay, Coll ge de France, ENS Cachan 9/30/2024 9/30/2024 1 1

  2. Organization Introduction Modeling data intensive distributed systems Query optimization in distributed systems Monitoring in distributed systems Task sequencing in distributed systems Conclusion 2

  3. Introduction

  4. Context: Web data management Scale: lots of servers, large volume of data Servers are autonomous (heterogeneous also) Data may be very dynamic, heavy update rates Peers are possibly moving The focus in this class Relation Tree Centralized Distributed Precise data Incomplete, probabilistic Precise schemas Ontologies 4

  5. The lesson from the past The success of the relational model with 2D-tables on local servers A logic for defining tables An algebra for describing query plans over tables We should do similarly for trees in a distributed environment A logic for defining distributed trees and data services An algebra for optimizing queries over trees/services 5

  6. Roadmap 1. Modeling: the AXML model of active documents Key concept for Data management Views: to capture intentional data Streams: to capture exchanges of data and evolution Key concept for distribution and evolution 1. Optimization: an algebra for AXML 2. Monitoring: based on AXML documents 3. Task sequencing: A workflow based on AXML documents In the spirit of business artifacts 6

  7. Modeling data intensive distributed systems Active XML

  8. Active XML Based on Web standards: XML + Web services + Xpath/Xquery Idea: Exchange XML documents with embedded function calls XML: Unordered, unranked, labeled trees Internal nodes are labeled by tags Leaves are labeled by tags, data Set semantics: No isomorphic sibling sub-trees Active , evolving a , or function symbols b b d c d The functions are interpreted as calls to external services Embedding calls in data is an old idea in databases 8

  9. Example root@p1 songs t t t t t !songs@p2 !songs@p3 !songs@p1 t1 m1 t4 m4 t3 m3 !f3 t2 m2 t5 m5 !f5 Leads to evolving trees Intentional data: get the data only when desired Dynamic data: If data sources change, the document changes Flexible data: adapt to the needs Function in push & pull mode 9

  10. Query root/songs/t root@p1 songs t t t t t t t t !songs@p2 !songs@p3 !songs@p1 t1 m1 t4 m4 t3 m3 !f3 t2 m2 t5 m5 !f5 Recursive calls 10

  11. root//t[//singer/Brel] root@p1 songs t t t t t !songs@p2 !songs@p3 !songs@p1 t1 m1 t4 m4 t3 m3 !f3 t2 m2 Push queries to data sources !songs@p3: root//t[//singer/ Brel ] !songs@p2 root//t[//singer/ Brel ] !songs@p1: root//t[//singer/ Brel ] Distributed query/subquery (or Magic Set) 11

  12. This is distributed datalog over trees :- songs@p1(x, y), P(x) songs@p1(x,y) songs@p1(x,y) songs@p1(x,y) :- t@p1(x,y) :- songs@p2(x,y) :- songs@p3(x,y) :- songs@p2(x, y), P(x) :- songs@p1(x, y), P(x) songs@p2x,y) songs@p2(x,y) songs@p2(x,y) :- t@p1(x,y) :- songs@p1(x,y) :- songs@p3(x,y) :- songs@p2(x, y), P(x) :- songs@p3(x, y), P(x) songs@p3(x,y) songs@p3(x,y) songs@p3(x,y) :- t@p1(x,y) :- songs@p1(x,y) :- songs@p2(x,y) :- songs@p1(x, y), P(x) 12

  13. Fun issues: The semantics of calls When to activate the call? Explicit pull mode: active databases Implicit pull mode: deductive databases Push mode: query subscription What to do with its result? How long is the returned data valid? Sending an AXML documents: evaluate the service calls before sending or not? 13

  14. Exchanging AXML data newspaper Matisse... GetEvents GetTemp title date Transfer Exhibits 06/10/2003 city Le Monde Paris Matisse... Matisse... Web services exchange intentional documents Materialization can be performed by the sender, before sending a document or by the receiver, after receiving it. 14

  15. Exchanging AXML data newspaper GetEvents GetTemp title date Transfer Exhibits 06/10/2003 city Le Monde Paris Matisse... Matisse... Web services exchange intentional documents Materialization can be performed by the sender, before sending a document or by the receiver, after receiving it. 15

  16. Some reasons for not materializing data before sending the document Freshness The receiver will get up-to-date information when needed Security Only the receiver has the credential to call the service One needs to record who is actually using the data Performance To save on the bandwidth of the sender To delegate work to someone else How to specify it: casting based on types jewel section 16

  17. Complex issues Brings to a unique setting distributed db deductive db active db stream data warehousing & mediation This seems to us necessary for capturing all the facets of data management in distributed systems This is unreasonable? Yes! 17

  18. Query optimization in distributed systems Active XML Algebra optimization

  19. AXML system A system = a set of peers Each peer provides storage and query processing Each peer hosts active documents Extensional data Intentional data (query calls in the document) Problem: Given a query q at some peer evaluate the answer to q with optimal response time Communication Optimizer Query processor Stats Workspace AXML docs Peer 19

  20. Local and global query processing Local processing Input/output streams Local query optimization Global processing Streams for communications Global query optimization Delegate work to other peers output stream p1 p2 p3 p4 input stream input stream 20

  21. Example 1: Local and global optimization p3 asks for ( R@p1 S@p2 ) Global Rewriting: Push selections to sources Local Rewriting: Selection & Union commute rcv1 rcv2 Peer 3 rcv1 rcv2 rcv1 rcv2 Peer 3 Peer 3 snd1 snd2 snd1 snd2 snd1 snd2 R S R S R S Peer 1 Peer 2 Peer 1 Peer 2 Peer 1 Peer 2 21

  22. Example 2: MapReduce rcv5 Peer 3 snd5 snd5 rcv1 rcv2 Peer 3 Middle- ware 1 Middle- ware 2 rcv1 rcv3 rcv2 rcv4 snd1 snd2 snd1 Sn2 snd3 snd4 map map R S Peer 1 Peer 2 Peer 1 Peer 2 R R 22

  23. The Active XML algebra root a b Passive nodes Annotated with labels q Query nodes root Annotated with queries snd2 b For instance Tree-Pattern-Queries rcv2 a q rcv2 rcv1 snd2 Send/Receive nodes rcv1 rcv2 a b Annotated with channel ids Internal channel Input channel (no snd) rcv2 rcv1 snd2 channel channel Input rcv2 rcv1 snd2 23

  24. Evolution of a system A system evolves by activating: a query node a send/receive node on an internal channel a receive node on an input channel 24

  25. Equivalence problem for AXML systems No query TPQ TPQ with XPath joins TPQ with joins TPQ with constructor No input PTIME PTIME PTIME Hard Undecidable Input PTIME Hard Hard ? Undecidable Complexity increases with: richer query language the presence of input Axiomatization of equivalence in absence of queries 25

  26. Optimization As usual Use algebraic rewriting rules Use simplistic estimators for query plans Use heuristics to prune the search space 26

  27. Examples of performance optimization techniques Externalize data in devices with limited capabilities Cell phone, tablets, home appliances Limited storage space, computational power, network bandwidth Replicate documents and services To allow for local computation To increase parallelism 27 27

  28. Externalize and replication 28

  29. Monitoring in distributed systems The Axlog system

  30. Monitoring distributed systems Distributed applications are often very dynamic Content change rapidly Intense communications Peers sometimes come and leave Complex and hard to control such systems Many peers Peers are distributed & autonomous Peers are sometimes unreliable and selfish Goal: monitor such systems 30

  31. Architecture Axlog processor Alerters Streams actions Stream processors publishers RSS 31

  32. Axlog principle = active document & query Incoming streams of updates The outgoing stream is defined by a query Q (e.g. TPQ) Each time an incoming message arrives, it modifies the document so possibly the query result The output stream specifies how the view is modified Query Updates AXML document Incremental view maintenance 32

  33. Axlog engine Datalog is used to evaluate queries with benefit from Incremental view maintenance in datalog Query optimization in datalog Constraint query languages Specific techniques Push queries to the sources to avoid loading irrelevant data Use of FSA on XML inputs: YFilter technique MagicSet CQL 33

  34. Task sequencing in distributed systems

  35. Task sequencing and verification Task sequencing is a major difficulty for distributed systems Difficulty to integrate workflow and database systems Verification of temporal properties is hard Typically verification is harder than evaluation Evaluating an FO query is ptime data complexity Verifying that Q Q is undecidable Verification will be the topic of the seminar by Victor Vianu DBMSs exchanging data Workflow systems sequencing tasks 35

  36. Example: Dell Supply Chain Warehouse Supplier Shipping Plant Customer Bank Web Store 36

  37. AXML as business artifacts Concept introduced by IBM [Nigam & Caswell 03, Hull & Su 07] webOrder id=7787780 Customer Name: John Doe Address: S vres Product: committed Ref: PC 456 Factory: Milano Parts: waiting orderDate: 2009/07/24 Site: http:// d555.com Payment: done Bank-account Delivery: not-active Data-centric workflows A process is described by a document (possibly moving in the enterprise) The behavior of an artifact is specified by some constraints on its evolution Vs. state-transition-based workflows Based on some form of state transition diagrams (BPEL, Petri, ) Mostly ignore data 37

  38. Axml Artifacts move between peers In webStore In plant In delivery webOrder id=7787780 Customer webOrder id=7787780 Customer webOrder id=7787780 Customer Name: John Doe Address: S vres Name: John Doe Address: S vres Name: John Doe Address: S vres Order selection: on-going Ref: PC 456 Factory: undecided Parts: not-active orderDate: 2009/07/24 Site: http://d555.com Payment: pending Delivery: not-active Order selection : committed Ref: PC 456 Factory: Milano Parts: done orderDate: 2009/07/24 Site: http:// d555.com Payment: done Bank-account: CEIF-4457889 Delivery: on-going Address: Orsay Order selection : committed Ref: PC 456 Factory: Milano Parts: on-going orderDate: 2009/07/24 Site: http:// d555.com Payment: done Bank-account Delivery: not-active 38

  39. catalogue WEBSTORE PLANT DELIVERY ARCHIVE CREDIT APPROVAL WAREHOUSE 39

  40. Sequencing of operations Different ways of expressing sequencing of tasks Guards: preconditions for function calls Transition-based diagrams Formulas in temporal logic Study how they can simulate each other using some scratch paper 40

  41. A jewel of active documents Casting document to a target type

  42. The casting problem Given An active document I The signature of the functions And a target type T Which functions to call to be sure to reach T? 2-player game Juliet chooses which function to call Romeo chooses a value within the domain of the function Juliet wins if she can reach a document in T 42

  43. An abstraction: active context-free games On words instead of trees Game (?,R,T) ? is a finite alphabet R set of CF rules T is a regular target language w is the start word Output: true if Juliet has a winning strategy Alternation of states (Juliet pick next function to call) and states (the adversary Romeo picks the answer) 43

  44. Examples Winning Losing a abc*; b (ba)*b; c ab Target abab(ab)* Start word aba Strategy Call the second a Call all the c s Obtain a word in Target Start word ab No strategy Initially If I call a or b, #(a) #(b) < 0 #(a) #(b) = 0 44

  45. Fun rewriting game The problem is undecidable in general Interesting decidable subcases MuschollSchwentickSegoufin Juliet has to traverse the string from left to right No recursion among function calls Function call are linear Also in practice, very efficient casting based on unambiguous grammars 45

  46. Conclusion

  47. Some works around Axml The Axml system open-source (on server, on smartphone) The useful: Replication and query optimization How to evaluate a query efficiently by taking advantage of replication The useful: Lazy query evaluation How to evaluate a query without calling all embedded services The fun: Casting problem Which functions to call to match a target type Active context-free games The exotic Diagnosis of communication systems based on datalog optimization Access control Distributed design Probabilistic generation of documents 47

  48. We will come back to distribution Lesson 6: datalog Lesson 7: distributed data management in general Lesson 8: distributed knowledge bases - recursion is essential 48

  49. Acknowledgements With many colleagues, in particular: Tova Milo (Tel Aviv) Luc Segoufin (INRIA) Georg Gottlob (Oxford) Angela Bonifati (Lille) Balder ten Catte (UCSC) And PhD students Omar Benjelloun (Google) Pierre Bourhis (INRIA) Marco Manna (Calabria) Zoe Abrams (Google) Bogdan Cautis (Telecom) And others Victor Vianu (UCSD) Ioana Manolescu (INRIA) Alkis Polyzotis (UCSC) Marie-Christine Rousset (Grenoble) Yannis Katsis (UCSD) Bogdan Marinoiu (SAP) Alban Galland (INRIA) Nicoleta Preda (Versailles) Emmanuel Taropa (Google) Spyros Zoupanos (Max-Planck-Institut) 49

  50. Merci ! 9/30/2024 50

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#