Interoperability for Provenance-aware Databases Using PROV and JSON

Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy
Oracle Corporation
 
 
 
 
Raghav Kapoor, Boris Glavic
Illinois Institute of Technology
 
 
Venkatesh Radhakrishnan
Facebook
 
 
Xing Niu
Illinois Institute of Technology
xniu7@hawk.iit.edu
Outline
 
Introduction
Related work
Overview
Export and Import
Experimental Results
Conclusions and Future Work
Introduction
 
The PROV standards
A standardized, extensible representation of provenance
graphs
Exchange of provenance information between systems
Provenance-aware DBMS
Computing the provenance of database operations
E.g., Perm[1], GProM [2], DBNotes[3], Orchestra[4],
LogicBlox[5]
 
 
[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer, 2013.
[2] B. Arab, D. Gawlick, V. Radhakrishnan, H. Guo, and B. Glavic. A generic provenance middleware for database queries, updates, and transactions. In TaPP, 2014.
[3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005.
[4] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3):19, 2013.
[5] S. Huang, T. Green, and B. Loo. Datalog and emerging applications: an interactive tutorial. In SIGMOD, pages 1213–1216, 2011.
Introduction
 
Example: extracting demographic information from tweets
 
Introduction
 
Problem:
No relational database system supports tracking of database provenance as well as import and export of provenance in PROV
Not capable of exporting provenance into standardized formats
E.g., GProM:
Essentially produces wasDerivedFrom edges between the output tuples of a query Q and its inputs
However, not available as PROV graphs
No way to track the derivation back to non-database entities
 
Introduction
 
GProM System
Computes provenance for database operations
Queries, updates, transactions
Using SQL language extensions
e.g., PROVENANCE OF (SELECT ...)
Introduction
 
Example of GProM in action
The result of PROVENANCE OF for query Q
Each tuple in this result represents one wasDerivedFrom assertion
E.g., tuple t_o1 was derived from tuple t_1
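As a rough illustration of how rows of a provenance result can be read as wasDerivedFrom assertions, the sketch below uses a tiny in-memory stand-in for the result of PROVENANCE OF. The row layout, the identifiers, and the helper name `derivations` are assumptions made for illustration, not GProM's actual output schema.

```python
from typing import NamedTuple


class ProvRow(NamedTuple):
    """One row of a hypothetical provenance result: an output tuple of Q
    paired with one input tuple that contributed to it."""
    output_id: str
    input_id: str


def derivations(rows):
    """Read each provenance row as one wasDerivedFrom(output, input) assertion."""
    return [f"wasDerivedFrom({r.output_id}, {r.input_id})" for r in rows]


# Assumed example data: t_o1 was computed from inputs t_1 and t_2.
rows = [ProvRow("t_o1", "t_1"), ProvRow("t_o1", "t_2"), ProvRow("t_o2", "t_3")]
for assertion in derivations(rows):
    print(assertion)
```

An output tuple with several contributing inputs simply appears in several rows, so it yields several assertions.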
 
Introduction
 
Goal: make databases interoperable with other provenance systems
Approach:
Export and import of provenance
PROV-JSON
Propagation of imported provenance
Implemented in GProM using SQL
 
Related Work
 
How to integrate provenance graphs by identifying common
elements? [6]
Address interoperability problem between databases and other
provenance-aware systems through
Common model for both types of provenance [7][8][9]
Monitoring database access to link database provenance with other
provenance systems [10][11]
 
 
[6] A. Gehani and D. Tariq. Provenance integration. In TaPP, 2014.
[7] U. Acar, P. Buneman, J. Cheney, J. van den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow provenance. In TaPP, 2010.
[8] Y. Amsterdamer, S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. Putting Lipstick on Pig: Enabling Database-style Workflow Provenance. PVLDB, 5(4):346–357, 2011.
[9] D. Deutch, Y. Moskovitch, and V. Tannen. A provenance framework for data-dependent process analysis. PVLDB, 7(6), 2014.
[10] F. Chirigati and J. Freire. Towards integrating workflow and database provenance. In IPAW, pages 11–23, 2012.
[11] Q. Pham, T. Malik, B. Glavic, and I. Foster. LDV: Light-weight Database Virtualization. In ICDE, pages 1179–1190, 2015.
Overview
 
We introduce techniques for
Exporting database provenance as PROV documents
Importing PROV graphs alongside data
Linking outputs of SQL operations to imported provenance for their inputs
Implementation in GProM offloads generation of PROV documents to the backend database
Using SQL and string concatenation
 
Export and Import
 
 
Export
Added TRANSLATE AS clause
e.g., PROVENANCE OF (SELECT ...) TRANSLATE AS
Construct PROV-JSON document from database provenance
Running several projections over the provenance computation
E.g., '"_:wgb\(' || F0.STATE || '|' || F0."AVG(AGE)" || '\)' …
Uses aggregation to concatenate all snippets of a certain type
E.g., entity nodes, wasGeneratedBy edges, allUsed edges
Uses string concatenation to create final document
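The three export steps above (a projection that builds one snippet per row, an aggregation per snippet type, and a final concatenation) can be sketched with SQLite from Python's standard library. The table `prov`, its columns, the `_:` identifiers, and the use of `group_concat` are assumptions for illustration; the SQL that GProM actually generates for its backend may look quite different.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical provenance result: one row per (output tuple, input tuple) pair.
conn.execute("CREATE TABLE prov (out_id TEXT, in_id TEXT)")
conn.executemany(
    "INSERT INTO prov VALUES (?, ?)",
    [("to1", "t1"), ("to1", "t2"), ("to2", "t3")],
)

# Step 1: projections build one JSON snippet per row with ||.
# Step 2: group_concat aggregates all snippets of one type.
# Step 3: string concatenation wraps the aggregates into the final document.
sql = """
SELECT '{"entity": {' || e.snippets || '}, "wasDerivedFrom": {' || d.snippets || '}}'
FROM
  (SELECT group_concat('"_:' || id || '": {}', ', ') AS snippets
   FROM (SELECT out_id AS id FROM prov UNION SELECT in_id FROM prov)) e,
  (SELECT group_concat(
            '"_:wdf_' || out_id || '_' || in_id || '": {"prov:generatedEntity": "_:'
            || out_id || '", "prov:usedEntity": "_:' || in_id || '"}', ', ') AS snippets
   FROM prov) d
"""
doc_text = conn.execute(sql).fetchone()[0]
doc = json.loads(doc_text)  # raises if the concatenated text is not valid JSON
print(json.dumps(doc, indent=2))
```

The sketch does not escape quotes or backslashes in attribute values, which a real exporter would need to do before concatenating them into JSON.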
 
 
 
 
 
Export and Import
 
Example: part of the final PROV document
[Figure not preserved; legend: red dotted lines in DB]
Export and Import
 
Import
Import PROV for an existing relation
Provide a language construct IMPORT PROV FOR ...
Import available PROV graphs for imported tuples and store them alongside the data
Add three columns to each table to store imported provenance
prov_doc: stores a PROV-JSON snippet representing the tuple's provenance
prov_eid: indicates which of the entities in this snippet represents the imported tuple
prov_time: stores a timestamp as of the time when the tuple was imported
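A minimal sketch of the three-column layout described above, again using SQLite from Python's standard library. The relation name `user_rel`, its data columns, the `ex:` identifiers, and the helper `import_tuple` are hypothetical; only the roles of the three provenance columns come from the slide.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Hypothetical relation; the last three columns hold imported provenance.
conn.execute(
    """CREATE TABLE user_rel (
           name TEXT, age INTEGER,
           prov_doc TEXT,   -- PROV-JSON snippet for this tuple
           prov_eid TEXT,   -- entity in prov_doc that represents this tuple
           prov_time TEXT   -- when the tuple was imported
       )"""
)


def import_tuple(name, age, prov_doc, prov_eid):
    """Store a tuple together with its imported provenance."""
    if prov_eid not in prov_doc.get("entity", {}):
        raise ValueError(f"{prov_eid} is not an entity of the snippet")
    conn.execute(
        "INSERT INTO user_rel VALUES (?, ?, ?, ?, ?)",
        (name, age, json.dumps(prov_doc), prov_eid,
         datetime.now(timezone.utc).isoformat()),
    )


# Assumed snippet: the user entity was derived from a tweet outside the database.
snippet = {"entity": {"ex:tweet1": {}, "ex:user1": {}},
           "wasDerivedFrom": {"_:d1": {"prov:generatedEntity": "ex:user1",
                                        "prov:usedEntity": "ex:tweet1"}}}
import_tuple("Alice", 34, snippet, "ex:user1")
row = conn.execute("SELECT name, prov_eid, prov_doc FROM user_rel").fetchone()
print(row[0], row[1], sorted(json.loads(row[2])["entity"]))
```

Checking that `prov_eid` names an entity of the snippet is a design choice of this sketch, so that later export steps can rely on the link between the tuple and its imported graph.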
 
 
 
 
 
Export and Import
 
Import example
Relation user with imported provenance
Attribute value d is the previous PROV graph without database activities and entities
 
 
 
 
 
 
Export and Import
 
Using Imported Provenance During Export
Include the imported provenance as bundles in the generated PROV graph
Bundles [13] enable nesting of PROV graphs within PROV graphs, treating a nested graph as a new entity
Connect the entities representing input tuples in the imported provenance to the query activity and output tuple entities
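The following sketch shows one way the nesting and linking described above could look in a PROV-JSON-style dictionary. The function name `export_with_bundles`, the bundle and edge identifiers, and the `ex:` prefix are assumptions for illustration; the key names (`bundle`, `used`, `wasGeneratedBy`, `wasDerivedFrom`, `prov:entity`, and so on) follow the PROV-JSON serialization.

```python
import json


def export_with_bundles(query_id, output_id, inputs):
    """inputs: list of (prov_eid, prov_doc) pairs taken from the input tuples.

    Returns a PROV-JSON-style dict in which each imported snippet is nested
    as a bundle and its tuple entity is linked to the query and the output.
    """
    doc = {"entity": {output_id: {}}, "activity": {query_id: {}},
           "wasGeneratedBy": {"_:wgb": {"prov:entity": output_id,
                                         "prov:activity": query_id}},
           "used": {}, "wasDerivedFrom": {}, "bundle": {}}
    for i, (eid, imported) in enumerate(inputs):
        doc["bundle"][f"ex:bundle{i}"] = imported  # nested PROV graph
        doc["entity"][eid] = {}                    # entity for the input tuple
        doc["used"][f"_:u{i}"] = {"prov:activity": query_id, "prov:entity": eid}
        doc["wasDerivedFrom"][f"_:wdf{i}"] = {
            "prov:generatedEntity": output_id, "prov:usedEntity": eid}
    return doc


# Assumed imported provenance for one input tuple.
imported = {"entity": {"ex:tweet1": {}, "ex:user1": {}}}
doc = export_with_bundles("ex:Q", "ex:to1", [("ex:user1", imported)])
print(json.dumps(doc, indent=2))
```

Because the input tuple's entity identifier also appears inside its bundle, a consumer can follow the derivation from the query output back to non-database entities such as `ex:tweet1`.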
 
 
[13] P. Missier, K. Belhajjame, and J. Cheney. The W3C PROV family of specifications for modelling provenance metadata. In EDBT, pages 773–776, 2013.
Export and Import
 
Example of bundles
[Figure not preserved]
Export and Import
 
Handling Updates
If a tuple is modified (e.g., by running an SQL UPDATE statement), that should be reflected when provenance is exported
Example
Assume the user has run an update to correct tuple t_1's age value (setting age to 70) before running the query
 
Export and Import
 
Challenge
How to track the provenance of updates under transactional semantics
Solution
GProM uses the novel concept of reenactment queries
User can request the provenance of a past update, transaction, or set of updates executed within a given time interval
Construct PROV document using provenance for updates computed on-the-fly
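The slide does not spell out how reenactment works. One reading, assumed here, is that a past update is simulated by a query over the pre-update state, so that its provenance can be computed like that of any other query. The sketch below illustrates that idea with SQLite; the table names, the old age value of 17, and the copy table standing in for access to past database states are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usr (id TEXT, age INTEGER)")
conn.executemany("INSERT INTO usr VALUES (?, ?)", [("t1", 17), ("t2", 45)])
# Keep a copy of the state before the update (a stand-in for time travel).
conn.execute("CREATE TABLE usr_before AS SELECT * FROM usr")

# The update the user ran: correct t1's age.
conn.execute("UPDATE usr SET age = 70 WHERE id = 't1'")

# Reenactment sketch: a query over the old state that returns what the
# update produced, plus the old value each new tuple was derived from.
reenacted = conn.execute(
    """SELECT id,
              CASE WHEN id = 't1' THEN 70 ELSE age END AS age,
              age AS prov_age
       FROM usr_before ORDER BY id"""
).fetchall()
actual = conn.execute("SELECT id, age FROM usr ORDER BY id").fetchall()
print(reenacted)
print([(i, a) for i, a, _ in reenacted] == actual)
```

The extra `prov_age` column pairs each updated tuple with the version it came from, which is the information a wasDerivedFrom edge between tuple versions would need.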
 
Experimental Results
 
TPC-H [14] benchmark datasets
Scale factor from 0.01 to 10 (10 MB up to 10 GB)
Run on a machine with
2 x AMD Opteron 3.3 GHz processors
128 GB RAM
4 x 1 TB 7.2K RPM disks configured in RAID 5
Queries
Provenance of a three-way join between relations customer, order, and nation
With additional selection conditions to control selectivity (and, thus, the size of the exported PROV-JSON document)
 
 
[14] TPC. TPC-H Benchmark Specification, 2009.
Experimental Results
 
[Figure not preserved; panels: 1 GB, 10 GB]
Conclusions and Future Work
 
Conclusions
Integrated import and export of provenance represented as PROV-JSON into/from provenance-aware databases
Construct PROV graphs on-the-fly using SQL
Connect database provenance to imported PROV data
Future Work
Full implementation for updates
Automatic storage management (e.g., deduplication) for imported provenance
Automatic cross-referencing
 
Questions
 
My Webpage
http://www.cs.iit.edu/~dbgroup/people/xniu.php
Our Group’s Webpage
http://cs.iit.edu/~dbgroup/research/index.html
GProM
http://www.cs.iit.edu/~dbgroup/research/gprom.php
 
 
Others
 
Provenance querying
Provenance for JSON
 

