Automatic Incremental View Maintenance in DBSP: A Comprehensive Overview

Slide Note
Embed
Share

Analyzing the automatic incremental view maintenance for rich query languages like DBSP. Discussing concepts such as incremental computation reuse, streaming language, relational computations, streaming operators, and more. Explore the conversion of arbitrary DBSP programs to incremental ones and the implications for SQL and Datalog. Delve into stream operators, chaining circuits, delays, and computing changes in DBSP.


Uploaded on Apr 19, 2024 | 4 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. DBSP: DBSP: Automatic Incremental View Maintenance for Rich Query Languages Mihai Budiu, Leonid Ryzhyk feldera.com* Tej Chajed University of Wisconsin-Madison* Frank McSherry materialize.com Val Tannen University of Pennsylvania * = work done at VMware Research Best paper award

  2. Incremental Computation Reuse outputs when inputs change ? ? ? reuse ? + ? ? ? + ? Perform work proportional to the size of the change 2

  3. Abstract A simple streaming language called DBSP Four (4) operators A definition of incremental computation in DBSP An algorithm converting an arbitrary DBSP program to an incremental DBSP program A recipe implementing DB queries in DBSP 3

  4. DBSP can express Relational (select, project, join, union, difference) Computations on sets, multisets Nested relations (group-by, unnest/flatmap) Aggregation Recursion (monotonic and non-monotonic) (graph queries) Stratified negation Streaming computations (joins, window aggregates) And more (Turing-complete) Corollary: we get automatically incremental evaluation for all these SQL Datalog 4

  5. Streams Infinite vectors time 0 1 2 1 0 -1 -2 -1 ??= streams with elements of type ? Assume that ? has +, , and 0 (zero) (? is a commutative group) 5

  6. Stream operators Functions from streams to streams: ?:?? ?? ?0 ?1 ?2 ? ?0 ?1 ?2 Can have multiple inputs: ? 6

  7. Lifting Convert a function ?:? ? into a stream operator ?:?? ?? ? ? ? ? ?(?) ?(?) ?(?) 7

  8. Chaining operators into circuits Dataflow graphs (function composition) o ? ?1 ? ? ? ?2 8

  9. Delay (?1 ) Output is input stream delayed by one step First value is 0 ? 1 0 ? ? ? ? ? ? ? 9

  10. Computing changes (deltas) ? o o is the stream of changes of s 1 1 1 1 ? 1 2 3 2 ? ? = ? ? ? ? 1 stream of deltas 10

  11. Integration ? o If s is a stream of changes then o is the original stream ?[0] = ?[0] ?[?] = ?[?] + ?[? 1] ? ? = ? 0 + ? 1 + + ?[?] 1 2 2 3 ? 1 1 1 1 stream of changes 11

  12. ? and ? cancel out 12

  13. All databases are streaming databases! Consider a database ??, a set of tables A committed transaction is a change to ?? The set of linearized transactions define ?,a stream of changes to ?? ?[?] is the ?-th transaction ?? is a stream of database snapshots ??[?] is the contents of the database after ? transactions have been commited ??[?] = ?(?)[?] A database (stream) is the integral of a transaction stream ??0 ??1 ??2 ??3 ?0 ?1 ?2 ?3 ? 13

  14. Views are lifted queries Let ? be a query defining a view ?, e.g.: CREATE VIEW V AS SELECT * FROM T WHERE Age >= 10 ?[?] = ? ??[?] ? is a stream of view snapshots: ? = ?(??) ? ?0 ?1 ?2 ?3 Versions of ? Versions of ?? ??0 ??1 ??2 ??3 14

  15. Incrementally maintaining view ? This is our definition of IVM Outputs can be small (they are deltas) It is compositional: inputs and outputs are both changes ( ?) ?0 ?1 ?2 ?3 ? ? ?0 ?1 ?2 ?3 D Changes to ? Changes to ?? ?0 ?1 ?2 ?3 Versions of ? Versions of ?? ??0 ??1 ??2 ??3 15

  16. The chain rule ?1 (?2 ?1) = ?2 Proof by pictures: We can incrementalize a complex query by incrementalizing each component independently 16

  17. But databases are not commutative groups Enter -sets Each row has an integer weight positive, zero, or negative Can represent both DB tables and changes to tables Positive weight => row added Negative weight => row removed Generalize sets and multisets Classic DB table = -set where all weights are 1 Form a commutative group Name Age Weight Mike 10 1 John 12 3 Amy 8 -1 Chris 10 2 tuples 17

  18. We can use -sets to Implement all of SQL Efficient incremental versions of each operator 18

  19. Does it work? Theory was verified using Lean theorem prover Open-source implementations: DBSP runtime in Rust (all DB operators and their incremental versions): https://crates.io/crates/dbsp Full SQL compiler to DBSP (based on Apache Calcite): https://github.com/feldera/feldera/tree/main/sql-to-dbsp-compiler We founded inc. to develop this software stack 19

  20. 20

  21. Extra slides 21

  22. Incremental queries are streaming systems ? ?0 ?1 ?2 ?3 ?0 ?1 ?2 ?3 state ? is a streaming system: it runs forever it maintains internal state (even if ? is stateless) the state is stored in the delay operators (inside ? and ?) 22

  23. Linear operators: ?(? + ?) = ?(?) + ?(?) For a linear operator ? we have ? = ? Why is this better? ?0 ?1 ?2 ?3 ? ? ?0 ?1 ?2 ?3 D ??0 ??1 ??2 ??3 Optimized: ?0 ?1 ?2 ?3 ?0 ?1 ?2 ?3 ? A speedup of ?( ?? /|?|) 23

  24. Bilinear operators (Lifted) join From distributivity: ? ? = ? ? + ? ? + ? ? ? ?+ ? ?) Speed-up is ?( 24

Related


More Related Content