Evolution of GoogleSQL: A Comprehensive Overview
Delve into the history, development, and goals of GoogleSQL as a fundamental SQL language component at Google. Explore the challenges faced with SQL dialects, the establishment of a canonical SQL language, and the evolution of GoogleSQL into various cloud products and internal systems. Understand the foundational principles and scope of GoogleSQL's language, syntax, semantics, and structured data types.
Presentation Transcript
GoogleSQL: A SQL Language as a Component
September 2022
David Wilhite, Jeff Shute
GoogleSQL: SQL as a Component
Outline
- History of GoogleSQL
- GoogleSQL Language
- GoogleSQL Libraries
- Lessons Learned
- Resources
GoogleSQL History: SQL Dialects at Google (circa 2013)
"SQL" query engines:
- Dremel
- BigQuery
- F1
- Spanner SQL
- several more
All different:
- Different syntax
- Different semantics
- Different type systems
- Inconsistent handling of protocol buffers
=> This was confusing and bad for users.
GoogleSQL History: GoogleSQL Goals
One canonical SQL language at Google:
- To be used internally by multiple query engines
- To be released externally in cloud products
- Common syntax and semantics
- Common type system
- Common parsing and analysis
Feels like standard ANSI SQL
- With extensions for complex types like protocol buffers
Backwards compatibility was a non-goal
- User migration required
GoogleSQL History: GoogleSQL in Google Today (September 2022)
- F1 Query
- Spanner
- Dremel: GoogleSQL preferred
- BigQuery: GoogleSQL (Standard SQL) mode
- Procella (YouTube): built from the start with GoogleSQL
- Dataflow SQL (built on Beam / ZetaSQL)
- Many others, both cloud products and internal products
ZetaSQL: open source version of GoogleSQL
GoogleSQL Language: GoogleSQL Language Scope
GoogleSQL defines:
- Statement syntax
- Language semantics
- Type system
- Data model
- Built-in functions
GoogleSQL does not define:
- Engine implementation details
- Storage features and semantics (indexes, materializations, ...)
- Client APIs
GoogleSQL Language: Language Principles
Essentially ANSI SQL, with extensions for structured data types and more
- Our structured data types are similar to (less well known) types that exist in SQL-1999+, with improved querying.
Principles:
- GoogleSQL conforms to ANSI SQL
- With extensions and omissions
Some problems with the SQL standard:
- Doesn't specify many de facto industry-standard functions
- Lots of custom, non-standard function call syntax
- Some "old-fashioned" types and syntax
GoogleSQL Language Highlight: Complex Types
Complex types, with nested structures
- STRUCT, ARRAY, protocol buffers, JSON
Many language features to simplify usage of compound types
- 'Unnesting' or 'flattening' arrays within a table
- Array operators (filtering, aggregating, transforming, etc.)
- Structure-preserving transformations on nested data
- More
Examples:
SELECT ARRAY_FILTER([1,3,2,4], e -> e > 2);
Result: [3,4]
SELECT FILTER_FIELDS(<proto-typed-expression>, +a, -a.b, +c)
Preserves proto fields 'a' and 'c' but excludes sub-field 'a.b'
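To make the semantics of the ARRAY_FILTER and FILTER_FIELDS examples concrete, here is a minimal Python sketch that mimics them on plain lists and nested dicts. The function names `array_filter` and `filter_fields`, the use of dicts as a stand-in for protocol buffers, and the `include`/`exclude` parameters are assumptions made for illustration; this is not how GoogleSQL or ZetaSQL implements these functions.

```python
from typing import Any, Callable, Dict, List, Set


def array_filter(values: List[Any], predicate: Callable[[Any], bool]) -> List[Any]:
    """Keep the elements for which the predicate holds, preserving order."""
    return [v for v in values if predicate(v)]


def filter_fields(message: Dict[str, Any], include: Set[str], exclude: Set[str]) -> Dict[str, Any]:
    """Hypothetical model of FILTER_FIELDS on nested dicts.

    `include` holds top-level field names to keep (the "+a" arguments);
    `exclude` holds dotted paths to drop (the "-a.b" arguments).
    """
    def prune(node: Any, path: str) -> Any:
        # Leaf values are kept as-is; nested dicts are pruned recursively.
        if not isinstance(node, dict):
            return node
        return {
            key: prune(value, f"{path}.{key}")
            for key, value in node.items()
            if f"{path}.{key}" not in exclude
        }

    return {
        key: prune(value, key)
        for key, value in message.items()
        if key in include and key not in exclude
    }


print(array_filter([1, 3, 2, 4], lambda e: e > 2))  # [3, 4]

proto_like = {"a": {"b": 1, "x": 2}, "c": 3, "d": 4}
print(filter_fields(proto_like, include={"a", "c"}, exclude={"a.b"}))
# {'a': {'x': 2}, 'c': 3}
```

The point of the second function is that the output keeps the nested shape of the input, which is what "structure-preserving transformations" refers to on the slide.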
GoogleSQL Libraries: Architecture
[Architecture diagram. Recoverable labels: Input GoogleSQL Query; Parser and Parser AST; Resolver and Resolved AST; Catalog; SQL Function Library; Engine-specific Functions; Query Engine; Reference Implementation; BigQuery; Spanner; Other Engines.]
GoogleSQL Libraries: GoogleSQL Analysis
Parser
- Produces Parser AST (Abstract Syntax Tree)
- Use is discouraged, especially for query engines
- Simple syntactic rewriters sometimes make sense
Resolver
- All semantic validation and decisions made
- All names are scoped and resolved via the Catalog interface
- All types and function signatures determined
- Type coercions, implicit casts added, etc.
- Expanded SELECT *, USING, etc.
- Produces Resolved AST: all syntax and semantic errors have already been detected
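The split between a parser AST and a resolved AST can be sketched with a toy example. Everything below is hypothetical: the class names (`ParsedSelect`, `ResolvedSelect`, `ResolvedColumn`), the dict-based catalog, and the `resolve` function are illustrative assumptions and do not correspond to the ZetaSQL API. The sketch only shows the idea that the resolver looks names up in a catalog, attaches types, expands `SELECT *`, and reports semantic errors so that downstream consumers never see them.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical catalog: table name -> {column name -> type name}.
Catalog = Dict[str, Dict[str, str]]


class ResolveError(Exception):
    """Raised when a name cannot be resolved against the catalog."""


@dataclass
class ParsedSelect:
    """Stand-in for a parser AST: names are still unchecked strings."""
    columns: List[str]  # ["*"] or explicit column names
    table: str


@dataclass
class ResolvedColumn:
    name: str
    type_name: str


@dataclass
class ResolvedSelect:
    """Stand-in for a resolved AST: every column is known and typed."""
    table: str
    columns: List[ResolvedColumn]


def resolve(parsed: ParsedSelect, catalog: Catalog) -> ResolvedSelect:
    if parsed.table not in catalog:
        raise ResolveError(f"Table not found: {parsed.table}")
    schema = catalog[parsed.table]

    names: List[str] = []
    for column in parsed.columns:
        if column == "*":
            names.extend(schema)  # expand SELECT * into explicit columns
        elif column in schema:
            names.append(column)
        else:
            raise ResolveError(f"Column not found: {column}")

    return ResolvedSelect(parsed.table, [ResolvedColumn(n, schema[n]) for n in names])


catalog = {"Users": {"id": "INT64", "name": "STRING"}}
resolved = resolve(ParsedSelect(["*"], "Users"), catalog)
print([c.name for c in resolved.columns])  # ['id', 'name']
```

An engine that consumes only the resolved form does not need its own name lookup, type inference, or `SELECT *` handling, which is one reason the slide discourages using the parser AST directly.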
GoogleSQL Libraries: Reference Implementation
Reference Implementation
- Simple in-memory implementation with correct behavior
GoogleSQL Evaluator Library
- Based on the reference implementation
- Evaluate queries
- Evaluate standalone expressions
  - Scalar expressions, such as in WHERE or SELECT
  - E.g., boolean expressions to support filtering
GoogleSQL Libraries: Compliance Testing
Critical for compatibility and interoperability!
Compliance Test Suite
- Test data, test queries, and expected output
- Validates all specified behavior
Compliance Framework
- Engines write a test driver
- Runs compliance tests against the engine
Random Query Generator
- Query results compared between the engine and the reference implementation
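The comparison step described above can be sketched as a small driver loop. This is an illustration under stated assumptions, not the actual compliance framework: the `Engine` type, the `run_compliance` function, and the fake engines are all hypothetical, and rows are compared as unordered multisets, which is only appropriate for queries without ORDER BY.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Tuple
# Hypothetical test-driver interface: take a query string, return result rows.
Engine = Callable[[str], List[Row]]


def run_compliance(engine: Engine, reference: Engine, queries: List[str]) -> Dict[str, bool]:
    """Run each query on both engines and record whether the results agree."""
    outcome: Dict[str, bool] = {}
    for query in queries:
        expected = reference(query)
        try:
            actual = engine(query)
        except Exception:
            # An engine error where the reference succeeds counts as a failure.
            outcome[query] = False
            continue
        # Compare as multisets: row order is ignored, duplicates still matter.
        outcome[query] = Counter(actual) == Counter(expected)
    return outcome


# Fake engines backed by canned results, for demonstration only.
REFERENCE_RESULTS = {"SELECT 1": [(1,)], "SELECT x FROM t": [(1,), (2,)]}


def reference_engine(query: str) -> List[Row]:
    return REFERENCE_RESULTS[query]


def reordering_engine(query: str) -> List[Row]:
    # Returns the same rows in a different order.
    return list(reversed(REFERENCE_RESULTS[query]))


def buggy_engine(query: str) -> List[Row]:
    # Drops the last row of every result.
    return REFERENCE_RESULTS[query][:-1]


queries = list(REFERENCE_RESULTS)
print(run_compliance(reordering_engine, reference_engine, queries))
# {'SELECT 1': True, 'SELECT x FROM t': True}
print(run_compliance(buggy_engine, reference_engine, queries))
# {'SELECT 1': False, 'SELECT x FROM t': False}
```

With a random query generator in place of the fixed query list, the same loop gives differential testing against the reference implementation without hand-written expected output.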
GoogleSQL Libraries: Ecosystem and Tools beyond core GoogleSQL
- Integration with (internal) web-based code search and IDEs
  - Syntax highlighting, auto-complete, documentation hovercards, etc.
  - Jump to definitions & uses (code search for SQL)
- GoogleSQL unit testing framework
- SQL Formatter (recently available in ZetaSQL)
Lessons Learned
To get consistent behaviors across query engines:
- Just defining the language and semantics is not enough
- Sharing only a parser is not enough
- Testing infrastructure is critical
- Proper abstractions and well-defined, stable APIs are required
Simply unioning dialects doesn't work
Many projects need a query or expression language
Resources
ZetaSQL codebase: github.com/google/zetasql
Papers related to GoogleSQL:
- How We're Helping Developers with Differential Privacy (blog post, 2021)
- Big Metadata: When Metadata is Big Data (VLDB 2021)
- Dremel: A Decade of Interactive SQL Analysis at Web Scale (VLDB 2020)
- F1 Query: Declarative Querying at Scale (VLDB 2018)
- Spanner: Becoming a SQL System (SIGMOD 2017)
Contact: wilhite@google.com, jshute@google.com
Thank You! Questions?