Exploring Binary Representation and Semantic Approaches in Data and Programs
Delve into the fascinating realm of binary representation and semantic approaches in data and programs through a series of discussions and examples. Topics include encoding/decoding relevance, tamper-proof bytecode formats, incorporating semantic info for compact encodings, and playing guess-who games in functional programming research. Discover innovative strategies for making codecs and exploring the connection between questions and answers in a guessing-game approach to coding.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Every Bit Counts A semantic approach to the binary representation of data and programs Andrew Kennedy & Dimitrios Vytiniotis Microsoft Research Cambridge {akenn,dimitris}@microsoft.com
Before the fun starts: is encoding and decoding relevant? Sure: How to design easy-to-verify tamper-proof bytecode formats? Semi-formal work for Java [Franz et al.] How to incorporate semantic and statistical info for more compact encodings and compression? Java bytecode and .NET CLI are quite bulky formats, also work on Javascript compression schemes etc. Lots of work in the XML world, oracle-based PCC checking [Necula and Rahul], term compression [J. Cheney], How to make it easy to prove the correctness of a codec? Lots of work in the generic programming realm, also PADS [Fisher et al.] and offer all that in a nice DSL? Easier in use and verification than pickler combinators [Kennedy]
Lets play a guess who game I have some PL researcher in mind. Can you guess who? Do they do research in functional programming? Yes Do they care a lot about minimal invariance? No Do they work on polytypic programming? Yes Are they taller than 1.90m? Yes Are they in the ICFP committee? No
Guess which program I have some program* in mind. Can you guess which? Question Answer Is it a function application? No (It must be a -expression.) Is the argument an Int? Yes Is the body a variable? No Is the body a function application? No (It must be another -expression.) Is its argument an Int? Yes Is its body a variable? Yes Code 0100110 Is it the most recently introduced variable? No Aha! You thought of x:Int. y:Int.x * A closed program in STLC with Int base type.
The idea Represent a codec by a strategy for playing a question & answer guessing-game Encode ask questions of data and record answers as bitstream Decode interpret bitstream as answers to the same Q&A strategy
Example play, set-theoretically Is it a function application? No. Is its argument an Int? Yes. Is its body a variable? No. Non-Int- argument lambdas All well-typed programs Lambda expressions Int-argument lambdas x:Int. y:Int.x Int-argument lambdas with variable body Singleton set Int-argument lambdas with non-variable body Function application expressions Set of possible data values Binary partition of set
From sets to types Possible set of data values: type ? Binary partition of set: type isomorphism ? ?1+ ?2 Singleton set: type isomorphism ? 1 Strategy: possibly-infinite binary decision tree whose nodes contain type isomorphisms ? ?1+ ?2 leaves contain type isomorphisms ? 1 ? 1 ???? ???? ? ??? ?????? ??? ? ???? ? ????? ??? ? ??? ?? ?1?2 ???? ?1 ???? ?2 ???? ? ? ?1+ ?2 ???? ??? ? ? = ??? ? ? (? ? )
A silly game: unary naturals ?????? ??? ??? ??? ?? () ??? ?????? = ??? ??? ??? ? ??? ??? 0 ??? ? + 1 = ??? ? ? ??? ???? () = 0 ??? ??? ? ? = ? + 1 ???????? ???? () ???????? = ?????? (??? ?? ??) ????????? ???? ??? ????????? = ????? ?????? ???????? ????????? isZero: 1 + 1 = ???? () 0 isZero: 1 + 1 0 Infinite tree, crucially relying on laziness (co-induction in Coq)
Generic encoding and decoding Encode a value of type ? to a bitstream If ? is a singleton, there s no information to encode! ??? ???? ? ? ??? ??? (?????? ?) ? = [] ??? (????? ??? ??? ???) ?1 ?2 ? = ???? ??? ? ?? ???? ?1 1 ??? ?1 ?1 ??? ? ?2 0 ??? ?2 ?2 Otherwise, use the map from ? to ?1+ ?2 to ask in which partition ? lives Emit a bit and continue on the left or right subtree with the deconstructed value ??? ???? ? ??? ?, ??? May throw error if bistream too short Example: ??? ????????? 5 = [0,0,0,0,0,1] ??? ????????? 0,0,0,0,0,1,0,1 = (5, 0,1 )
Correctness for free* Set ? ? ???? ? ? 01001010 * If the ISOs are indeed isomorphisms Bitstrings
Non-ambiguity and non-redundancy for free* Set ? ? ???? ? ? Non-redundant codes ? Unambiguous codes 01001010 01001110 * If the ISOs are indeed isomorphisms Bitstrings
Game combinators Cast a game from one type to another through an isomorphism Given games for ? and ?, construct games for sum or product of ? and ? Dependent pairs: type of second component depends on value of first +> ??????? ???? ? ???? ? ???? ??? ?? ? ? ???????? ???? ? ???? ? ???? ?,? ??????? ???? ? ? ?.???? ? ? ??????? ???? ? ?:???.???? (???? ?) ???? ? ??? ? ? ???? ? ???? ? ?. ? ? ????? ???????? ???? ? ???? ? ???.???? ? ????? ???????? ? = ??????? ????????? ??????? ? ???????? ???? ? ???? ? ???????? ? = ????? ???????? ? +> ??? ? ??? ??? ??? ? ??? = ? ???.?????
Combinators = co-fixpoints ???????? ???? ?1 ???? ?2 ???? (?1,?2) ?1 ???? ?1 ?2 ???? ?2 ???????? ?1 ?2 ?2 ?2 ?2
No silly questions please, and Every Bit Counts! If possible, strategy should not ask silly questions that reveal no new information e.g. Are you a number smaller than 5? Yes Are you a number smaller than 7? Of course I am! This corresponds to proper partitioning: For all isos ? ?1+ ?2in game ?, sets ?1 and ?2 are non-empty Theorem: Suppose ? has proper partitioning, and there is a leaf for every element of its domain. If ??? ? ? fails then there is some extension ? of ? such that ??? ? ? succeeds.
But what does that mean? Theorem: blablablah That feels highly compact! Can we take this domain to be the set of well typed programs ? EVERY bitstring represents a non-empty set of elements in the domain
Simple types ? = ??? | ? ? ,?:?1 ? ?2 ??:?1.? ?1 ?2 ?1:?1 ?2 ?2:?1 ?1 ?2 ?2 ,?:?, ?:? Problem: Devise a game for STLC with no silly questions ! Idea 1:Parameterize game on environment (for open terms) and type: ??????? ? ??? ? ???? ???? ? ??? ?????? ? ? = ? } Not every environment/type combination is inhabited. To avoid asking silly questions (at game construction time not at encoding/decoding time) we have to solve inhabitation problems.
Some ingenuity required Idea 2: Parameterize on environment and pattern of form ?1 ?? ? where ? is a wildcard ??????? ? ??? ? ??????? ???? ? ??? ???? ?? ?????? ? ? ? } All environment/pattern combinations are inhabited, no need to solve hard problems at game construction time A provably EVERY BIT COUNTS encoding for STLC (and the proof did not kill us)
The STLC game Can we play a game for variables with this pattern in this environment? Are you a variable? Are you an application? Application game: 1. Play game for argument 2. *Get* the argument and play game for the function using the argument s type
Pearly too! Haskell code for STLC, on one slide See paper for details, games for several statistical compression schemes, and even more game transformations
Future directions Do it for real! E.g. .NET CIL, ghc Core Integrate arithmetic coding. Put probabilities on arcs of tree. Develop methodology for codecs for typed programming languages. (=> No ingenuity required?)
Whats left, after the fun? An elegant characterization codecs Q&A strategies and a DSL to program them 1. Q&A strategies can give rise to non-redundant, compact coding schemes 2. Offer cheap verification 3. And are fun to program with Download and play: http://research.microsoft.com/people/dimitris http://research.microsoft.com/people/dimitris