Deep Learning for Math Knowledge Processing: Goals and Related Work

This project aims to leverage deep learning for math-entity representation learning, math semantics extraction, and application development in the fields of mathematics and natural language processing. The long-term objectives include semantic enrichment of math expressions, conversion of math to computation, and enhanced math search capabilities. The project also explores related work in math OCR, formula recognition, and machine learning tasks in mathematics and computational linguistics.


Presentation Transcript


  1. Deep Learning for Math Knowledge Processing
     Abdou Youssef and Bruce Miller

  2. Background and Drivers (1) -- Deep Learning Capabilities --
     Deep Learning's (DL) unprecedented breakthroughs in NLP:
     - Modeling, representation, and semantics
     - Machine translation
     - Speech recognition (and many other applications)
     Technical advances of DL (capitalizing on large datasets):
     - Learning to represent better, capturing relational semantics
     - Learning to do sequence-to-sequence mapping
     - Learning to forget, to remember, and to focus attention

  3. Background and Drivers (2) -- Math Needs --
     Math needs similar capabilities:
     - Representation and semantics
     - Translation (with high accuracy):
       - Presentation to Computation (P2C): LaTeX/pMML -> cMML/programs
       - Informal -> formal
       - Visual to Digital (V2D): PDF scan -> MML/LaTeX; hand-written -> MML/LaTeX
     - Software:
       - End-to-end apps
       - Basic building blocks for people to synthesize new apps

  4. Goals of this Project -- Short Term --
     Apply Deep Learning to:
     - Math-entity representation learning (Math2Vec)
     - Math semantics extraction and disambiguation
     Create algorithms and public-domain software:
     - For the above two tasks
     - Trained models
     Create publicly available datasets (and APIs):
     - Labeled data
     - Math2Vec representations

  5. Goals of this Project -- Long Term --
     Applications:
     - Semantic enrichment/annotation of math expressions
     - P2C conversion of math expressions
     - Math QA capabilities per manuscript/collection
     - Enhanced math search, UI, authoring aid
     - Etc.

  6. Related Work
     - Math OCR and formula recognition
     - Math conversion: TeX/LaTeX -> pMML; XML/HTML/pMML -> cMML
     - Math search
     - Math semantification/annotation
     - Machine learning: document classification, topic modeling
     - Math computational linguistics
     - General NLP/computational linguistics (some use of DL already)
     Can we do these same tasks better with Deep Learning?

  7. Our Related Ongoing Projects
     LaTeXML:
     - Parses TeX/LaTeX and converts it to pMML and XML
     - Wherever possible, tags math tokens with roles/meanings
     Part-of-Math Tagger:
     - Tokenizes TeX/LaTeX with some parsing
     - Assigns to each math token many tags, some definite and some tentative (from common uses)
     - In later stages, it disambiguates between tentative tags
     This DL project aims to provide/disambiguate semantics

  8. Token type / Explanations and Examples / Part of Math
     Numbers
       Part of Math: numeric quantity; index; reference
     Letters, Alphabetic Strings
       Examples: Roman, Greek, Hebrew letters and strings
       Part of Math: function; variable; argument; parameter; index; identifier
     Operations
       Examples: unary operations and operators (+, !, ~, differentiation (d, ...), integration, transforms, lim, inf, ...); binary operations (+, /, ...); multi-ary operations
       Part of Math: operator; operations of various arities
     Relations
       Examples: equalities/definitions, approximation, similarity, equivalence, congruence (=, ...); inequalities (<, >, ...); set-theoretic relations; logic relations (=>, ...); turnstile relations; triangle-shaped inequalities; geometry/linear-algebra relations; negated binary relations; miscellaneous (divides, prop, etc.)
       Part of Math: relation (of the various kinds indicated in the examples)

  9. Token type / Explanations and Examples / Part of Math (continued)
     Fence symbols
       Examples: delimiters (grouping symbols): ( ) [ ] { } | || etc.; constructors: for creating/denoting sets, vectors, intervals; distributed multi-glyph (DMG) operators: |.|, ||.||, inner product <.,.>, ket |.>, etc.
       Part of Math: left-delimiter; right-delimiter; constructor; distributed multi-glyph operator
     Logic tokens
       Examples: quantifiers: ∀, ∃, ∃!, etc.; proof tokens: ∴ (therefore), ∵ (because), (contradiction)
       Part of Math: quantifier; proof token; the designations in the examples
     Punctuations
       Examples: , ; . : | \ / -- they can be simple punctuations, separators between elements/arguments, implied conjunctions and conditionals, or glyphs in DMG operators
       Part of Math: punctuation
     Math accents
       Examples: diacritics: overlines, underlines, hats, checks (i.e., upside-down hats), tildes, single/multiple dots, rings, acute/grave/breve accents, arrows/harpoons (e.g., for vectors), one or more primes (e.g., as postfix unary operators for differentiation); grouping accents: horizontal braces/brackets/parentheses used under and over sub-expressions; extensible accents: adding symbols above/below the accents for further semantification
       Part of Math: accent (of various types)

  10. Token type / Explanations and Examples / Part of Math (continued)
      Literals, Constants
        Examples: standard sets; infinities; the empty set; various standard functions (e.g., sin, cos, sinh, log, exp, etc.); and math constants such as π (3.1415...), γ (the Eulerian gamma), φ (the Golden ratio), etc.
        Part of Math: the designations in the examples
      Arrows
        Examples: arrows come in various orientations, directions, valences (single/double/triple), head and tail shapes, line types and shapes; used in logic, geometry, function mapping, category theory, etc. Harpoons: in geometry and vector analysis. Smiles and frowns: in topology and geometry. Spoons: as multimap, as "image of", and as "original of", etc.; used in function mapping. Pitchforks: used in manifolds. Angles: in geometry.
        Part of Math: arrow of various types (later scans refine this tag)
      Ellipses
        Examples: triple dots used to designate missing terms in finite or infinite sequences, vectors, matrices, etc.
        Part of Math: ellipsis with its orientation
      Other
        Examples: various shapes, e.g., top, bottom, hslash, etc.
        Part of Math: symbol

  11. Features (of Tokens and Phrases)
      - Category/Role: the grammatical role or part-of-math: operation, operator, relation, function, variable, parameter, constant, quantifier, separator, punctuation, abbreviation/acronym, delimiter, left-delimiter, right-delimiter, constructor, accent, etc.
      - Subcategory: further specializes the category: subscript, superscript, numerator, denominator, lower-limit (of an integral), constraint/condition, definition, etc. For an accent, indicates the accent position.
      - Meaning: examples: scalar addition, the cosine function, etc.
      - Signature: data types
      - Font: the font characteristics: typeface, font-style, and font-weight
      - Notational Status: specifies whether the notation is Generic, Standard (i.e., meant as commonly understood), or Defined (in the manuscript)

  12. Math Ambiguities
      - Superscript: a power, an index, the order of differentiation, a postfix unary operator, etc.
      - Juxtaposition: multiplication, function application, or concatenation
      - Accent: an applied operator, or a morphological part of the name. E.g., an accented y: derivative of y? complement of y? a distinct variable?
      - Part of math: different roles mean distinct parse trees and different semantics. Ex: | and || can be punctuation, operators, relations, or delimiters.
      - Scope: typically an issue when delimiters are omitted. E.g., sin 2x + 5: sin(2x) + 5 is more probable than sin(2x + 5)
      - Data type: necessary to completely resolve semantics; conversely, can help disambiguate other ambiguities, e.g., the superscript ambiguity

  13. Deep Learning Models -- Relevant to Math --
      Embedding (feature learning):
      - Converts each math term (or expression) into a numerical feature vector
      Feedforward classifiers:
      - Good for document classification, and for disambiguation between alternative tags
      Recurrent Neural Networks (RNNs):
      - Input: a variable-length, ordered sequence (sentence, expression, equation)
      - Output: a class, or another sequence (translation or annotation)
      Advanced RNNs:
      - Bidirectional LSTMs: learn what to forget and what to remember
      - RNNs with Attention: learn what to focus on at any given time

  14. Embedding
      - Converts each math entity into a feature vector, i.e., embeds each entity as a point in a vector space
      - Similar/related entities map to algebraically similar/related vectors:
        V(king) - V(queen) ≈ V(man) - V(woman)
        V(France) - V(Paris) ≈ V(Britain) - V(London)
        V(cos) - V(arccos) ≈ V(exp) - V(log) (??)
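
     If a trained Math2Vec model is available as gensim KeyedVectors, analogy relations like the ones above can be probed directly. The sketch below is illustrative only: the file name math2vec.kv and the assumption that tokens such as "arccos" are in the vocabulary are hypothetical.

```python
# Minimal sketch of probing analogy relations in a trained embedding,
# assuming a gensim KeyedVectors file "math2vec.kv" exists (hypothetical name).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load("math2vec.kv")

# Classic word analogy: king - man + woman ~ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# The math analogue suggested on the slide: does cos - arccos ~ exp - log hold?
# i.e., look for X such that V(X) ~ V(arccos) + (V(exp) - V(log))
print(vectors.most_similar(positive=["arccos", "exp"], negative=["log"], topn=3))
```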

  15. Schematic of an Embedder (1) -- Input: a single word/symbol --
      - Input W: a word or symbol, encoded as a one-hot vector (length = the size of the vocabulary, with a 1 in the position of W)
      - Embedder: Word2Vec, GloVe, Doc2Vec, etc., trained on a dataset of text/math; the dataset can be large or small, generic or specialized
      - Output: a numerical feature vector for W (e.g., 100d or 300d), a more meaningful representation of W, used henceforth as input to other models

  16. Schematic of an Embedder (2) -- Input: tagged word/n-gram --
      - Input W: a tagged word, a tagged symbol, or an n-gram (each component one-hot encoded)
      - Embedder: produces a numerical feature vector for the input
      - The output vector can be learned organically as a new vector for the tagged token, or formed as the concatenation/mean of the individual gram vectors
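
      The lookup and composition pictured on these two slides can be sketched in a few lines of numpy. The vocabulary, matrix values, and tag set below are illustrative stand-ins, not the project's data.

```python
# Minimal numpy sketch of the embedder schematic on slides 15-16.
import numpy as np

vocab = ["sin", "cos", "x", "+", "2"]     # toy vocabulary
d = 4                                     # embedding dimension (100 or 300 in practice)
E = np.random.rand(len(vocab), d)         # trained embedding matrix (random here)

def one_hot(token):
    v = np.zeros(len(vocab))
    v[vocab.index(token)] = 1.0
    return v

# One-hot vector times the embedding matrix == selecting the token's row
w = "cos"
vec = one_hot(w) @ E
assert np.allclose(vec, E[vocab.index(w)])

# Slide 16: an n-gram vector as the mean of its gram vectors ...
ngram = ["sin", "x"]
ngram_vec = np.mean([E[vocab.index(t)] for t in ngram], axis=0)

# ... optionally combined with a tag encoding (concatenation shown here)
tags = ["function", "variable", "operator"]
tag_vec = np.zeros(len(tags)); tag_vec[tags.index("function")] = 1.0
tagged_vec = np.concatenate([E[vocab.index("sin")], tag_vec])
print(vec.shape, ngram_vec.shape, tagged_vec.shape)
```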

  17. Embedders (Math2Vec) in this Project
      Datasets:
      - Collections of math papers from the arXiv, grouped into areas of math
      - The DLMF pages (a single class: special functions)
      Different embeddings:
      - Embedding of individual symbols and words, tagged and untagged
      - Embedding of n-grams, expressions, equations
      Software:
      - Text+math tokenizer: add-on to LaTeXML and the POM tagger
      - A variety of fine-tuned, synthesized math embedders
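
      As a heavily simplified illustration of training such an embedder, gensim's Word2Vec can be fit on tokenized math+text sentences. The toy corpus and parameters below are placeholders; the real project would feed it output from its text+math tokenizer.

```python
# Sketch of training a small Math2Vec-style embedder with gensim Word2Vec.
from gensim.models import Word2Vec

corpus = [
    ["the", "function", "\\sin", "(", "x", ")", "is", "periodic"],
    ["\\int", "\\sin", "(", "x", ")", "d", "x", "=", "-", "\\cos", "(", "x", ")"],
    ["the", "derivative", "of", "\\cos", "is", "-", "\\sin"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # gensim >= 4.0 parameter name (was `size` in 3.x)
    window=5,
    min_count=1,
    sg=1,              # skip-gram
)

# The learned vectors can then be saved and queried
model.wv.save("math2vec.kv")
print(model.wv.most_similar("\\sin", topn=3))
```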

  18. Math-tag Disambiguation Algorithm
      Require: a document D as input;
      Require: a labeled (i.e., tagged) dataset R, pre-vectorized with Math2Vec;
      1. Get the embedding vectors v(t) for all terms t in D;
      2. Compute with the math tagger the tentative tags of each term t in D;
      3. For each term t in D do:
         i.   C <- the terms r in R whose tags could be among the tentative tags of t;
         ii.  r* <- argmin over r in C of ||v(r) - v(t)||^2, i.e., the term in C closest to t;
         iii. tags(t) <- tags(r*), i.e., adopt the tags of r* as the tags of t;
         end for
      4. Return tags(t) for all terms t in D.
      Notes:
      - Contexts are embedded along with the terms
      - KNN is used instead of 1NN in step 3.ii
      - More accurate if R is limited to documents of the same class as D
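
      A minimal Python sketch of steps 3.i-3.iii, using 1NN for brevity (the slide notes that KNN over a class-restricted R is used in practice). The data structures and the tag-compatibility test are illustrative assumptions; the bigram/prime variant on the next slide follows the same pattern with bigram vectors.

```python
# Minimal numpy sketch of the disambiguation loop (steps 3.i-3.iii), using 1NN.
import numpy as np

def disambiguate(term_vec, tentative_tags, reference):
    """reference: list of (vector, tags) pairs from the labeled dataset R."""
    # 3.i  Keep reference terms whose tags are compatible with the tentative tags
    candidates = [(v, tags) for (v, tags) in reference if set(tags) <= set(tentative_tags)]
    if not candidates:
        return tentative_tags                      # nothing to disambiguate with
    # 3.ii  Pick the candidate closest to the term in embedding space
    dists = [np.sum((v - term_vec) ** 2) for (v, _) in candidates]
    # 3.iii Adopt its tags
    return candidates[int(np.argmin(dists))][1]

# Toy usage: a term tentatively tagged as either "function" or "variable"
R = [(np.array([0.9, 0.1]), ["function"]), (np.array([0.1, 0.9]), ["variable"])]
print(disambiguate(np.array([0.8, 0.2]), ["function", "variable"], R))  # -> ['function']
```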

  19. Algorithm for Prime Disambiguation
      Require: a document D as input, and a term-pair b (a term and its prime) at a specific location;
      Require: a labeled (i.e., tagged) dataset R, pre-vectorized with Math2Vec;
      1. Get the embedding vectors for all terms/bigrams in D;
      2. Compute with the math tagger the tentative tags of b;
      3. C <- all bigrams r in R whose tags could be among the tentative tags of b;
      4. r* <- argmin over r in C of ||v(r) - v(b)||^2, i.e., the bigram in R closest to b;
      5. tags(b) <- tags(r*);
      Notes:
      - Larger contexts are embedded along with the terms
      - KNN is used instead of 1NN in step 4
      - More accurate if R is limited to documents in the same class as D

  20. Retrospective and Prospective
      - The previous approach requires tentative tags and KNN search
      - Can we do better than KNN? Possibly: train traditional classifiers (NN, SVM, RF) to classify each term, where the classes are the possible tag values and the feature vectors are n-gram embeddings (see the sketch below)
      - Prospective: can a model be trained to find the definite tags directly? Answer: probably, using Recurrent Neural Networks
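
      A sketch of that classifier alternative with scikit-learn: a random forest trained on stand-in n-gram embedding features, with possible tag values as classes. The synthetic data and the example tag set are assumptions for illustration.

```python
# Sketch of the "traditional classifier" alternative to KNN: a random forest
# whose inputs are n-gram embedding vectors and whose classes are tag values.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
dim, n = 50, 200
X = rng.normal(size=(n, dim))                             # stand-in n-gram embeddings
y = rng.choice(["power", "index", "derivative"], size=n)  # example superscript tag values

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

new_embedding = rng.normal(size=(1, dim))
print(clf.predict(new_embedding), clf.predict_proba(new_embedding))
```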

  21. Recurrent Neural Networks (RNNs)
      The input is a variable-length sequence:
      - Sentence, phrase, n-gram
      - Equation, math expression
      - Note: each entity in the sequence is represented by its embedding
      RNN characteristics:
      - Remembers something (a state) derived from the subsequence seen thus far
      - Incorporates that state in the processing of the next term in the sequence

  22. Schematic of RNNs
      - Input: X(1), X(2), ..., X(n); Output: O(1), O(2), ..., O(n); t: term index (time)
      - Each RNN cell has 2 inputs and 2 outputs
      - Hidden state: h(t) = σ(U.X(t) + W.h(t-1))
      - Output: O(t) = σ(V.h(t)), or O(t) = softmax(V.h(t))
      - U, V, and W are parameter matrices optimized through training
      - σ is the sigmoid (or a similar nonlinearity such as tanh)
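
      The cell update above, written out in a few lines of numpy. The matrices U, W, V are random stand-ins for trained parameters, and the sizes are arbitrary.

```python
# The RNN cell equations from this slide, in numpy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

d_in, d_hid, d_out = 4, 8, 3
rng = np.random.default_rng(0)
U = rng.normal(size=(d_hid, d_in))
W = rng.normal(size=(d_hid, d_hid))
V = rng.normal(size=(d_out, d_hid))

def rnn_step(x_t, h_prev):
    h_t = sigmoid(U @ x_t + W @ h_prev)       # h(t) = sigma(U.X(t) + W.h(t-1))
    o_t = softmax(V @ h_t)                    # O(t) = softmax(V.h(t))
    return h_t, o_t

# Run the cell over a short input sequence of 5 term embeddings
h = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):
    h, o = rnn_step(x, h)
print(o)
```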

  23. LSTM Cells
      - More elaborate than RNN cells: 3 inputs, 2 outputs
      - Has gates to control how much of the previous cell's outputs to use, and how much of its own output to pass to the next cell
      - Let Y(t) = [h(t-1), X(t)] (concatenation of the previous state and the current input)
      - Cell state: O(t) = O(t-1) * σ(Wf.Y) + σ(Wi.Y) * tanh(W.Y)
      - Output: h(t) = O(t) * σ(Wo.Y)
      - Wf, Wi, W, and Wo are parameter matrices optimized by training
      (Diagram: LSTM cell t takes X(t), h(t-1), and the state O(t-1) of cell t-1, and produces O(t) and h(t).)
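
      The same update in numpy, following this slide's notation (O(t) as the cell state, h(t) as the emitted output). Parameter matrices and sizes are again random stand-ins.

```python
# The LSTM update as written on this slide, in numpy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 4, 8
rng = np.random.default_rng(0)
Wf, Wi, W, Wo = (rng.normal(size=(d_hid, d_hid + d_in)) for _ in range(4))

def lstm_step(x_t, h_prev, o_prev):
    y = np.concatenate([h_prev, x_t])                        # Y(t) = [h(t-1), X(t)]
    o_t = o_prev * sigmoid(Wf @ y) + sigmoid(Wi @ y) * np.tanh(W @ y)
    h_t = o_t * sigmoid(Wo @ y)                              # h(t) = O(t) * sigma(Wo.Y)
    return h_t, o_t

h, o = np.zeros(d_hid), np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):                         # 5 input term embeddings
    h, o = lstm_step(x, h, o)
print(h.shape, o.shape)
```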

  24. LSTM without Attention
      - Each word is represented as an embedding vector
      - The final encoder state h (a vector) encodes the entire sentence
      - The second column of LSTM cells (the decoder) decodes h into a new sequence
      - Every decoding LSTM cell takes h and the previous decoder state as input to decode the next word

  25. LSTM with Attention
      - Each attention model computes a weighted average of the encoder states
      - By carefully adjusting the weights, the attention model decides which of the encoder states to focus on
      - The weights, at decoder time t, are controlled by the previous state of the decoder, h(t-1)
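
      A minimal numpy sketch of the weighted-average idea. The slide does not fix a scoring function, so dot-product scoring against the previous decoder state is used here as one common choice.

```python
# Minimal sketch of attention as a weighted average of encoder states.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 8))      # 6 input positions, state size 8
h_prev = rng.normal(size=8)                   # previous decoder state h(t-1)

scores = encoder_states @ h_prev              # one score per encoder state
weights = softmax(scores)                     # attention weights (sum to 1)
context = weights @ encoder_states            # weighted average of encoder states

print(np.round(weights, 3), context.shape)
```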

  26. Use of LSTM-wA in this Project
      - Input: math expressions and equations
      - Output: the tags / role / meaning of each input term, OR
      - Output: cMML, or a CAS program, or formal math
      - Training: needs large datasets of labeled documents
      - Datasets: the DLMF, with labels produced from current annotations; many more documents (arXiv), labeled by community efforts?
      - To the public: the labeled datasets + the trained models
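
      As a rough sketch of the tagging direction (token embeddings in, one tag per token out), here is a bidirectional LSTM tagger in PyTorch. Attention and the translation-to-cMML/CAS direction are omitted for brevity, and all sizes, vocabularies, and tag counts are illustrative.

```python
# Compact PyTorch sketch of a BiLSTM that emits one tag per input token.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)   # 2x for the two directions

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        x = self.embed(token_ids)                # (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                      # (batch, seq_len, 2*hidden_dim)
        return self.out(h)                       # (batch, seq_len, num_tags) logits

model = BiLSTMTagger(vocab_size=5000, num_tags=20)
tokens = torch.randint(0, 5000, (1, 7))          # one expression of 7 tokens
tag_logits = model(tokens)
print(tag_logits.shape)                          # torch.Size([1, 7, 20])
```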

  27. Summary (1)
      Math2Vec embeddings of terms, expressions, equations, and n-grams:
      - Are good algebraic/numerical representations
      - Good for search and for uncovering relations between math entities
      - Can be used for clustering
      - Can be used to train classifiers for various tasks, such as tag disambiguation and document classification

  28. Summary (2)
      RNNs, especially BiLSTMs with Attention:
      - Can directly tag each input term in a math expression
      - Can directly translate to cMML/CAS/formal math

  29. Summary (3)
      What this project does and will do:
      - Collect large datasets of math documents
      - Label some of those documents
      - Adapt and train embedders for math
      - Compute embeddings of math terms
      - Use embeddings for tag disambiguation
      - Adapt and train RNNs with attention to directly tag math terms
      - Evaluate and optimize the performance of those models
      - Make available to the public: labeled datasets, embeddings, trained models, software
