Conceptualization in Machine Learning

 
Really learning concepts
 
CS786
21st April 2022
 
Summary
 
We have discussed that there are two types of representations: propositional and non-propositional
Identified the central role of similarity in placing stimuli into categories
Seen how visual and symbolic stimuli can be placed into categories
Seen how this categorization can be accomplished in supervised and unsupervised settings
 
Conceptualization
 
Is not just classification
Is not just clustering
It lets humans do a whole lot more
 
(Lake, Salakhutdinov & Tenenbaum, 2015)
 
One-shot classification
 
Humans can do this with about 4% error; leading machine vision algorithms can now do this with about 2% error
(Lake, Salakhutdinov & Tenenbaum, 2019)
 
Generation
 
Parsing
 
Composition
 
Bayesian program learning
 
An algorithmic approach that enables all of these capabilities
In the restricted domain of handwritten visual symbols
Built around three key ideas:
Compositionality
Causality
Learning to learn
 
Approach
 
Sampled handwritten characters from various scripts
AMT workers were asked to reproduce the characters using a mouse
Mouse movements were tracked throughout drawing
Omniglot dataset: mouse trajectory data and final images
 
Identifying motor primitives in writing
 
Pen trajectories were normalized in time (50 ms sampling interval)
If the pen moved less than one pixel between two time points, mark it as a pause
Define sub-strokes as the segments extracted between pairs of pauses, as sketched below
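
A minimal sketch of this segmentation step, assuming `trajectory` is an (N, 2) array of pen positions already resampled at the 50 ms interval; the one-pixel threshold comes from the slide, and all names here are illustrative rather than the paper's code:

```python
import numpy as np

def split_into_substrokes(trajectory, pause_threshold=1.0):
    """Split one pen trajectory into sub-strokes at pauses.

    A step counts as a pause when the pen moves less than
    `pause_threshold` pixels between consecutive 50 ms samples.
    """
    deltas = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
    is_pause = deltas < pause_threshold        # one flag per sampling interval
    substrokes, start = [], 0
    for i, paused in enumerate(is_pause):
        if paused:
            if i + 1 - start > 1:              # keep non-trivial segments only
                substrokes.append(trajectory[start:i + 1])
            start = i + 1                      # next segment begins after the pause
    if len(trajectory) - start > 1:            # trailing segment, if any
        substrokes.append(trajectory[start:])
    return substrokes
```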
 
Identifying motor primitives in writing
 
All sub-stroke trajectories were normalized for point density and size
Sub-strokes that were too small were removed
Each remaining sub-stroke was fit with a spline represented by five control points (a 10-dimensional vector per sub-stroke)
A diagonal GMM with 1250 components was fit to these vectors
Each mixture component is a motor primitive, as sketched below
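
A compressed sketch of building the primitive library under the assumptions above, using SciPy's least-squares B-spline fit and scikit-learn's mixture model as stand-ins for the paper's exact fitting procedure; `all_substrokes` is the pooled output of the segmentation sketch, and each sub-stroke is assumed to have at least five samples:

```python
import numpy as np
from scipy.interpolate import make_lsq_spline
from sklearn.mixture import GaussianMixture

def substroke_to_vector(substroke):
    """Fit a cubic B-spline with 5 control points; return them as a 10-d vector."""
    pts = substroke - substroke.mean(axis=0)          # center
    pts = pts / (np.abs(pts).max() + 1e-8)            # normalize size
    u = np.linspace(0.0, 1.0, len(pts))               # spline parameter
    knots = np.r_[[0.0] * 4, 0.5, [1.0] * 4]          # knot vector -> 5 coefficients
    spline = make_lsq_spline(u, pts, knots, k=3)      # fits x(u) and y(u) jointly
    return spline.c.ravel()                           # 5 control points x 2 coords

# Pool the 10-d vectors over every sub-stroke in the dataset, then fit the
# primitive library: a diagonal-covariance GMM with the paper's 1250 components.
X = np.stack([substroke_to_vector(s) for s in all_substrokes])
gmm = GaussianMixture(n_components=1250, covariance_type="diag").fit(X)
primitive_ids = gmm.predict(X)  # each mixture component indexes one motor primitive
```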
 
Approach
 
Generate types of symbols using probabilistic programs; each type becomes a generative model for subsequent tokens
 
Type generation
 
Samples drawn from empirical distributions obtained from the Omniglot dataset
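
As a rough illustration, a type can be sampled as a discrete skeleton: draw the number of strokes, the number of sub-strokes per stroke, and a Markov chain of primitive ids, all from tables estimated on Omniglot. The table names (`p_kappa`, `p_nsub`, `p_start`, `trans`) are hypothetical, and the real model additionally samples stroke relations and type-level spline shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_type(p_kappa, p_nsub, p_start, trans):
    """Sample a character type as a list of strokes, each a primitive-id sequence."""
    kappa = 1 + rng.choice(len(p_kappa), p=p_kappa)      # number of strokes
    strokes = []
    for _ in range(kappa):
        n_sub = 1 + rng.choice(len(p_nsub), p=p_nsub)    # sub-strokes in this stroke
        ids = [rng.choice(len(p_start), p=p_start)]      # first primitive
        for _ in range(n_sub - 1):
            ids.append(rng.choice(trans.shape[1], p=trans[ids[-1]]))  # bigram step
        strokes.append(ids)
    return strokes
```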
 
Approach
 
Token generation
 
All stochastic elements are learned from the Omniglot dataset
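
Token generation can then be sketched as adding motor variability to a type's template: jitter each sub-stroke's control points and apply a small global affine transform. The noise scales below are placeholders for the distributions actually estimated from Omniglot:

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_token(template, jitter_sd=0.02):
    """Render a noisy token from a type template (a list of (5, 2) control-point arrays)."""
    noisy = [cp + rng.normal(0.0, jitter_sd, cp.shape) for cp in template]
    scale = rng.normal(1.0, 0.05, size=2)   # small per-axis stretch
    shift = rng.normal(0.0, 0.10, size=2)   # small global translation
    return [cp * scale + shift for cp in noisy]
```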
 
The generative model
 
Given a type ψ, M tokens θ^(1), …, θ^(M) of that type, and M images I^(1), …, I^(M) corresponding to these tokens, we learn a generative model that factors the joint distribution as

P(ψ, θ^(1), …, θ^(M), I^(1), …, I^(M)) = P(ψ) ∏_{m=1}^{M} P(I^(m) | θ^(m)) P(θ^(m) | ψ)

where P(ψ) is type generation, P(θ^(m) | ψ) is token generation, and P(I^(m) | θ^(m)) is image generation
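
Read as code, the factorization is just a sum of three kinds of log-density terms; the component scorers below are placeholders for the model's actual type, token, and image distributions:

```python
def joint_log_prob(psi, thetas, images, log_p_type, log_p_token, log_p_image):
    """log P(psi, theta_1..M, I_1..M) under the factorization above."""
    total = log_p_type(psi)                   # log P(psi)
    for theta, image in zip(thetas, images):
        total += log_p_token(theta, psi)      # log P(theta_m | psi)
        total += log_p_image(image, theta)    # log P(I_m | theta_m)
    return total
```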
 
Inference
 
Can infer the token (parse) underlying an image
Generate multiple candidate parses
Use the MAP estimate to select the best ones, as sketched below
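
A minimal sketch of this recipe, assuming a `propose_parses` helper (e.g., a bottom-up guided search over strokes) and a `score` function such as the joint log-probability above:

```python
def infer_token(image, propose_parses, score, n_candidates=100, top_k=5):
    """Propose candidate parses of an image and keep the highest-scoring ones."""
    candidates = propose_parses(image, n_candidates)           # candidate tokens
    ranked = sorted(candidates, key=lambda c: score(c, image), reverse=True)
    return ranked[:top_k]                                      # MAP first, then runners-up
```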
 
Other tasks
 
Generating new examples
Run the generateToken program trained on the target dataset
Parsing
Run the generateType and generateToken programs trained on the target dataset
Composition
Place a non-parametric prior on the types (see the sketch below)
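
Tying the sketches above together (these interfaces are the illustrative ones defined earlier, not the paper's actual generateType / generateToken programs):

```python
# Composition: sample a new concept type, then look up each primitive's mean
# control points in the GMM library to get a drawable template.
new_type = generate_type(p_kappa, p_nsub, p_start, trans)
template = [gmm.means_[i].reshape(5, 2) for ids in new_type for i in ids]

# Generation: new exemplars are fresh motor renderings of that template.
exemplars = [generate_token(template) for _ in range(9)]

# Parsing: invert the model on an observed image via candidate scoring.
parses = infer_token(image, propose_parses, score, top_k=5)
```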
 
Composition
 
Comparison
 
Take home message
 
Concepts are learned by acting upon the world
Computational methods are only just beginning to understand how to accomplish this
This is a very exciting future direction in AI research