Semi-Supervised Learning: Combining Labeled and Unlabeled Data

 
CS 678 - Ensembles and Bayes
 
 
Semi-Supervised Learning
 
Can we improve the quality of our learning by combining labeled and unlabeled data?
Usually far more unlabeled data is available than labeled data
Assume a set L of labeled data and a set U of unlabeled data, drawn from the same distribution
The focus here is on semi-supervised classification, though there are many other variations:
Aiding clustering with some labeled data
Regression
Model selection with unlabeled data (COD)
Transduction vs. induction
 
How Semi-Supervised Works
 
Most approaches make strong model assumptions (guesses); if these are wrong, the unlabeled data can make things worse.
Some commonly used assumptions:
Points in the same cluster are from the same class
The data can be represented as a mixture of parameterized distributions
Decision boundaries should go through non-dense areas of the data
The model should be as simple as possible (Occam's razor)
 
 
Unsupervised Learning of Domain Features
 
PCA, SVD
NLDR – Non-Linear Dimensionality Reduction
Many Deep Learning Models
Deep Belief Nets
Sparse Auto-encoders
Self-Taught Learning
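
A minimal sketch of this pattern, assuming placeholder arrays X_unlabeled, X_labeled, and y_labeled, and using plain PCA as the unsupervised feature learner (any of the methods listed above could take its place):

```python
# Learn a feature space from the (plentiful) unlabeled data, then train a
# supervised model on the (scarce) labeled data mapped into that space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 20))   # plentiful unlabeled data (placeholder)
X_labeled = rng.normal(size=(50, 20))       # scarce labeled data (placeholder)
y_labeled = rng.integers(0, 2, size=50)

# Unsupervised step: fit the feature transform on the unlabeled pool
pca = PCA(n_components=5).fit(X_unlabeled)

# Supervised step: train on the labeled data in the new feature space
clf = LogisticRegression().fit(pca.transform(X_labeled), y_labeled)
```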
 
 
Deep Net with Greedy Layer-Wise Training
 
 
[Figure: Original Inputs → Unsupervised Learning → New Feature Space → Supervised Learning → ML Model]
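
As a rough illustration of the flow in the figure (not the specific deep belief net or sparse auto-encoder training the slides have in mind), here is a sketch in which each layer is a small autoencoder fit greedily on unlabeled data and a supervised model is then trained on the resulting feature space; all array names and sizes are placeholders:

```python
# Greedy layer-wise pretraining sketch: each layer is trained as an autoencoder
# on unlabeled data, its hidden code becomes the next layer's input, and the
# final code feeds a supervised classifier trained on the labeled data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(2000, 30))
X_labeled = rng.normal(size=(100, 30))
y_labeled = rng.integers(0, 2, size=100)

def pretrain_layer(X, n_hidden):
    """Fit one autoencoder layer (reconstruct X from itself) and return a
    function mapping inputs to that layer's hidden code."""
    ae = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="relu",
                      max_iter=300).fit(X, X)
    W, b = ae.coefs_[0], ae.intercepts_[0]
    return lambda Z: np.maximum(Z @ W + b, 0.0)   # relu hidden activations

# Unsupervised, layer by layer: layer 2 trains on layer 1's codes
layer1 = pretrain_layer(X_unlabeled, 16)
layer2 = pretrain_layer(layer1(X_unlabeled), 8)

# Supervised step on the learned feature space
clf = LogisticRegression().fit(layer2(layer1(X_labeled)), y_labeled)
```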
 
Self-Training (Bootstrap)
 
Self-Training
Train a supervised model on the labeled data L
Test it on the unlabeled data U
Add the most confidently classified members of U to L
Repeat (a minimal sketch follows this list)
Multi-Model
Uses multiple models to label and move instances of U to L
Co-Training
Train two models on different, independent feature sets
Add the most confident instances from U of one model into the L of the other (i.e. they "teach" each other)
Repeat
Multi-View Learning
Train multiple diverse models on L; those instances in U which most models agree on are placed in L
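
A minimal self-training sketch, assuming a probabilistic base classifier and placeholder arrays X_l, y_l (labeled) and X_u (unlabeled); the 0.95 confidence threshold and round limit are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    X_l, y_l, X_u = X_l.copy(), y_l.copy(), X_u.copy()
    clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)   # train on L
    for _ in range(max_rounds):
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)            # test on U
        keep = proba.max(axis=1) >= threshold     # most confidently classified members of U
        if not keep.any():
            break
        pseudo = clf.classes_[proba.argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[keep]])         # move them, with pseudo-labels, into L
        y_l = np.concatenate([y_l, pseudo[keep]])
        X_u = X_u[~keep]
        clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)   # retrain and repeat
    return clf
```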
 
 
Generative Models
 
Generative approaches assume the data can be represented by a mixture of parameterized distributions (e.g. Gaussians) and use EM to learn the parameters (à la Baum-Welch), with the labeled data anchoring which component belongs to which class
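
A minimal sketch of the generative/EM idea for two classes and a single feature, assuming one Gaussian component per class: labeled points keep hard responsibilities while unlabeled points get soft ones. The names and the one-dimensional setup are illustrative; a real implementation would handle multivariate data and multiple components per class:

```python
import numpy as np
from scipy.stats import norm

def ss_gmm(x_l, y_l, x_u, n_iter=50):
    # Initialize each class's Gaussian from the labeled data alone
    mu = np.array([x_l[y_l == k].mean() for k in (0, 1)])
    sigma = np.array([x_l[y_l == k].std() + 1e-3 for k in (0, 1)])
    prior = np.array([np.mean(y_l == k) for k in (0, 1)])
    x_all = np.concatenate([x_l, x_u])
    for _ in range(n_iter):
        # E-step: soft responsibilities for the unlabeled points only
        like = np.stack([prior[k] * norm.pdf(x_u, mu[k], sigma[k]) for k in (0, 1)], axis=1)
        r_u = like / like.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from hard (L) plus soft (U) assignments
        for k in (0, 1):
            w = np.concatenate([(y_l == k).astype(float), r_u[:, k]])
            mu[k] = np.average(x_all, weights=w)
            sigma[k] = np.sqrt(np.average((x_all - mu[k]) ** 2, weights=w)) + 1e-6
            prior[k] = w.sum() / len(x_all)
    return mu, sigma, prior
```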
 
 
Graph Models
 
Neighboring nodes are assumed to be similar, with larger edge weights meaning greater similarity
Force members of the same class in L to be close, while maintaining smoothness with respect to the graph for U
Add members of U into the graph as neighbors based on some similarity measure
Iteratively label U (breadth first)
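
A minimal label-propagation sketch along these lines, assuming a dense RBF similarity graph over all points and an integer label array y that is valid on the rows marked by labeled_mask; the kernel width gamma and iteration count are illustrative:

```python
import numpy as np

def label_propagation(X, y, labeled_mask, gamma=1.0, n_iter=100):
    # Edge weights: larger weight = more similar neighbors
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * d2)
    P = W / W.sum(axis=1, keepdims=True)          # row-normalized transition matrix

    n_classes = int(y[labeled_mask].max()) + 1
    F = np.zeros((len(X), n_classes))
    F[labeled_mask, y[labeled_mask]] = 1.0        # clamp the labeled nodes

    for _ in range(n_iter):
        F = P @ F                                 # smooth label mass over the graph
        F[labeled_mask] = 0.0
        F[labeled_mask, y[labeled_mask]] = 1.0    # re-clamp L after every pass
    return F.argmax(axis=1)                       # inferred labels for all nodes
```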
 
 
TSVM
 
Transductive SVM (TSVM), also called Semi-Supervised SVM (S3VM)
Maximize the margin over both L and U, so that the decision surface is placed in non-dense regions of the data
Assumes the classes are "well-separated"
Can also try to simultaneously keep the class proportions on each side of the boundary similar to the proportions in the labeled data
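
The sketch below is not a real TSVM solver, only a crude illustration of the flavor: alternate between guessing labels for U with the current boundary and retraining with a small weight on those guesses, which tends to nudge the decision surface toward low-density regions. C_u and the round count are illustrative knobs:

```python
import numpy as np
from sklearn.svm import LinearSVC

def s3vm_sketch(X_l, y_l, X_u, C_u=0.1, n_rounds=10):
    clf = LinearSVC(C=1.0, max_iter=5000).fit(X_l, y_l)
    for _ in range(n_rounds):
        y_u = clf.predict(X_u)                        # current guesses for U
        X = np.vstack([X_l, X_u])
        y = np.concatenate([y_l, y_u])
        w = np.concatenate([np.ones(len(X_l)),        # full weight on L
                            np.full(len(X_u), C_u)])  # small weight on the guesses
        clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y, sample_weight=w)
    return clf
```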
 
 
Summary
 
Oracle Learning
Becoming a more critical area as more unlabeled data becomes cheaply available
 
 
Active Learning
 
Obtaining labeled data can be the most expensive part of a machine learning task, whether the learning is supervised, unsupervised, or semi-supervised
In active learning we can query an oracle (e.g. a human expert, a test, etc.) to obtain the label for a specific input
The goal is to learn the most accurate model while querying labels for the least amount of data
 
 
Active Learning
 
 
Often query:
1) A low-confidence instance (i.e. one near a decision boundary)
2) An instance in a relatively dense neighborhood
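
A sketch combining the two heuristics above, assuming placeholder arrays X_l, y_l, X_u: each unlabeled point is scored by (low confidence) times (a simple RBF-kernel density stand-in), and the top-scoring index is the one to send to the oracle:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

def choose_query(X_l, y_l, X_u):
    clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    proba = clf.predict_proba(X_u)
    uncertainty = 1.0 - proba.max(axis=1)        # near the decision boundary
    density = rbf_kernel(X_u, X_u).mean(axis=1)  # in a relatively dense neighborhood
    return int(np.argmax(uncertainty * density)) # index into X_u to label next
```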
 
 
Active Clustering
Images (Objects, Words, etc.)
 
First do unsupervised clustering
Which points should we show an expert in order to get feedback on the clustering and allow adjustment?
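
One simple answer, sketched under the assumption that k-means is the clusterer: show the expert the points whose two nearest cluster centers are nearly equidistant, since feedback on those boundary points changes the clustering the most. n_clusters and n_queries are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def points_to_show(X, n_clusters=5, n_queries=10):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    d = np.sort(km.transform(X), axis=1)      # each point's distances to all centers, ascending
    ambiguity = d[:, 0] / (d[:, 1] + 1e-12)   # close to 1.0 = sits on a cluster boundary
    return np.argsort(-ambiguity)[:n_queries] # indices of points to show the expert
```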
 
 
Slide Note

In transduction, you are given a set of points, some labeled and some not, and the goal is to infer the outputs for those specific unlabeled points, without worrying about any later generalization. This differs from induction, which tries to generalize to heretofore unseen arbitrary points based on the labeled points. In the transductive setting you know where all of the unlabeled points are, and you can use at least that knowledge in making the decision. It is a variation of semi-supervised learning.
