Active Learning in Machine Learning

 
Active Learning (AL)
Presenter: Liu Ying
 
Machine Learning
Supervised Learning: labeled data
Unsupervised Learning: unlabeled data
Semi-supervised Learning: labeled and unlabeled data
 
What is AL?
Active learning (AL) is a subfield of machine learning in which a learning algorithm can interactively query a user to label data with the desired outputs.
In the statistics literature, it is sometimes also called query learning or optimal experimental design. The information source is also called the teacher or oracle.
 
Why AL?
Active learning attempts to overcome the labeling bottleneck by asking queries.
The goal is to achieve high accuracy using as few labeled instances as possible (minimizing the cost of obtaining labeled data).

How AL? Three Scenarios
1. Membership query synthesis
2. Stream-based selective sampling
3. Pool-based sampling
 
Membership Query Synthesis
The learner synthesizes a query instance itself, rather than picking an existing one, and sends it to the oracle to label.
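A minimal sketch of membership query synthesis, assuming a 1-D problem in which the true label is 1 at or above a hidden threshold and 0 below it. The names `learn_threshold` and `oracle` are hypothetical and chosen only for illustration. The point is that each query is constructed by the learner (here, an interval midpoint) rather than drawn from existing data.

```python
def learn_threshold(oracle, low=0.0, high=1.0, tol=1e-3):
    """Locate a decision threshold in [low, high] by synthesizing queries.

    Assumes (for illustration) that oracle(x) returns 0 below an unknown
    threshold and 1 at or above it.
    """
    queries = 0
    while high - low > tol:
        x = (low + high) / 2.0   # synthesized instance, not taken from a data set
        if oracle(x) == 1:
            high = x             # threshold is at or below x
        else:
            low = x              # threshold is above x
        queries += 1
    return (low + high) / 2.0, queries


# Hypothetical oracle with a hidden threshold of 0.37.
estimate, n_queries = learn_threshold(lambda x: 1 if x >= 0.37 else 0)
```

Because the interval halves with every answer, the number of labels grows only logarithmically with the required precision, which is the kind of saving active learning aims for.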
 
Stream-based Selective Sampling
For each unlabeled instance drawn from the stream, the learner decides whether or not to request its label.
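A rough sketch of the per-instance decision, assuming a binary model that exposes P(y = 1 | x) and a fixed uncertainty threshold. The function name, the threshold value, and the toy model are all assumptions. For brevity the model stays fixed here, whereas a real learner would usually be retrained as labels arrive.

```python
def stream_selective_sampling(stream, predict_proba, oracle, threshold=0.2):
    """Request a label only when the model is unsure about an instance.

    predict_proba(x) is assumed to return P(y = 1 | x) for a binary model.
    """
    labeled = []
    for x in stream:                              # instances arrive one at a time
        p = predict_proba(x)
        uncertainty = 1.0 - abs(2.0 * p - 1.0)    # 1 at p = 0.5, 0 at p = 0 or 1
        if uncertainty >= threshold:              # unsure: pay for a label
            labeled.append((x, oracle(x)))
        # otherwise the instance is discarded without querying
    return labeled


# Hypothetical fixed model and oracle on 1-D inputs in [0, 1].
stream = [0.05, 0.48, 0.95, 0.55, 0.02]
queried = stream_selective_sampling(
    stream,
    predict_proba=lambda x: x,                    # pretend P(y = 1 | x) = x
    oracle=lambda x: int(x >= 0.5),
)
```

Only the two instances near the assumed boundary at 0.5 are sent to the oracle; the confident ones are skipped.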
 
Pool-based Sampling
Massive amounts of unlabeled data are available in the real world.
Instances to query are selected from the pool using a real-valued function.
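One way to picture the real-valued selection function is as a score evaluated on every pool instance, with the highest-scoring instances sent to the oracle. The sketch below assumes that reading; `pool_based_query` and the distance-to-boundary score are hypothetical.

```python
def pool_based_query(pool, score, batch_size=1):
    """Return the batch_size pool instances with the highest score."""
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[:batch_size]


# Hypothetical score: closeness of a 1-D instance to a decision boundary at 0.5.
pool = [0.05, 0.48, 0.95, 0.55, 0.02]
picked = pool_based_query(pool, score=lambda x: -abs(x - 0.5), batch_size=2)
```

Unlike the stream-based setting, the whole pool is ranked before any query is made, so the learner can compare candidates against each other.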
 
How AL? [two figure slides; images not included]
 
The Core of AL: Query Strategy Framework
  • Uncertainty Sampling
  • Query-By-Committee (QBC)
  • Expected Model Change
  • Expected Error Reduction
  • Variance Reduction
  • Density-Weighted Methods
 
Uncertainty Sampling
1. Least Confident
2. Margin Sampling
3. Entropy
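The slide's formulas did not survive extraction, so the sketch below uses the standard textbook definitions of the three measures, applied to a vector of class posteriors. The function names and the example posteriors are assumptions.

```python
import math

def least_confident(probs):
    """1 - P(most likely class); larger means more uncertain."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two most likely classes; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats; larger means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical class posteriors for three unlabeled instances.
posteriors = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.50, 0.45, 0.05],
    "c": [0.40, 0.30, 0.30],
}
by_least_confident = max(posteriors, key=lambda k: least_confident(posteriors[k]))
by_margin = min(posteriors, key=lambda k: margin(posteriors[k]))
by_entropy = max(posteriors, key=lambda k: entropy(posteriors[k]))
```

On these posteriors, least confident and entropy both pick "c", while margin sampling picks "b", so the three measures can disagree once there are more than two classes.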
 
Query-By-Committee (QBC)
  • Vote Entropy
  • Average Kullback-Leibler (KL) Divergence
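Both disagreement measures can be sketched for a single instance as follows, assuming the usual definitions: vote entropy over the committee's hard votes, and the mean KL divergence of each member's posterior from the committee average. The function names and the three-member committee are hypothetical.

```python
import math

def vote_entropy(votes, labels):
    """Entropy of the committee's hard votes for one instance."""
    total = len(votes)
    score = 0.0
    for label in labels:
        share = votes.count(label) / total
        if share > 0.0:
            score -= share * math.log(share)
    return score

def average_kl(member_probs):
    """Mean KL divergence of each member's posterior from the consensus."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    consensus = [sum(p[i] for p in member_probs) / n_members
                 for i in range(n_classes)]
    total = 0.0
    for p in member_probs:
        total += sum(p[i] * math.log(p[i] / consensus[i])
                     for i in range(n_classes) if p[i] > 0.0)
    return total / n_members


# Hypothetical three-member committee on a binary problem.
agree = vote_entropy([1, 1, 1], labels=[0, 1])
split = vote_entropy([0, 1, 1], labels=[0, 1])
kl_agree = average_kl([[0.2, 0.8], [0.2, 0.8], [0.2, 0.8]])
kl_split = average_kl([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
```

Both scores are near zero when the committee agrees and grow as members disagree, so the instance with the largest score is the one to query.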
 
Density-Weighted Methods
Points near B carry more information than points near A.
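The A versus B comparison can be illustrated with an information-density style score, in which an instance's uncertainty is multiplied by its average similarity to the rest of the pool. This is a sketch under assumptions: the slide's figure is not available, so the 1-D pool, the Gaussian similarity, and the `beta` exponent are all made up for illustration.

```python
import math

def information_density(pool, uncertainty, similarity, beta=1.0):
    """Weight each instance's uncertainty by its average similarity to the pool."""
    scores = []
    for x in pool:
        density = sum(similarity(x, other) for other in pool) / len(pool)
        scores.append(uncertainty(x) * density ** beta)
    return scores


# Hypothetical 1-D pool. The isolated point 0.50 plays the role of A: it lies
# on an assumed decision boundary but has no neighbours. The cluster around
# 0.60 plays the role of B.
pool = [0.50, 0.58, 0.60, 0.62, 0.64]
scores = information_density(
    pool,
    uncertainty=lambda x: 1.0 - abs(x - 0.5),                   # peaks at the boundary
    similarity=lambda a, b: math.exp(-((a - b) ** 2) / 0.001),  # Gaussian kernel
)
best = pool[scores.index(max(scores))]
```

Plain uncertainty sampling would query the outlier at 0.50, whereas the density-weighted score prefers 0.60, which is slightly less uncertain but representative of many more instances.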
 
Experiment
Model: logistic regression
Data: 400 instances, sampled from two class Gaussians
30 actively queried instances: accuracy 90%
30 randomly labeled instances: accuracy 70%
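A comparison of this kind could be set up roughly as follows. This is a sketch only: the Gaussian means and spread, the seeding with one labeled instance per class, the gradient-descent settings, and the use of uncertainty sampling as the query strategy are all assumptions, so the accuracies it produces should not be expected to match the 90% and 70% reported on the slide.

```python
import math
import random

def sigmoid(z):
    # Clamp to keep math.exp from overflowing on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(points, labels, steps=300, lr=0.5):
    """Plain batch gradient descent on the logistic loss for 2-D inputs."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def prob(model, point):
    w1, w2, b = model
    return sigmoid(w1 * point[0] + w2 * point[1] + b)

def accuracy(model, points, labels):
    hits = sum((prob(model, p) >= 0.5) == (y == 1) for p, y in zip(points, labels))
    return hits / len(points)

def run(active, rng, budget=30):
    # 400 instances from two class Gaussians (means and spread are assumptions).
    points = [(rng.gauss(-1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    points += [(rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    labels = [0] * 200 + [1] * 200
    # Seed with one labeled instance per class, then spend the rest of the budget.
    chosen = [0, 200]
    while len(chosen) < budget:
        remaining = [i for i in range(400) if i not in chosen]
        if active:
            model = fit_logistic([points[i] for i in chosen],
                                 [labels[i] for i in chosen])
            # Uncertainty sampling: query the point whose P(y = 1) is closest to 0.5.
            pick = min(remaining, key=lambda i: abs(prob(model, points[i]) - 0.5))
        else:
            pick = rng.choice(remaining)
        chosen.append(pick)
    model = fit_logistic([points[i] for i in chosen], [labels[i] for i in chosen])
    return accuracy(model, points, labels), len(chosen)

# Same seed for both runs, so both see the same 400 instances.
active_acc, active_n = run(active=True, rng=random.Random(0))
random_acc, random_n = run(active=False, rng=random.Random(0))
```

Comparing `active_acc` with `random_acc` over several seeds gives a fairer picture than a single run, since 30 labels leave a lot of room for sampling noise.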
 
Differences with Semi-Supervised Learning
1. Definition
     Active learning actively chooses which data to label.
2. Processing
     Active learning: introduces extra expert knowledge; sends the instances the model is most likely to misjudge to the oracle.
     Semi-supervised learning: uses the instances the model is least likely to misjudge.
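The contrast in the processing step can be sketched on a single pool, assuming a binary model that exposes P(y = 1 | x) and taking self-training as the semi-supervised method (an assumption; the slide does not name one). The function `split_pool` is hypothetical.

```python
def split_pool(pool, predict_proba, k=1):
    """Rank a pool by confidence and return (to_oracle, to_pseudo_label)."""
    def confidence(x):
        p = predict_proba(x)            # assumed P(y = 1 | x)
        return max(p, 1.0 - p)

    ranked = sorted(pool, key=confidence)
    # Active learning: the least confident instances go to the human oracle.
    to_oracle = ranked[:k]
    # Self-training: the most confident instances are labeled by the model itself.
    to_pseudo_label = [(x, int(predict_proba(x) >= 0.5)) for x in ranked[-k:]]
    return to_oracle, to_pseudo_label


to_oracle, pseudo = split_pool([0.03, 0.48, 0.95, 0.70], predict_proba=lambda x: x)
```

The same confidence ranking is read from opposite ends: active learning spends expert effort where the model is weakest, while self-training adds labels where the model is already sure.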
 
Applications
  • Recommender Systems
  • NLP
  • Text classification
  • Image classification

Active Learning (AL) is a subset of machine learning where a learning algorithm interacts with a user to label data for desired outputs. It aims to minimize the labeling bottleneck by achieving high accuracy with minimal labeled instances, thus reducing the cost of obtaining labeled data. Techniques like membership query synthesis, stream-based selective sampling, and pool-based sampling play key roles in AL, along with query strategies like Uncertainty Sampling and Query-By-Committee (QBC).

  • Active Learning
  • Machine Learning
  • Query Strategies
  • Uncertainty Sampling
  • Data Labeling

Uploaded on Jul 16, 2024 | 0 Views



Presentation Transcript


  1. Active Learning (AL)

  2. Machine Learning: Supervised Learning (labeled data); Unsupervised Learning (unlabeled data); Semi-supervised Learning (labeled and unlabeled data)

  3. What is AL? Active learning (AL) is a subfield of machine learning in which a learning algorithm can interactively query a user to label data with the desired outputs. In the statistics literature, it is sometimes also called query learning or optimal experimental design. The information source is also called the teacher or oracle.

  4. Why AL? Active learning attempts to overcome the labeling bottleneck by asking queries, achieving high accuracy with as few labeled instances as possible (minimizing the cost of obtaining labeled data).

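  5. How AL? Three Scenarios: (1) membership query synthesis, (2) stream-based selective sampling, (3) pool-based sampling

  6. Membership Query Synthesis: the learner synthesizes a query instance itself, rather than picking an existing one, and sends it to the oracle to label.

  7. Stream-based Selective Sampling: for each unlabeled instance drawn from the stream, the learner decides whether or not to request its label.

  8. Pool-based Sampling: massive amounts of unlabeled data are available in the real world.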

  9. How AL?

  10. How AL?

  11. The Core of AL: Query Strategy Framework: Uncertainty Sampling; Query-By-Committee (QBC); Expected Model Change; Expected Error Reduction; Variance Reduction; Density-Weighted Methods

  12. Uncertainty Sampling: (1) Least Confident, (2) Margin Sampling, (3) Entropy

  13. Query-By-Committee (QBC): Vote Entropy; Average Kullback-Leibler (KL) Divergence

  14. Density-Weighted Methods: points near B carry more information than points near A.

  15. Experiment: logistic regression on 400 instances sampled from two class Gaussians. 30 actively queried instances: accuracy 90%. 30 randomly labeled instances: accuracy 70%.

  16. Differences with Semi-Supervised Learning: (1) Definition: active learning actively chooses which data to label. (2) Processing: active learning introduces extra expert knowledge and sends the instances the model is most likely to misjudge to the oracle; semi-supervised learning uses the instances the model is least likely to misjudge.

  17. Applications: Recommender Systems; NLP; Text classification; Image classification
