ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators

 
ELECTRA: PRE-TRAINING TEXT ENCODERS AS DISCRIMINATORS RATHER THAN GENERATORS
from: ICLR 2020
 
Paper Reading
 
Presenter: 韩佳乘
1801210827
 
Outline
 
Motivation
Introduction
Method
Experiments
Conclusion
 
Motivation
 
More compute-efficient pre-training

Better performance on downstream tasks
 
Introduction
 
What is ELECTRA?
  
Efficiently Learning an Encoder that Classifies Token Replacements Accurately.
 
Main Idea
Replaced token detection, inspired by GAN-style training
Replace some input tokens with samples from a generator instead of masking them
Pre-train a discriminator to predict, for every token, whether it is an original or a replacement
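As a minimal sketch (assuming a toy Python setup; make_rtd_example, dummy_generator, and the tiny vocabulary are hypothetical names, not from the paper), this is roughly how one replaced-token-detection training example could be constructed:

```python
import random

def make_rtd_example(tokens, generator_sample, mask_frac=0.15, mask_token="[MASK]"):
    """Build one replaced-token-detection example (hypothetical helper).

    tokens:           input sequence x = [x_1, ..., x_n]
    generator_sample: callable(masked_tokens, position) -> token sampled from the
                      generator's distribution p_G(. | x_masked); here a stand-in
                      for a small masked language model
    Returns (corrupt_tokens, labels), where labels[t] = 1 if position t ended up
    with a token different from the original, else 0.
    """
    n = len(tokens)
    k = max(1, round(mask_frac * n))
    masked_positions = random.sample(range(n), k)

    # x_masked = REPLACE(x, m, [MASK])
    masked_tokens = list(tokens)
    for i in masked_positions:
        masked_tokens[i] = mask_token

    # x_corrupt = REPLACE(x, m, x_hat), with x_hat_i sampled from the generator
    corrupt_tokens = list(tokens)
    for i in masked_positions:
        corrupt_tokens[i] = generator_sample(masked_tokens, i)

    # Discriminator targets: a masked position still counts as "original" (0)
    # if the generator happened to sample the correct token.
    labels = [int(corrupt_tokens[t] != tokens[t]) for t in range(n)]
    return corrupt_tokens, labels


# Toy usage: a dummy "generator" that samples uniformly from a tiny vocabulary.
vocab = ["the", "chef", "cooked", "ate", "meal"]

def dummy_generator(masked_tokens, position):
    return random.choice(vocab)

corrupt, labels = make_rtd_example(["the", "chef", "cooked", "the", "meal"], dummy_generator)
print(corrupt)
print(labels)
```

In ELECTRA itself, the generator is a small masked language model trained jointly with the discriminator rather than a random sampler.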
 
 
 
Method

Here, the masking of x and the replacement of the masked tokens proceed as follows:
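Following the notation of the ELECTRA paper, with $k$ masked positions out of the $n$ input tokens $x = [x_1, \dots, x_n]$:

$m_i \sim \mathrm{unif}\{1, n\}$ for $i = 1, \dots, k$
$x^{\mathrm{masked}} = \mathrm{REPLACE}(x, m, \texttt{[MASK]})$
$\hat{x}_i \sim p_G(x_i \mid x^{\mathrm{masked}})$ for $i \in m$
$x^{\mathrm{corrupt}} = \mathrm{REPLACE}(x, m, \hat{x})$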
 
The discriminator then predicts, for each position, whether the token was replaced by the generator or left unchanged.
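For reference, the per-position outputs of the two models as defined in the paper, where $h_G(x)_t$ and $h_D(x)_t$ are the generator's and discriminator's contextual representations at position $t$ and $e(\cdot)$ are the token embeddings:

$p_G(x_t \mid x) = \dfrac{\exp\big(e(x_t)^{\top} h_G(x)_t\big)}{\sum_{x'} \exp\big(e(x')^{\top} h_G(x)_t\big)}$

$D(x, t) = \mathrm{sigmoid}\big(w^{\top} h_D(x)_t\big)$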
 
Method
 
Generator loss function:
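As in the paper, this is the standard masked-language-modeling loss over the masked positions:

$\mathcal{L}_{\mathrm{MLM}}(x, \theta_G) = \mathbb{E}\Big[\sum_{i \in m} -\log p_G\big(x_i \mid x^{\mathrm{masked}}\big)\Big]$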
 
Discriminator loss function:
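A per-token binary cross-entropy over all $n$ positions of the corrupted input, as in the paper:

$\mathcal{L}_{\mathrm{Disc}}(x, \theta_D) = \mathbb{E}\Big[\sum_{t=1}^{n} -\mathbb{1}\big(x_t^{\mathrm{corrupt}} = x_t\big)\log D\big(x^{\mathrm{corrupt}}, t\big) - \mathbb{1}\big(x_t^{\mathrm{corrupt}} \neq x_t\big)\log\big(1 - D\big(x^{\mathrm{corrupt}}, t\big)\big)\Big]$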
 
Joint loss function to optimize:
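The two losses are summed over the pre-training corpus $\mathcal{X}$, with the discriminator loss weighted by $\lambda$ (the paper uses $\lambda = 50$); the discriminator loss is not back-propagated through the generator because sampling is discrete:

$\min_{\theta_G, \theta_D} \sum_{x \in \mathcal{X}} \mathcal{L}_{\mathrm{MLM}}(x, \theta_G) + \lambda\, \mathcal{L}_{\mathrm{Disc}}(x, \theta_D)$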
 
Experiments: small models
 
Experiments: large models
 
Experiments: How to choose the generator size?
 
Conclusion
 
The key idea is training a text encoder to distinguish input tokens from high-quality negative samples produced by a small generator network.

Compared to masked language modeling, ELECTRA is more compute-efficient and results in better performance on downstream tasks.
 
 