Knowledge Editing for Large Language Models

Jun-Yu Ma
National Engineering Research Center of Speech and Language Information Processing,
University of Science and Technology of China, Hefei, China
3/16/2024

Presenter

Jun-Yu Ma (Ph.D. @ USTC)
2017.09 - 2021.06, Bachelor, USTC
2021.09 - Present, Ph.D. student, USTC
Main interests:
1. Information Extraction
2. Multilinguality
3. Model Editing

Hallucinations in Large Language Models

LLMs inevitably exhibit hallucinations:
- deviation from user input
- deviation from generated context
- erroneous factual knowledge
Hallucination significantly undermines the reliability of LLMs.

Ways to Mitigate Hallucinations

Supervised Fine-tuning:
- simple and effective
- difficult to obtain enough high-quality data
- easy to overfit and affect other knowledge
- needs more computing resources

Retrieval-augmentation:
- individualization on private data
- low training and inference costs
- introduces retrieval noise
- short-term change

Model Editing (an active research area!):
- precise control
- low computing costs
- difficult to scale
- may affect other abilities

(Figure: an LLM passed through a "Fixer" to produce the edited LLM.)

Model Editing: Definition and Evaluation

Efficiently change model behavior without affecting other inputs.
Given an edit sample x_e:
- Generalization: recall the fact under the in-scope paraphrase prompts I(x_e).
  E.g., x_in: Who currently holds the office of President of the United States?
- Locality: remain unchanged for prompts out of the editing scope O(x_e).
  E.g., x_out: Who is the president of France?

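To make these criteria concrete, below is a minimal sketch of scoring a single edit, assuming a Hugging Face-style causal LM; the greedy substring check and the prompt sets are illustrative choices, not any benchmark's exact protocol.

```python
# Minimal sketch of scoring one edit, assuming a Hugging Face-style causal LM.
# The greedy substring check is an illustrative choice, not a benchmark's exact protocol.
import torch

def greedy_answer(model, tokenizer, prompt, max_new_tokens=8):
    """Greedily decode a short continuation of `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def score_edit(model, tokenizer, edit_prompt, in_scope, out_of_scope, target, out_refs):
    """Efficacy: the edit prompt itself recalls the target.
    Generalization: in-scope paraphrases I(x_e) recall the target.
    Locality: out-of-scope prompts O(x_e) keep their original answers."""
    hit = lambda prompt, ans: ans in greedy_answer(model, tokenizer, prompt)
    return {
        "efficacy": float(hit(edit_prompt, target)),
        "generalization": sum(hit(p, target) for p in in_scope) / len(in_scope),
        "locality": sum(hit(p, a) for p, a in zip(out_of_scope, out_refs)) / len(out_of_scope),
    }
```

Here in_scope would hold paraphrases like the x_in example above, and out_of_scope prompts like the France question, each paired with its pre-edit answer.
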
Current State of Model Editing

- Preserve parameters by integrating an auxiliary network
- Modify the parameters directly responsible for undesirable output

Yunzhi Yao, et al. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP 2023.

Where is Knowledge Stored in LLMs?

Mor Geva, et al. Transformer Feed-Forward Layers Are Key-Value Memories. EMNLP 2021.
Damai Dai, et al. Knowledge Neurons in Pretrained Transformers. ACL 2022.
Kevin Meng, et al. Locating and Editing Factual Associations in GPT. NeurIPS 2022.

Untying the Reversal Curse via Bidirectional Language Model Editing

The Reversal Curse

What influences the likelihood of B?
Examples that match the order ("A precedes B") are far more influential than examples with the reverse order ("B precedes A").

Lukas Berglund, et al. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". arXiv 2023.

Bidirectional LM Editing

Previous unidirectional benchmarks and approaches have failed to explore the reversal curse:
- single-hop factual questions
- multi-hop factual questions
Can edited models recall the editing facts in the reverse direction?

Reversibility

Evaluation takes the forms of question answering and judgement:
- One definite output after inverting one-to-one and one-to-many relations (e.g., iPhone is developed by?): QA & Judge
- Alternative outputs after inverting many-to-one and many-to-many relations (e.g., Shakespeare has written?): QA & Judge

BAKE: Bidirectional Assessment for Knowledge Editing

- Massive factual triples (subject, relation, object) based on Wikidata
- Two sub-datasets: BAKE-Q&J and BAKE-J
  - BAKE-Q&J (25 one-to-one and one-to-many relations)
  - BAKE-J (20 many-to-one and many-to-many relations)
- Handwritten templates for bidirectional relations

BAKE: Example and Evaluation

Metrics:
- Efficacy
- Generalization
- Locality
- Reversibility-QA: Reverse-QA Score (RQS)
- Reversibility-Judgement: Reverse-Judgment Score (RJS)
- Reverse-QA Magnitude (RQM) and Reverse-Judgment Magnitude (RJM): measure the probability difference

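As a rough illustration of the reversibility-QA check, the sketch below probes the reverse direction of an edited triple; the inverse-relation template and the substring scoring are assumptions made for illustration, not BAKE's exact definitions. It reuses greedy_answer from the earlier sketch.

```python
# Illustrative reversibility-QA check for an edit (subject, relation, object_new).
# The inverse template and substring scoring are assumptions, not BAKE's exact metric.
def reverse_qa_hit(model, tokenizer, inverse_template, object_new, subject):
    # e.g., inverse_template = "{} is the capital of" for the forward relation "has capital"
    prompt = inverse_template.format(object_new)
    return subject in greedy_answer(model, tokenizer, prompt)  # helper from the earlier sketch

# RQS over a dataset is then the mean of reverse_qa_hit across edits; RQM/RJM instead
# compare the probabilities the edited model assigns to the new vs. original answer
# under the reverse prompt.
```
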
Evaluation of Current Editing Methods

- Current methods perform well in the editing direction
- They suffer serious deficiencies in the reverse direction
- Gradient-based methods (FT, MEND) perform worse than editing-based methods (KN, MEMIT, ROME) in the reverse direction

BIRD: Bidirectionally Inversible Relationship moDeling

- Static Word2Vec property: a relation acts as a consistent vector offset between entity pairs (e.g., France -> Paris and England -> London for "has capital")
- Conceptualize this property and extend it to factual triples in a dynamic language model

BIRD: Bidirectionally Inversible Relationship moDeling

Enhance the association of the NEW fact bidirectionally:
- R(subject) + R(forward relation) is driven close to R(object_new)
- R(object_new) + R(backward relation) is driven close to R(subject)
(Example: after editing "England has the capital London" to "England has the capital Paris", the reverse "Paris is capital of England" should also hold.)

BIRD: Bidirectionally Inversible Relationship moDeling

Weaken the association of the ORIGINAL fact bidirectionally:
- R(subject) + R(forward relation) is driven away from R(object_orig)
- R(object_orig) + R(backward relation) is driven away from R(subject)
Finally, the new knowledge replaces the original knowledge in both the forward and the reverse direction.

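The two slides above can be summarized as a single objective. The following is a simplified embedding-space rendering, under the assumptions that R(.) returns vector representations and that the weaken term is hinged at a margin; it is a sketch of the idea, not BIRD's actual loss. The weights alpha and beta correspond to the objective weights discussed later.

```python
# Simplified embedding-space rendering of BIRD's enhance/weaken idea. The R_* tensors
# are assumed vector representations; the hinged weaken term and MSE distances are
# illustrative choices, not the paper's actual objective.
import torch
import torch.nn.functional as F

def bird_objective(R_subj, R_fwd_rel, R_bwd_rel, R_obj_new, R_obj_orig,
                   alpha=1.0, beta=1.0, margin=1.0):
    # Enhance the NEW fact bidirectionally: pull both compositions toward their targets.
    enhance = (F.mse_loss(R_subj + R_fwd_rel, R_obj_new)
               + F.mse_loss(R_obj_new + R_bwd_rel, R_subj))
    # Weaken the ORIGINAL fact bidirectionally: push both compositions at least
    # `margin` away from the stale object/subject pairing.
    dist_orig = (F.mse_loss(R_subj + R_fwd_rel, R_obj_orig)
                 + F.mse_loss(R_obj_orig + R_bwd_rel, R_subj))
    weaken = torch.clamp(margin - dist_orig, min=0.0)
    return alpha * enhance + beta * weaken
```
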
Results of BIRD

BIRD significantly improves the editing performance of four LLMs in the reverse direction.

Comparison with Other Metrics

- Portability: whether the editing knowledge can be transferred to related content
- Multi-hop: whether the model can answer multi-hop questions entailed by the editing facts
Reversibility is indeed more challenging!

Yunzhi Yao, et al. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP 2023.
Zexuan Zhong, et al. MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions. EMNLP 2023.

Log Probability of Desired Outputs

(The closer the log probability is to 0, the greater the probability.)
Current editing methods:
- fail to increase the probability of the desired answer after editing
- fail to decrease the probability of the original answer
- fail to decrease the margin between the original answer and the desired answer

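For reference, the quantity plotted here, the log probability of a multi-token answer given a prompt, can be measured as follows; a minimal sketch assuming a Hugging Face-style causal LM.

```python
# Minimal sketch: log probability of a multi-token answer under a causal LM,
# assuming Hugging Face-style model/tokenizer objects. Boundary tokenization
# effects at the prompt/answer join are ignored for brevity.
import torch

def answer_logprob(model, tokenizer, prompt, answer):
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logps = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The token at position `pos` is predicted by the logits at position `pos - 1`.
    for pos in range(prompt_len, full_ids.shape[1]):
        total += logps[0, pos - 1, full_ids[0, pos]].item()
    return total  # closer to 0 means higher probability
```
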
Effect of Objective Weights

- α: incorporating the bidirectionally inversible relationships is effective, but α should not be too large, or it impairs the memorization of new facts
- β: weakening the association of the original fact is necessary

Neighboring Perturbations of Knowledge Editing on Large Language Models

Neighboring Perturbations of Knowledge Editing

Motivation:
1. Prior works mainly focus on what should change after editing (whether the target knowledge has been memorized).

Question: does the editing operation of appending a new answer to the answer list of a question perturb the neighboring knowledge encapsulated within the model?
- catastrophic forgetting of original correct answers
- unintentional inclusion of incorrect answers

Additivity

Metric: Additivity is introduced to assess the degree of perturbation to neighboring knowledge.
Given a question, its true answer list O, a false answer list O_h, and a new answer to append:

Relative ranking of objects: the minimum probability of the correct answers should be larger than the maximum probability of the false answers, both before and after editing.
- Ranking Forgetting Factor (RFF)
- Ranking Noising Factor (RNF)

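The factor formulas themselves appear as images in the original slides; the sketch below is one plausible reading, under the assumption that forgetting counts correct answers pushed below the strongest false answer after editing, and noising counts false answers pushed above the weakest correct answer.

```python
# One plausible reading of the ranking factors (the slide's formulas are images).
# Assumption: RFF counts correct answers that fall below the strongest false answer
# after editing; RNF counts false answers that rise above the weakest correct answer.
def ranking_forgetting_factor(p_correct_after, p_false_after):
    threshold = max(p_false_after)
    return sum(p < threshold for p in p_correct_after) / len(p_correct_after)

def ranking_noising_factor(p_correct_after, p_false_after):
    threshold = min(p_correct_after)
    return sum(p > threshold for p in p_false_after) / len(p_false_after)
```
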
Additivity

Absolute probability change of objects: even if the relative ranking is unchanged, substantial harm is inflicted upon the model if the absolute probabilities change unexpectedly:
1. if the probabilities of correct answers decrease
2. if the probabilities of false answers increase

Aggregation:
- Additive Forgetting Factor (AFF)
- Additive Noising Factor (ANF)

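Again, the exact aggregation is shown only as an image in the slides; the following is a hedged sketch under the assumption that AFF averages the relative probability drops of correct answers and ANF averages the relative probability gains of false answers, both against the pre-edit model.

```python
# Hedged sketch of the absolute-change factors (exact aggregation not recoverable from
# the slides). Assumption: AFF averages relative probability drops of correct answers,
# ANF averages relative probability gains of false answers, both w.r.t. the pre-edit
# model. Probabilities are assumed strictly positive.
def additive_forgetting_factor(p_correct_before, p_correct_after):
    drops = [max(b - a, 0.0) / b for b, a in zip(p_correct_before, p_correct_after)]
    return sum(drops) / len(drops)

def additive_noising_factor(p_false_before, p_false_after):
    gains = [max(a - b, 0.0) / b for b, a in zip(p_false_before, p_false_after)]
    return sum(gains) / len(gains)
```
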
PEAK: Perturbation Evaluation of Appending Knowledge

1. Based on Wikidata & YAGO, using fact triples of the form (subject, relation, object)
2. Two datasets:
   - PEAK-CF (the new answer is counterfactual)
   - PEAK-T (the new answer is factually correct and emerged after the original model was trained)
3. Sampling false answers:
   - Hard (related to the newly appended answer)
   - Random (semantically distant from the new answer)

APP: Appending via Preservation and Prevention

Question: Who has been the president of the US?
Maintain a certain margin between the probabilities of the original correct answers O and the false answers O_h.

(Figure: probability bars for correct answers {Washington, Adams, ..., Trump} versus false answers {Harris, Ron Klain, Blinken}, separated by a margin.)

APP: Appending via Preservation and Prevention

Question: Who has been the president of the US?
- Maintain the margin
- Control probability changes: ensure the probabilities of correct answers do not decrease while those of false answers do not increase

(Figure: answer probabilities before editing, during editing, and after appending; correct answers such as Washington and Adams keep their probabilities while false answers such as Harris and Ron Klain stay low.)

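Putting the two slides together, here is a simplified rendering of the three APP objectives referenced as L1/L2/L3 in the ablation later. The mapping of L1 to margin maintenance and L2/L3 to probability control, the hinge forms, and the equal weighting are illustrative assumptions, not the paper's exact losses.

```python
# Simplified rendering of APP's three objectives (referenced as L1/L2/L3 in the
# ablation). Hinge forms and equal weighting are illustrative assumptions.
import torch

def app_objectives(logp_correct, logp_false, logp_correct_ref, logp_false_ref, margin=1.0):
    """All inputs are 1-D tensors of answer log-probabilities; the *_ref tensors
    come from the frozen pre-edit model."""
    # L1: keep the weakest correct answer at least `margin` above the strongest false answer.
    l1 = torch.clamp(margin - (logp_correct.min() - logp_false.max()), min=0.0)
    # L2: correct-answer probabilities must not decrease relative to the pre-edit model.
    l2 = torch.clamp(logp_correct_ref - logp_correct, min=0.0).mean()
    # L3: false-answer probabilities must not increase relative to the pre-edit model.
    l3 = torch.clamp(logp_false - logp_false_ref, min=0.0).mean()
    return l1 + l2 + l3
```
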
Evaluation on the PEAK-CF dataset

- Existing methods perform well in memorizing new knowledge
- However, they seriously disrupt the integrity of original correct knowledge and introduce unintentional false knowledge
- Edited models generally show worse performance under the Hard setting than under the Random setting in terms of AFF and ANF
- APP significantly mitigates the neighboring perturbations of different methods on different LLMs

Evaluation on the PEAK-T dataset

Comparing PEAK-CF with PEAK-T:
- PEAK-T is more challenging for existing methods to append knowledge
- PEAK-T suffers fewer neighboring perturbations during editing than PEAK-CF

Probability of Answers

- Existing editing methods severely perturb probabilities: correct answers decrease while false answers increase
- APP significantly mitigates the probability perturbations, especially for ROME & MEMIT

Ablation Study

Removing each editing objective L1, L2, or L3 in APP:
- Removing any editing objective of APP leads to performance degradation in terms of additivity and probability perturbations
- Removing L1 results in the most significant performance degradation

Effect of Number of Neighboring Answers

Extend the evaluation by utilizing k correct and false answers in APP, where k ∈ {0, 1, 3, 5, all}:
- The performance of all editing methods coupled with APP improves significantly as the number of neighboring answers increases
- Even a few answers already help a lot

Challenges & Future Directions

- How to solve the "reversal curse" in QA form?
- How to enhance the scalability of model editing?
- How to accelerate model editing? (ROME takes about 13 hours for 10,000 instances)
- Cross-lingual model editing
- Improving the robustness of model editing

Side Effects of Knowledge Editing

Question: model editing improves the factuality of the model, but may come at the cost of a significant degradation of its general abilities.
- The side effects are analyzed by systematically evaluating four popular editing methods on three LLMs covering eight representative tasks
- Current editing methods unintentionally hurt the general abilities of LLMs, in both instance- and batch-editing
- The difficulty lies in the dual objective of improving model factuality while simultaneously maintaining general abilities, since editing is not robust to the weight perturbations it introduces

Jia-Chen Gu, et al. Model Editing Can Hurt General Abilities of Large Language Models. arXiv 2024.

Q & A

Github: https://github.com/mjy1111
Email: mjy1999@mail.ustc.edu.cn
All slides will be put on http://home.ustc.edu.cn/~mjy1999/