Knowledge Editing for Large Language Models

Jun-Yu Ma
National Engineering Research Center of Speech and Language Information Processing,
University of Science and Technology of China, Hefei, China
3/16/2024

Presenter

Jun-Yu Ma (Ph.D. @ USTC)
2017.09 - 2021.06, Bachelor, USTC
2021.09 - Present, Ph.D. student, USTC
Main interests:
1. Information Extraction
2. Multilinguality
3. Model Editing

Hallucinations in Large Language Models

LLMs inevitably exhibit hallucinations:
- deviation from user input
- deviation from generated context
- erroneous factual knowledge
Hallucination significantly undermines the reliability of LLMs.

Ways to Mitigate Hallucinations

Supervised Fine-tuning:
- simple and effective
- difficult to obtain enough high-quality data
- easy to overfit and affect other knowledge
- needs more computing resources

Retrieval-augmentation:
- individualization on private data
- low training and inference costs
- introduces retrieval noise
- short-term change

Model Editing (an active research area!):
- precise control
- low computing costs
- difficult to scale
- may affect other abilities

(Figure: an LLM passed through a "Fixer" to produce the edited LLM.)

Model Editing: Definition and Evaluation

Efficiently change model behavior without affecting other inputs.
Given an edit sample x_e:
- Generalization: recall the fact under the in-scope paraphrase prompts I(x_e).
  E.g., x_in: Who currently holds the office of President of the United States?
- Locality: remain unchanged for prompts out of the editing scope O(x_e).
  E.g., x_out: Who is the president of France?

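To make these criteria concrete, below is a minimal sketch of scoring a single edit, assuming a Hugging Face-style causal LM; the greedy substring check and the prompt sets are illustrative choices, not any benchmark's exact protocol.

```python
# Minimal sketch of scoring one edit, assuming a Hugging Face-style causal LM.
# The greedy substring check is an illustrative choice, not a benchmark's exact protocol.
import torch

def greedy_answer(model, tokenizer, prompt, max_new_tokens=8):
    """Greedily decode a short continuation of `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def score_edit(model, tokenizer, edit_prompt, in_scope, out_of_scope, target, out_refs):
    """Efficacy: the edit prompt itself recalls the target.
    Generalization: in-scope paraphrases I(x_e) recall the target.
    Locality: out-of-scope prompts O(x_e) keep their original answers."""
    hit = lambda prompt, ans: ans in greedy_answer(model, tokenizer, prompt)
    return {
        "efficacy": float(hit(edit_prompt, target)),
        "generalization": sum(hit(p, target) for p in in_scope) / len(in_scope),
        "locality": sum(hit(p, a) for p, a in zip(out_of_scope, out_refs)) / len(out_of_scope),
    }
```

Here in_scope would hold paraphrases like the x_in example above, and out_of_scope prompts like the France question, each paired with its pre-edit answer.
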
Current State of Model Editing

- Preserve parameters by integrating an auxiliary network
- Modify the parameters directly responsible for undesirable output

Yunzhi Yao, et al. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP 2023.

Where is Knowledge Stored in LLMs?

Mor Geva, et al. Transformer Feed-Forward Layers Are Key-Value Memories. EMNLP 2021.
Damai Dai, et al. Knowledge Neurons in Pretrained Transformers. ACL 2022.
Kevin Meng, et al. Locating and Editing Factual Associations in GPT. NeurIPS 2022.

Untying the Reversal Curse via Bidirectional Language Model Editing

The Reversal Curse

What influences the likelihood of B?
Examples that match the order ("A precedes B") are far more influential than examples with the reverse order ("B precedes A").

Lukas Berglund, et al. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". arXiv 2023.

Bidirectional LM Editing

Previous unidirectional benchmarks and approaches have failed to explore the reversal curse:
- single-hop factual questions
- multi-hop factual questions
Can edited models recall the editing facts in the reverse direction?

Reversibility

Evaluation takes the forms of question answering and judgement:
- One definite output after inverting one-to-one and one-to-many relations (e.g., iPhone is developed by?): QA & Judge
- Alternative outputs after inverting many-to-one and many-to-many relations (e.g., Shakespeare has written?): QA & Judge

BAKE: Bidirectional Assessment for Knowledge Editing

- Massive factual triples (subject, relation, object) based on Wikidata
- Two sub-datasets: BAKE-Q&J and BAKE-J
  - BAKE-Q&J (25 one-to-one and one-to-many relations)
  - BAKE-J (20 many-to-one and many-to-many relations)
- Handwritten templates for bidirectional relations

BAKE: Example and Evaluation

Metrics:
- Efficacy
- Generalization
- Locality
- Reversibility-QA: Reverse-QA Score (RQS)
- Reversibility-Judgement: Reverse-Judgment Score (RJS)
- Reverse-QA Magnitude (RQM) and Reverse-Judgment Magnitude (RJM): measure the probability difference

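As a rough illustration of the reversibility-QA check, the sketch below probes the reverse direction of an edited triple; the inverse-relation template and the substring scoring are assumptions made for illustration, not BAKE's exact definitions. It reuses greedy_answer from the earlier sketch.

```python
# Illustrative reversibility-QA check for an edit (subject, relation, object_new).
# The inverse template and substring scoring are assumptions, not BAKE's exact metric.
def reverse_qa_hit(model, tokenizer, inverse_template, object_new, subject):
    # e.g., inverse_template = "{} is the capital of" for the forward relation "has capital"
    prompt = inverse_template.format(object_new)
    return subject in greedy_answer(model, tokenizer, prompt)  # helper from the earlier sketch

# RQS over a dataset is then the mean of reverse_qa_hit across edits; RQM/RJM instead
# compare the probabilities the edited model assigns to the new vs. original answer
# under the reverse prompt.
```
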
Evaluation of Current Editing Methods

- Current methods perform well in the editing direction
- They suffer serious deficiencies in the reverse direction
- Gradient-based methods (FT, MEND) perform worse than editing-based methods (KN, MEMIT, ROME) in the reverse direction

BIRD: Bidirectionally Inversible Relationship moDeling

- Static Word2Vec property: a relation acts as a consistent vector offset between entity pairs (e.g., France -> Paris and England -> London for "has capital")
- Conceptualize this property and extend it to factual triples in a dynamic language model

BIRD: Bidirectionally Inversible Relationship moDeling

Enhance the association of the NEW fact bidirectionally:
- R(subject) + R(forward relation) is driven close to R(object_new)
- R(object_new) + R(backward relation) is driven close to R(subject)
(Example: after editing "England has the capital London" to "England has the capital Paris", the reverse "Paris is capital of England" should also hold.)

BIRD: Bidirectionally Inversible Relationship moDeling

Weaken the association of the ORIGINAL fact bidirectionally:
- R(subject) + R(forward relation) is driven away from R(object_orig)
- R(object_orig) + R(backward relation) is driven away from R(subject)
Finally, the new knowledge replaces the original knowledge in both the forward and the reverse direction.

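The two slides above can be summarized as a single objective. The following is a simplified embedding-space rendering, under the assumptions that R(.) returns vector representations and that the weaken term is hinged at a margin; it is a sketch of the idea, not BIRD's actual loss. The weights alpha and beta correspond to the objective weights discussed later.

```python
# Simplified embedding-space rendering of BIRD's enhance/weaken idea. The R_* tensors
# are assumed vector representations; the hinged weaken term and MSE distances are
# illustrative choices, not the paper's actual objective.
import torch
import torch.nn.functional as F

def bird_objective(R_subj, R_fwd_rel, R_bwd_rel, R_obj_new, R_obj_orig,
                   alpha=1.0, beta=1.0, margin=1.0):
    # Enhance the NEW fact bidirectionally: pull both compositions toward their targets.
    enhance = (F.mse_loss(R_subj + R_fwd_rel, R_obj_new)
               + F.mse_loss(R_obj_new + R_bwd_rel, R_subj))
    # Weaken the ORIGINAL fact bidirectionally: push both compositions at least
    # `margin` away from the stale object/subject pairing.
    dist_orig = (F.mse_loss(R_subj + R_fwd_rel, R_obj_orig)
                 + F.mse_loss(R_obj_orig + R_bwd_rel, R_subj))
    weaken = torch.clamp(margin - dist_orig, min=0.0)
    return alpha * enhance + beta * weaken
```
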
Results of BIRD

BIRD significantly improves the editing performance of four LLMs in the reverse direction.

Comparison with Other Metrics

- Portability: whether the editing knowledge can be transferred to related content
- Multi-hop: whether the model can answer multi-hop questions entailed by the editing facts
Reversibility is indeed more challenging!

Yunzhi Yao, et al. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP 2023.
Zexuan Zhong, et al. MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions. EMNLP 2023.

Log Probability of Desired Outputs

(The closer the log probability is to 0, the greater the probability.)
Current editing methods:
- fail to increase the probability of the desired answer after editing
- fail to decrease the probability of the original answer
- fail to decrease the margin between the original answer and the desired answer

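For reference, the quantity plotted here, the log probability of a multi-token answer given a prompt, can be measured as follows; a minimal sketch assuming a Hugging Face-style causal LM.

```python
# Minimal sketch: log probability of a multi-token answer under a causal LM,
# assuming Hugging Face-style model/tokenizer objects. Boundary tokenization
# effects at the prompt/answer join are ignored for brevity.
import torch

def answer_logprob(model, tokenizer, prompt, answer):
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logps = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The token at position `pos` is predicted by the logits at position `pos - 1`.
    for pos in range(prompt_len, full_ids.shape[1]):
        total += logps[0, pos - 1, full_ids[0, pos]].item()
    return total  # closer to 0 means higher probability
```
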
Effect of Objective Weights

- α: incorporating the bidirectionally inversible relationships is effective, but α should not be too large, or it impairs the memorization of new facts
- β: weakening the association of the original fact is necessary

Neighboring Perturbations of Knowledge Editing on Large Language Models

Neighboring Perturbations of Knowledge Editing

Motivation:
1. Prior works mainly focus on what should change after editing (whether the target knowledge has been memorized).

Question: does the editing operation of appending a new answer to the answer list of a question perturb the neighboring knowledge encapsulated within the model?
- catastrophic forgetting of original correct answers
- unintentional inclusion of incorrect answers

Additivity

Metric: Additivity is introduced to assess the degree of perturbation to neighboring knowledge.
Given a question, its true answer list O, a false answer list O_h, and a new answer to append:

Relative ranking of objects: the minimum probability of the correct answers should be larger than the maximum probability of the false answers, both before and after editing.
- Ranking Forgetting Factor (RFF)
- Ranking Noising Factor (RNF)

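The factor formulas themselves appear as images in the original slides; the sketch below is one plausible reading, under the assumption that forgetting counts correct answers pushed below the strongest false answer after editing, and noising counts false answers pushed above the weakest correct answer.

```python
# One plausible reading of the ranking factors (the slide's formulas are images).
# Assumption: RFF counts correct answers that fall below the strongest false answer
# after editing; RNF counts false answers that rise above the weakest correct answer.
def ranking_forgetting_factor(p_correct_after, p_false_after):
    threshold = max(p_false_after)
    return sum(p < threshold for p in p_correct_after) / len(p_correct_after)

def ranking_noising_factor(p_correct_after, p_false_after):
    threshold = min(p_correct_after)
    return sum(p > threshold for p in p_false_after) / len(p_false_after)
```
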
Additivity

Absolute probability change of objects: even if the relative ranking is unchanged, substantial harm is inflicted upon the model if the absolute probabilities change unexpectedly:
1. if the probabilities of correct answers decrease
2. if the probabilities of false answers increase

Aggregation:
- Additive Forgetting Factor (AFF)
- Additive Noising Factor (ANF)

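Again, the exact aggregation is shown only as an image in the slides; the following is a hedged sketch under the assumption that AFF averages the relative probability drops of correct answers and ANF averages the relative probability gains of false answers, both against the pre-edit model.

```python
# Hedged sketch of the absolute-change factors (exact aggregation not recoverable from
# the slides). Assumption: AFF averages relative probability drops of correct answers,
# ANF averages relative probability gains of false answers, both w.r.t. the pre-edit
# model. Probabilities are assumed strictly positive.
def additive_forgetting_factor(p_correct_before, p_correct_after):
    drops = [max(b - a, 0.0) / b for b, a in zip(p_correct_before, p_correct_after)]
    return sum(drops) / len(drops)

def additive_noising_factor(p_false_before, p_false_after):
    gains = [max(a - b, 0.0) / b for b, a in zip(p_false_before, p_false_after)]
    return sum(gains) / len(gains)
```
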
PEAK: Perturbation Evaluation of Appending Knowledge

1. Based on Wikidata & YAGO, using fact triples of the form (subject, relation, object)
2. Two datasets:
   - PEAK-CF (the new answer is counterfactual)
   - PEAK-T (the new answer is factually correct and emerged after the original model was trained)
3. Sampling false answers:
   - Hard (related to the newly appended answer)
   - Random (semantically distant from the new answer)

APP: Appending via Preservation and Prevention

Question: Who has been the president of the US?
Maintain a certain margin between the probabilities of the original correct answers O and the false answers O_h.

(Figure: probability bars for correct answers {Washington, Adams, ..., Trump} versus false answers {Harris, Ron Klain, Blinken}, separated by a margin.)

APP: Appending via Preservation and Prevention

Question: Who has been the president of the US?
- Maintain the margin
- Control probability changes: ensure the probabilities of correct answers do not decrease while those of false answers do not increase

(Figure: answer probabilities before editing, during editing, and after appending; correct answers such as Washington and Adams keep their probabilities while false answers such as Harris and Ron Klain stay low.)

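Putting the two slides together, here is a simplified rendering of the three APP objectives referenced as L1/L2/L3 in the ablation later. The mapping of L1 to margin maintenance and L2/L3 to probability control, the hinge forms, and the equal weighting are illustrative assumptions, not the paper's exact losses.

```python
# Simplified rendering of APP's three objectives (referenced as L1/L2/L3 in the
# ablation). Hinge forms and equal weighting are illustrative assumptions.
import torch

def app_objectives(logp_correct, logp_false, logp_correct_ref, logp_false_ref, margin=1.0):
    """All inputs are 1-D tensors of answer log-probabilities; the *_ref tensors
    come from the frozen pre-edit model."""
    # L1: keep the weakest correct answer at least `margin` above the strongest false answer.
    l1 = torch.clamp(margin - (logp_correct.min() - logp_false.max()), min=0.0)
    # L2: correct-answer probabilities must not decrease relative to the pre-edit model.
    l2 = torch.clamp(logp_correct_ref - logp_correct, min=0.0).mean()
    # L3: false-answer probabilities must not increase relative to the pre-edit model.
    l3 = torch.clamp(logp_false - logp_false_ref, min=0.0).mean()
    return l1 + l2 + l3
```
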
Evaluation on the PEAK-CF dataset

- Existing methods perform well in memorizing new knowledge
- However, they seriously disrupt the integrity of original correct knowledge and introduce unintentional false knowledge
- Edited models generally show worse performance under the Hard setting than under the Random setting in terms of AFF and ANF
- APP significantly mitigates the neighboring perturbations of different methods on different LLMs

Evaluation on the PEAK-T dataset

Comparing PEAK-CF with PEAK-T:
- PEAK-T is more challenging for existing methods to append knowledge
- PEAK-T suffers fewer neighboring perturbations during editing than PEAK-CF

Probability of Answers

- Existing editing methods severely perturb probabilities: correct answers decrease while false answers increase
- APP significantly mitigates the probability perturbations, especially for ROME & MEMIT

Ablation Study

Removing each editing objective L1, L2, or L3 in APP:
- Removing any editing objective of APP leads to performance degradation in terms of additivity and probability perturbations
- Removing L1 results in the most significant performance degradation

Effect of Number of Neighboring Answers

Extend the evaluation by utilizing k correct and false answers in APP, where k ∈ {0, 1, 3, 5, all}:
- The performance of all editing methods coupled with APP improves significantly as the number of neighboring answers increases
- Even a few answers already help a lot

Challenges & Future Directions

- How to solve the "reversal curse" in QA form?
- How to enhance the scalability of model editing?
- How to accelerate model editing? (ROME takes about 13 hours for 10,000 instances)
- Cross-lingual model editing
- Improving the robustness of model editing

Side Effects of Knowledge Editing

Question: model editing improves the factuality of the model, but may come at the cost of a significant degradation of its general abilities.
- The side effects are analyzed by systematically evaluating four popular editing methods on three LLMs covering eight representative tasks
- Current editing methods unintentionally hurt the general abilities of LLMs, in both instance- and batch-editing
- The difficulty lies in the dual objective of improving model factuality while simultaneously maintaining general abilities, since editing is not robust to the weight perturbations it introduces

Jia-Chen Gu, et al. Model Editing Can Hurt General Abilities of Large Language Models. arXiv 2024.

Q & A

Github: https://github.com/mjy1111
Email: mjy1999@mail.ustc.edu.cn
All slides will be put on http://home.ustc.edu.cn/~mjy1999/