Understanding Knowledge Editing for Large Language Models


Knowledge editing for large language models addresses hallucinations and factual errors in generated content by modifying model behavior without affecting unrelated inputs. Techniques such as retrieval augmentation, supervised fine-tuning, and model editing aim to mitigate these issues. Recent research explores where knowledge is stored within LLMs and how to locate and edit it for improved reliability.




Presentation Transcript


  1. Knowledge Editing for Large Language Models. Jun-Yu Ma, National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, China. 3/16/2024

  2. Presenter: Jun-Yu Ma, Ph.D. student at USTC. 2017.09 - 2021.06, Bachelor, USTC; 2021.09 - Present, Ph.D. student, USTC. Main interests: 1. Information Extraction, 2. Multilinguality, 3. Model Editing

  3. Hallucinations in Large Language Models. LLMs inevitably exhibit hallucinations: they deviate from the user input or from the previously generated context. Erroneous factual knowledge significantly undermines the reliability of LLMs.

  4. Ways to Mitigate Hallucinations. Retrieval augmentation: individualization on private data, low training and inference costs; but introduces retrieval noise and only produces short-term change. Supervised fine-tuning: simple and effective; but high-quality data is difficult to obtain in sufficient quantity, it easily overfits and affects other knowledge, and it requires more computing resources. Model editing: precise control and low computing costs; but difficult to scale and may affect other abilities. Model editing is an active research area.

  5. Model Editing: Definition and Evaluation. Model editing efficiently changes model behavior without affecting other inputs. Given an edit sample, two properties are evaluated besides recalling the edit itself. Generalization: recall the fact under in-scope paraphrase prompts, e.g., "Who currently holds the office of President of the United States?". Locality: remain unchanged for prompts outside the editing scope, e.g., "Who is the president of France?". A minimal evaluation sketch is given below.
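These criteria can be probed directly from answer probabilities. Below is a minimal sketch (not the authors' evaluation code) of such a probe, assuming a Hugging Face causal LM; the model name, prompts, and answers are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model name
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_logprob(prompt: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to `answer` after `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = model(full_ids).logits.log_softmax(-1)
    # Score only the answer tokens: token i is predicted at position i-1.
    return sum(logprobs[0, i - 1, full_ids[0, i]].item()
               for i in range(prompt_ids.shape[1], full_ids.shape[1]))

edit_prompt = "Who is the President of the United States?"                         # edit prompt
paraphrase  = "Who currently holds the office of President of the United States?"  # in-scope
unrelated   = "Who is the president of France?"                                    # out of scope

# Generalization: the edited answer should also be preferred on the paraphrase.
# Locality: the answer to the unrelated prompt should stay as it was before editing.
for p in (edit_prompt, paraphrase, unrelated):
    print(p, answer_logprob(p, "Emmanuel Macron" if p == unrelated else "Joe Biden"))
```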

  6. Current State of Model Editing. Two families of methods: those that preserve model parameters by integrating an auxiliary network, and those that directly modify the parameters responsible for the undesirable output. Yunzhi Yao, et al. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP 2023.

  7. Where is Knowledge Stored in LLMs? Mor Geva, et al. Transformer Feed-Forward Layers Are Key-Value Memories. EMNLP 2021. Damai Dai, et al. Knowledge Neurons in Pretrained Transformers. ACL 2022. Kevin Meng, et al. Locating and Editing Factual Associations in GPT. NeurIPS 2022.

  8. Untying the Reversal Curse via Bidirectional Language Model Editing

  9. The Reversal Curse. What influences the likelihood of B? Training examples that match the order (A precedes B) are far more influential than examples with the reverse order (B precedes A). Lukas Berglund, et al. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". 2023.

  10. Bidirectional LM Editing. Previous unidirectional benchmarks and approaches (single-hop and multi-hop factual questions) have not explored the reversal curse. Can edited models recall the edited facts in the reverse direction?

  11. Reversibility Evaluation. Reversibility is assessed in two forms: question answering (QA) and judgment. After inverting one-to-one and one-to-many relations, there is one definite output, e.g., "iPhone is developed by?". After inverting many-to-one and many-to-many relations, there are alternative outputs, e.g., "Shakespeare has written?". Hypothetical prompt formats for the two forms are shown below.
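For illustration only, two hypothetical prompt formats matching the QA and judgment forms above; the actual BAKE templates are handwritten and may differ.

```python
# QA form: ask the inverted question and compare the generated answer with the expected one.
qa_prompt = "The iPhone is developed by"
# Judgment form: ask the model to judge the truth of an inverted statement.
judge_prompt = "Judge whether the following statement is true or false: Hamlet was written by Shakespeare."
```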

  12. BAKE: Bidirectional Assessment for Knowledge Editing. Built from massive factual triples (subject, relation, object) in Wikidata, with handwritten templates for bidirectional relations. Two sub-datasets: BAKE-Q&J (25 one-to-one and one-to-many relations) and BAKE-J (20 many-to-one and many-to-many relations).

  13. BAKE: Example and Evaluation. Editing-direction metrics: Efficacy, Generalization, Locality. Reverse-direction metrics: Reversibility-QA, measured by the Reverse-QA Score (RQS) and Reverse-QA Magnitude (RQM); Reversibility-Judgment, measured by the Reverse-Judgment Score (RJS) and Reverse-Judgment Magnitude (RJM). The magnitude metrics measure probability differences. An illustrative probe is sketched below.
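An illustrative reverse-direction probe, reusing `answer_logprob` from the earlier sketch. The exact RQS/RQM definitions are in the paper, so the score and magnitude below are only rough analogues, and the edit example is hypothetical.

```python
# Hypothetical edit: "England has the capital London" changed to "England has the capital Paris".
# After the edit, the reverse question should be answered with the edited subject.
reverse_prompt = "Paris is the capital of"
lp_desired  = answer_logprob(reverse_prompt, "England")   # expected after the edit
lp_original = answer_logprob(reverse_prompt, "France")    # the pre-edit association
rqs_like = float(lp_desired > lp_original)                # success indicator (RQS-like)
rqm_like = torch.exp(torch.tensor(lp_desired)) - torch.exp(torch.tensor(lp_original))
print(rqs_like, rqm_like.item())                          # probability difference (RQM-like)
```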

  14. Evaluation of Current Editing Methods. Current methods perform well in the editing direction but suffer serious deficiencies in the reverse direction. Gradient-based methods (FT, MEND) perform worse than locate-then-edit methods (KN, MEMIT, ROME) in the reverse direction.

  15. BIRD: Bidirectionally Inversible Relationship moDeling. Idea: static Word2Vec embeddings have the well-known analogy property (e.g., Paris - France ≈ London - England). BIRD conceptualizes this property for a dynamic language model and extends it to factual triples such as (France, has capital, Paris) and (England, has capital, London).

  16. BIRD: Bidirectionally Inversible Relationship moDeling. Enhance the association of the NEW fact bidirectionally (e.g., editing "England has the capital London" to "England has the capital Paris"): R(subject) + R(forward relation) is driven close to R(object_new), and R(object_new) + R(backward relation) is driven close to R(subject).

  17. BIRD: Bidirectionally Inversible Relationship moDeling. Weaken the association of the ORIGINAL fact bidirectionally: R(subject) + R(forward relation) is driven away from R(object_orig), and R(object_orig) + R(backward relation) is driven away from R(subject). The final objective combines enhancing the new knowledge with weakening the original knowledge. A conceptual sketch follows below.
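A conceptual sketch of the bidirectional objective described on the last two slides, written as a stand-alone loss over representations. R(.) is assumed to return hidden-state vectors; the real BIRD objective is integrated into the editing update and differs in detail, and the cosine-similarity form and weight are assumptions.

```python
import torch
import torch.nn.functional as F

def bird_like_loss(r_subj, r_fwd_rel, r_bwd_rel, r_obj_new, r_obj_orig, beta=0.5):
    """r_* are representation vectors (tensors of the same dimension)."""
    # Enhance the NEW fact in both directions:
    #   R(subject) + R(forward relation)     -> close to R(object_new)
    #   R(object_new) + R(backward relation) -> close to R(subject)
    enhance = ((1 - F.cosine_similarity(r_subj + r_fwd_rel, r_obj_new, dim=-1)) +
               (1 - F.cosine_similarity(r_obj_new + r_bwd_rel, r_subj, dim=-1)))
    # Weaken the ORIGINAL fact in both directions (push the sums away from the old targets).
    weaken = (F.cosine_similarity(r_subj + r_fwd_rel, r_obj_orig, dim=-1) +
              F.cosine_similarity(r_obj_orig + r_bwd_rel, r_subj, dim=-1))
    return enhance.mean() + beta * weaken.mean()

# Example with random vectors standing in for the model representations.
d = 64
loss = bird_like_loss(*(torch.randn(d) for _ in range(5)))
print(loss.item())
```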

  18. Results of BIRD. BIRD significantly improves the editing performance of four LLMs in the reverse direction.

  19. Comparison with Other Metrics. Portability: whether the edited knowledge can be transferred to related content. Multi-hop: whether the model can answer multi-hop questions entailed by the edited facts. Reversibility is indeed more challenging! Yunzhi Yao, et al. Editing Large Language Models: Problems, Methods, and Opportunities. EMNLP 2023. Zexuan Zhong, et al. MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions. EMNLP 2023.

  20. Log Probability of Desired Outputs. The closer the log probability is to 0, the greater the probability. In the reverse direction, current editing methods fail to increase the probability of the desired answer after editing, fail to decrease the probability of the original answer, and fail to decrease the margin between the original and desired answers.

  21. Effect of Objective Weights. Varying the weights on the new-knowledge and original-knowledge terms shows that incorporating the bidirectionally inversible relationships is effective, but the weight should not be too large, or it affects the memorization of new facts. Weakening the association of the original fact is also necessary.

  22. Neighboring Perturbations of Knowledge Editing on Large Language Models

  23. Neighboring Perturbations of Knowledge Editing. Motivation: prior works mainly focus on what should change after editing (whether the target knowledge has been memorized). Question: does the editing operation of appending a new answer to the answer list of a question perturb the neighboring knowledge encapsulated within the model? Two risks: catastrophic forgetting of the original correct answers, and unintentional inclusion of incorrect answers.

  24. Additivity. The additivity metric is introduced to assess the degree of perturbation to neighboring knowledge. Setup: given a question, its list of true answers, a list of false answers, and a new answer to append. Relative ranking of objects: the minimum probability of the correct answers should be larger than the maximum probability of the false answers, both before and after editing; violations are captured by the Ranking Forgetting Factor and the Ranking Noising Factor.

  25. Additivity. Absolute probability change of objects: even if the relative ranking is unchanged, substantial harm is inflicted on the model if the absolute probabilities change unexpectedly, namely (1) if the probabilities of correct answers decrease and (2) if the probabilities of false answers increase. These are aggregated into the Additive Forgetting Factor and the Additive Noising Factor. A sketch of the two checks follows below.
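A hedged sketch of the two ideas behind these metrics; the exact RFF/RNF and AFF/ANF definitions are given in the paper, so the functions and numbers below are only illustrative.

```python
def ranking_intact(correct_probs, false_probs):
    """Relative ranking: every correct answer should outrank every false answer."""
    return min(correct_probs) > max(false_probs)

def absolute_perturbation(correct_before, correct_after, false_before, false_after):
    """Absolute change: how much correct answers lost and false answers gained."""
    forgetting = sum(max(b - a, 0.0) for b, a in zip(correct_before, correct_after))
    noising = sum(max(a - b, 0.0) for b, a in zip(false_before, false_after))
    return forgetting, noising

# Placeholder answer probabilities before and after appending a new answer.
correct_before, correct_after = [0.30, 0.22], [0.10, 0.04]
false_before, false_after = [0.05, 0.02], [0.06, 0.02]
print(ranking_intact(correct_before, false_before))   # True: ranking intact before editing
print(ranking_intact(correct_after, false_after))     # False: a false answer now outranks
print(absolute_perturbation(correct_before, correct_after, false_before, false_after))
```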

  26. PEAK: Perturbation Evaluation of Appending Knowledge. 1. Built from fact triples of the form (subject, relation, object) in Wikidata and YAGO. 2. Two datasets: PEAK-CF (the new answer is counterfactual) and PEAK-T (the new answer is factually correct and emerged after the original model was trained). 3. False answers are sampled in two ways: Hard (related to the newly appended answer) and Random (semantically distant from the new answer).

  27. APP: Appending via Preservation and Prevention. Example question: "Who has been the president of the US?". Maintain a certain margin between the probabilities of the original correct answers (e.g., Washington, Adams, Trump) and those of false answers (e.g., Blinken, Ron Klain, Harris).

  28. APP: Appending via Preservation and Prevention. While appending the new answer, ensure that the probabilities of correct answers (e.g., Washington, Adams) do not decrease and that the probabilities of false answers (e.g., Ron Klain, Harris) do not increase during editing. The final objective combines appending the new answer, maintaining the margin, and controlling the probability changes. A conceptual sketch follows below.
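A conceptual sketch of the margin, preservation, and prevention terms described on the last two slides. The loss forms, the margin value, and the mapping onto the L1/L2/L3 numbering used in the ablation study are assumptions, not the paper's implementation.

```python
import torch

def app_like_loss(logp_correct, logp_false, logp_correct_before, logp_false_before,
                  margin=1.0):
    """Inputs are per-answer log-probability tensors under the model being edited;
    the *_before tensors were recorded from the unedited model."""
    # Margin term: keep every correct answer at least `margin` above every false answer.
    l_margin = torch.relu(margin - (logp_correct.min() - logp_false.max()))
    # Preservation term: probabilities of correct answers should not decrease.
    l_preserve = torch.relu(logp_correct_before - logp_correct).sum()
    # Prevention term: probabilities of false answers should not increase.
    l_prevent = torch.relu(logp_false - logp_false_before).sum()
    return l_margin + l_preserve + l_prevent

# Example with placeholder log-probabilities.
loss = app_like_loss(torch.log(torch.tensor([0.10, 0.04])),
                     torch.log(torch.tensor([0.06, 0.02])),
                     torch.log(torch.tensor([0.30, 0.22])),
                     torch.log(torch.tensor([0.05, 0.02])))
print(loss.item())
```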

  29. Evaluation on the PEAK-CF Dataset. Existing methods perform well at memorizing the new knowledge, but they seriously disrupt the integrity of the original correct knowledge and introduce unintentional false knowledge. Edited models generally perform worse under the Hard setting than under the Random setting in terms of AFF and ANF. APP significantly mitigates the neighboring perturbations of different methods on different LLMs.

  30. Evaluation on the PEAK-T Dataset. Comparing PEAK-CF with PEAK-T: appending knowledge on PEAK-T is more challenging for existing methods, but PEAK-T suffers fewer neighboring perturbations during editing than PEAK-CF.

  31. Probability of Answers. Existing editing methods severely perturb the answer probabilities: those of correct answers decrease and those of false answers increase. APP significantly mitigates these probability perturbations, especially for ROME and MEMIT.

  32. Ablation Study. Removing any of the editing objectives L1, L2, or L3 in APP leads to performance degradation in terms of additivity and probability perturbations. Removing L1 results in the most significant degradation.

  33. Effect of the Number of Neighboring Answers. The evaluation is extended to using k correct and false answers in APP, where k ∈ {0, 1, 3, 5, all}. The performance of all editing methods coupled with APP improves significantly as the number of neighboring answers increases, and even a few answers already help a lot.

  34. Challenges & Future Directions. How to solve the reversal curse in QA form? How to enhance the scalability of model editing? How to accelerate model editing (e.g., ROME takes about 13 hours for 10,000 instances)? Cross-lingual model editing. Improving the robustness of model editing.

  35. Side Effects of Knowledge Editing. Model editing improves the factuality of the model, but may come at the cost of a significant degradation of its general abilities. The side effects are analyzed by systematically evaluating four popular editing methods on three LLMs covering eight representative tasks. Current editing methods unintentionally hurt the general abilities of LLMs in both instance- and batch-editing. The difficulty of making editing robust to weight perturbations lies in the dual objective of improving model factuality while simultaneously maintaining general abilities. Jia-Chen Gu, et al. Model Editing Can Hurt General Abilities of Large Language Models. arXiv 2024.

  36. Q & A. Github: https://github.com/mjy1111 Email: mjy1999@mail.ustc.edu.cn All slides will be put on http://home.ustc.edu.cn/~mjy1999/
