Context-Robust Knowledge Editing for Language Models
Knowledge editing (KE) methods offer an efficient way to modify knowledge in large language models. Current KE evaluations typically assess editing success by considering only the edited knowledge itself, without any preceding context. In real-world applications, however, preceding contexts often trigger the recall of the original knowledge and undermine the intended edit. To address this issue, we develop CHED, a benchmark designed to evaluate the context robustness of KE methods. Evaluations on CHED show that current KE methods often fail when a preceding context is present. To mitigate this shortcoming, we introduce CoRE, a KE method designed to strengthen context robustness by minimizing the context-induced variance in the model's hidden states for the edited knowledge. CoRE not only improves the editing success rate when a preceding context is present but also preserves the overall capabilities of the model. We also provide an in-depth analysis of the differing impacts of preceding contexts introduced as user utterances versus assistant responses, and we dissect attention-score patterns to assess how specific tokens influence editing success.
Large language models (LLMs) exhibit emergent capabilities by absorbing extensive knowledge during pretraining. However, some of this knowledge may become outdated or require correction [1, 2]. To address this, knowledge editing modifies a small subset of model parameters so that the model generates the edited knowledge [3, 4]. Yet models edited by many existing methods often fail to recall the edited knowledge when a preceding context is present during text generation (Figure 1). In particular, words in the preceding context that are semantically related to the original knowledge tend to receive disproportionately high attention scores, disrupting recall of the edited knowledge. Moreover, there is currently no benchmark for evaluating the robustness of knowledge editing methods to such contextual interference.
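To make the failure mode concrete, the following minimal sketch probes whether an already-edited model still produces the edited object once a distracting sentence about the original knowledge is prepended. The model checkpoint, the example fact (Eiffel Tower: Paris edited to Rome), and the distractor sentence are hypothetical placeholders, not artifacts from the paper.

```python
# Minimal sketch of context-interference probing (illustrative only).
# Assumes an already-edited Hugging Face causal LM; the model name,
# the example fact, and the distractor sentence are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"  # placeholder for an edited checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

edit_prompt = "The Eiffel Tower is located in"   # edited fact: Paris -> Rome
edited_object = "Rome"
distractor = "The Louvre and the Seine are famous landmarks of France. "

def completes_with(prompt: str, target: str, max_new_tokens: int = 5) -> bool:
    """Return True if greedy decoding of `prompt` begins with `target`."""
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    completion = tok.decode(out[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
    return completion.strip().startswith(target)

# An edited model may succeed on the bare prompt but revert to the
# original knowledge when the distracting context precedes it.
print("no context :", completes_with(edit_prompt, edited_object))
print("with context:", completes_with(distractor + edit_prompt, edited_object))
```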
To address these limitations, we introduce CHED (Contextual Hop Editing Dataset), a new benchmark for evaluating the contextual robustness of knowledge editing methods. In CHED, each piece of edited knowledge is accompanied by several context texts containing words related to the original knowledge; such contexts may interfere with the model's ability to recall the edited knowledge, causing it to revert to the original knowledge. We further propose a new knowledge editing method, CoRE (Context-Robust Editing), which improves contextual robustness by prepending prefix contexts during editing and minimizing the variance of the model's hidden states across these contexts (Figure 2). This simple regularization ensures that only the necessary parameter modifications are applied and prevents overfitting to any single context. In our evaluation, CHED effectively reveals the vulnerability of many knowledge editing methods to preceding context, whereas CoRE demonstrates strong contextual robustness. We also find that models are more easily distracted when the prefix context is provided as a user utterance (as in a chat setting) rather than as the model's own utterance. Further analysis of the model's attention patterns shows that CoRE reduces attention to distracting words in the preceding context and increases attention to words that facilitate recall of the edited knowledge. This work will appear in Findings of ACL 2025.
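One way to formalize the regularization described above, under assumed notation (the paper's exact formulation may differ): given $K$ prefix contexts $c_1, \dots, c_K$ prepended to the edit prompt $p$, let $h_\ell(c_k \oplus p)$ denote the relevant hidden state at layer $\ell$ (e.g., at the subject's last token). A context-variance penalty can then be added to the base editing objective with weight $\lambda$; the symbols $\mathcal{L}_{\text{edit}}$, $\mathcal{L}$ (the set of edited layers), and $\lambda$ are assumptions for this sketch.

```latex
% Hedged sketch of a context-variance regularizer (notation assumed,
% not taken verbatim from the paper). L_edit is the base editing loss;
% lambda weights the penalty on cross-context variance of h_l.
\[
\begin{aligned}
\bar{h}_\ell &= \frac{1}{K} \sum_{k=1}^{K} h_\ell\!\left(c_k \oplus p\right), \\
\mathcal{L}_{\text{reg}} &= \sum_{\ell \in \mathcal{L}} \frac{1}{K} \sum_{k=1}^{K}
    \bigl\lVert h_\ell\!\left(c_k \oplus p\right) - \bar{h}_\ell \bigr\rVert_2^2, \\
\mathcal{L} &= \mathcal{L}_{\text{edit}} + \lambda \, \mathcal{L}_{\text{reg}}.
\end{aligned}
\]
```

Keeping the hidden states for the edited fact close to their mean across diverse prefixes discourages the edit from overfitting to any single context, which is the intuition behind the contextual robustness gains reported above.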
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12):1–38.
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2024. A survey of large language models. Preprint, arXiv:2303.18223.
Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, and Ningyu Zhang. 2023. Editing large language models: Problems, methods, and opportunities. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10222–10240, Singapore. Association for Computational Linguistics.
Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad, and Jun Wang. 2023. How do large language models capture the ever-changing world knowledge? A review of recent advances. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8289–8311, Singapore. Association for Computational Linguistics.