Hallucination-aware intermediate representation edit in large vision-language models

Wei Suo, Hanzu Zhang, Lijun Zhang, Ji Ma, Peng Wang +1 more

3/31/2026

cs.CV, cs.AI

Abstract

Large Vision-Language Models have demonstrated exceptional performance in multimodal reasoning and complex scene understanding. However, these models still face significant hallucination issues, where outputs contradict visual facts. Recent research on hallucination mitigation has focused on retraining methods and Contrastive Decoding (CD) methods. While both perform well, retraining methods require substantial training resources, and CD methods introduce dual inference overhead. These costs hinder their practical applicability. To address these limitations, we propose a framework for dynamically detecting hallucination representations and performing hallucination-eliminating edits on these representations. With minimal additional computational cost, we achieve state-of-the-art performance on existing benchmarks. Extensive experiments demonstrate the effectiveness of our approach, highlighting its efficient and robust hallucination elimination capability and its powerful controllability over hallucinations. Code is available at https://github.com/ASGO-MM/HIRE
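The abstract does not spell out how detection or editing is performed, so the following is only a generic illustration of the detect-then-edit idea, not the paper's method. It assumes a single hypothetical "hallucination direction" in hidden-state space: a representation is flagged when its projection onto that direction exceeds a threshold, and the edit removes a controllable fraction of that component. The names `edit_hidden_state`, `direction`, `threshold`, and `strength` are all invented for this sketch.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def edit_hidden_state(hidden, direction, threshold=0.0, strength=1.0):
    """Hypothetical detect-then-edit step on one intermediate representation.

    hidden:    hidden-state vector (list of floats)
    direction: assumed 'hallucination direction' (list of floats, non-zero)
    threshold: projection score above which the state is flagged
    strength:  fraction of the flagged component to remove (0 = no edit)

    Returns (edited_vector, was_edited).
    """
    norm = math.sqrt(dot(direction, direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    unit = [d / norm for d in direction]

    # Detection: score the state by its projection onto the direction.
    score = dot(hidden, unit)
    if score <= threshold:
        return list(hidden), False

    # Edit: subtract a scaled copy of the flagged component.
    edited = [h - strength * score * u for h, u in zip(hidden, unit)]
    return edited, True
```

In this sketch only flagged states are modified, so unflagged tokens pass through unchanged, and `strength` plays the role of a knob over how aggressively the flagged component is suppressed.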


Code Implementations (8)

ASGO-MM/HIRE | Official | 100%

40 | CSS, HTML | Feb 27, 2026 | 3 weeks ago

ictnlp/TruthX | 67%

Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"

1447 | Feb 27, 2024 | 2 years ago | GPL-3.0

baichuan, chatglm, chatgpt, explainable-ai, gpt-4, +13 more

MaxZhadobin/visual-scene-language | 61%

A proposed open specification for Visual Scene Language (VSL) — a semantic representation format for describing, editing, and generating visual scenes using large language models (LLMs). Created by Maxim Zhadobin.

10 | May 28, 2025 | 10 months ago | NOASSERTION

Lingkai-Kong/RE-Control | 55%

Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective

345 | Jun 30, 2024 | 1 year ago

SYEDTOUSEEFALI123/GENAI | 54%

GenAI with Intel® OpenVINO™. This repository provides resources and examples for running Generative AI (GenAI) and performing large language model (LLM) inference on Intel AI laptops using Intel® OpenVINO™. It includes instructions for converting models to Intermediate Representation (IR) format, running efficient inference on CPUs, and llm model.

11 | Jul 15, 2024 | 1 year ago

hjiang13/LLMs-in-IR | 53%

LLMs-in-IR: Evaluating Large Language Models in Intermediate Representation Code Comprehension

00 | Jul 8, 2024 | 1 year ago

hjiang13/IR2Tab | 53%

IR2Tab is a tool for converting Intermediate Representation (IR) code into a tabular format, enabling seamless integration with Large Language Models (LLMs) for tasks like code analysis, optimization, and summarization. The project provides scripts for conversion and leveraging LLMs in downstream tasks.

01 | Oct 17, 2024 | 1 year ago

BhaskarThakur1997/Bunny-Prisoner-Locating | 51%

Bunny Prisoner Locating
=======================

Keeping track of Commander Lambda's many bunny prisoners is starting to get tricky. You've been tasked with writing a program to match bunny prisoner IDs to cell locations. The LAMBCHOP doomsday device takes up much of the interior of Commander Lambda's space station, and as a result the prison blocks have an unusual layout. They are stacked in a triangular shape, and the bunny prisoners are given numerical IDs starting from the corner, as follows:

| 7
| 4 8
| 2 5 9
| 1 3 6 10

Each cell can be represented as points (x, y), with x being the distance from the vertical wall, and y being the height from the ground. For example, the bunny prisoner at (1, 1) has ID 1, the bunny prisoner at (3, 2) has ID 9, and the bunny prisoner at (2, 3) has ID 8. This pattern of numbering continues indefinitely (Commander Lambda has been taking a LOT of prisoners).

Write a function solution(x, y) which returns the prisoner ID of the bunny at location (x, y). Each value of x and y will be at least 1 and no greater than 100,000. Since the prisoner ID can be very large, return your solution as a string representation of the number.

Languages
=========

To provide a Java solution, edit Solution.java
To provide a Python solution, edit solution.py

Test cases
==========

Your code should pass the following test cases. Note that it may also be run against hidden test cases not shown here.

-- Java cases --
Input: Solution.solution(3, 2)
Output: 9

Input: Solution.solution(5, 10)
Output: 96

-- Python cases --
Input: solution.solution(5, 10)
Output: 96

Input: solution.solution(3, 2)
Output: 9
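One possible way to solve the problem described above, offered as a sketch rather than as the repository's actual solution: cells with the same value of x + y lie on one diagonal, the diagonals are numbered in order, and within a diagonal the ID increases with x. The ID is therefore the count of cells on all earlier diagonals plus x. The variable names below are chosen for illustration only.

```python
def solution(x, y):
    """Return the prisoner ID at cell (x, y) as a string.

    Cell (x, y) sits on diagonal d = x + y - 1, which holds d cells.
    The earlier diagonals hold 1 + 2 + ... + (d - 1) = d * (d - 1) / 2 cells,
    and the cell is the x-th one on its own diagonal.
    """
    diagonal = x + y - 1
    cells_before = diagonal * (diagonal - 1) // 2
    return str(cells_before + x)
```

This reproduces the values quoted in the description: (1, 1) gives 1, (2, 3) gives 8, (3, 2) gives 9, and (5, 10) gives 96. Python integers are arbitrary precision, so the largest allowed inputs need no special handling.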

01 | May 28, 2020 | 5 years ago
