A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models

Lixin Xiu, Xufang Luo, Hideki Nakayama

3/31/2026

cs.LG, cs.CL, cs.CV

Abstract

Large vision-language models (LVLMs) achieve impressive performance, yet their internal decision-making processes remain opaque, making it difficult to determine whether their success stems from true multimodal fusion or from reliance on unimodal priors. To address this attribution gap, we introduce a novel framework using partial information decomposition (PID) to quantitatively measure the "information spectrum" of LVLMs -- decomposing a model's decision-relevant information into redundant, unique, and synergistic components. By adapting a scalable estimator to modern LVLM outputs, our model-agnostic pipeline profiles 26 LVLMs on four datasets across three dimensions -- breadth (cross-model & cross-task), depth (layer-wise information dynamics), and time (learning dynamics across training). Our analysis reveals two key results: (i) two task regimes (synergy-driven vs. knowledge-driven) and (ii) two stable, contrasting family-level strategies (fusion-centric vs. language-centric). We also uncover a consistent three-phase pattern in layer-wise processing and identify visual instruction tuning as the key stage where fusion is learned. Together, these contributions provide a quantitative lens beyond accuracy-only evaluation and offer insights for analyzing and designing the next generation of LVLMs. Code and data are available at https://github.com/RiiShin/pid-lvlm-analysis.
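To make the redundant/unique/synergistic split concrete, the following is a minimal sketch on two toy distributions. It is not the paper's estimator. It assumes only the standard PID consistency equations for two sources X1, X2 and a target Y: I(X1;Y) = R + U1, I(X2;Y) = R + U2, I(X1,X2;Y) = R + U1 + U2 + S, with all four terms non-negative. The helper names (`mutual_information`, `pid_if_one_source_uninformative`) are hypothetical. The sketch covers only the special case where one source carries no information about Y on its own, because the decomposition is then fully determined without choosing a redundancy measure.

```python
from collections import defaultdict
from itertools import product
from math import log2


def mutual_information(joint, src_idx, tgt_idx):
    """I(S;T) in bits, for a joint pmf given as {outcome_tuple: probability}."""
    p_s, p_t, p_st = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        s = tuple(outcome[i] for i in src_idx)
        t = tuple(outcome[i] for i in tgt_idx)
        p_s[s] += p
        p_t[t] += p
        p_st[(s, t)] += p
    return sum(p * log2(p / (p_s[s] * p_t[t]))
               for (s, t), p in p_st.items() if p > 0)


def pid_if_one_source_uninformative(joint, tol=1e-12):
    """Hypothetical helper: PID terms when min(I(X1;Y), I(X2;Y)) = 0.

    Since 0 <= R <= min(I1, I2), R must be 0 in this case, and the
    consistency equations then fix U1, U2, and S. The general case needs
    a redundancy definition and an estimator, which this sketch omits.
    """
    i1 = mutual_information(joint, [0], [2])
    i2 = mutual_information(joint, [1], [2])
    i12 = mutual_information(joint, [0, 1], [2])
    if min(i1, i2) > tol:
        raise ValueError("both sources are informative; not covered here")
    return {"redundant": 0.0, "unique1": i1, "unique2": i2,
            "synergy": i12 - i1 - i2}


bits = list(product((0, 1), repeat=2))

# Y = X1 XOR X2: neither source alone predicts Y, but together they do.
xor_joint = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in bits}
xor_pid = pid_if_one_source_uninformative(xor_joint)

# Y = X1: the target depends on one source only (a "unimodal" toy case).
copy_joint = {(x1, x2, x1): 0.25 for x1, x2 in bits}
copy_pid = pid_if_one_source_uninformative(copy_joint)

print("XOR :", xor_pid)
print("COPY:", copy_pid)
```

In the XOR case the full bit of information is synergistic. In the copy case it is entirely unique to X1. These are loose toy analogues of fusion versus single-modality reliance, not results from the paper.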


Code Implementations

RiiShin/pid-lvlm-analysis (Official, 100%)

This is the official repository for the paper: A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models (ICLR 2026).

0 · 0 · Feb 26, 2026 · 1 month ago · MIT

