NeuroFlow

Abstract

Visual encoding and decoding models act as gateways to understanding the neural mechanisms underlying human visual perception. Typically, visual encoding models that predict brain activity from stimuli and decoding models that reproduce stimuli from brain activity are treated as distinct tasks, requiring separate models and training procedures. This separation is inefficient and fails to model the consistency between encoding and decoding processes. To address this limitation, we propose NeuroFlow, the first unified framework that jointly models visual encoding and decoding from neural activity within a single flow model. NeuroFlow introduces two key components: (i) NeuroVAE, a variational backbone that models neural variability and establishes a compact, semantically structured latent space for bidirectional modeling across visual and neural modalities; and (ii) Cross-modal Flow Matching (XFM), which bypasses the typical paradigm of noise-to-data diffusion guided by a specific modality condition and instead learns a reversibly consistent flow model between visual and neural latent distributions. For the first time, visual encoding and decoding are reformulated as a time-dependent, reversible process within a shared latent space for unified modeling. Empirical results demonstrate that NeuroFlow achieves superior overall performance in visual encoding and decoding tasks, with higher computational efficiency than methods that handle either task in isolation. We further analyze the principal factors that steer the model toward encoding–decoding consistency and, through brain functional analyses, demonstrate that NeuroFlow captures consistent activation patterns underlying neural variability. NeuroFlow marks a major step toward unified visual encoding and decoding from neural activity, providing mechanistic insights that inform future bidirectional visual brain–computer interfaces. Code will be released to facilitate future research.

Motivation

Motivation Illustration: A central challenge in visual encoding and decoding lies in cross-modal alignment, which aims to establish a precise mapping between neural and visual distributions. Early methods relied on simple linear regression to approximate a unidirectional relationship, which limited their ability to capture complex semantic correspondences. Recent approaches introduced nonlinear mappings using Diffusion Transformer (DiT) or Diffusion Prior (DP) under generative objectives, operating by conditioning Gaussian noise on one modality (e.g., the neural or visual latent distribution) and iteratively guiding it toward the target distribution. However, such conditional noise-to-data pipelines still treat encoding and decoding as separate processes. In contrast, our proposed XFM establishes continuous and reversible flows directly between the neural and visual distributions, achieving a unified framework for encoding and decoding.
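
The contrast between the two paradigms can be written out in flow-matching notation. This is only an illustrative sketch: the straight-line interpolation path, the symbols (v_\theta, z_v, z_n, c), and the squared-error losses are assumptions made for exposition, not the formulation stated in the paper, and DiT or DP pipelines typically use diffusion objectives for which the first line is just an analogue.

```latex
% Conditional noise-to-data (assumed analogue): start from Gaussian noise,
% reach the target latent x_1, guided by a condition c from the other modality.
x_t = (1 - t)\,\epsilon + t\,x_1, \qquad \epsilon \sim \mathcal{N}(0, I), \qquad
\mathcal{L}_{\mathrm{cond}} = \mathbb{E}_{t,\,\epsilon,\,(x_1, c)}
  \big\| v_\theta(x_t, t, c) - (x_1 - \epsilon) \big\|^2

% Cross-modal data-to-data (assumed form): both endpoints are data,
% a visual latent z_v at t = 0 and its paired neural latent z_n at t = 1.
z_t = (1 - t)\,z_v + t\,z_n, \qquad
\mathcal{L}_{\mathrm{XFM}} = \mathbb{E}_{t,\,(z_v, z_n)}
  \big\| v_\theta(z_t, t) - (z_n - z_v) \big\|^2
```

In the second form no separate condition is needed, because the source modality is itself the starting point of the path, and the same velocity field can be followed in either time direction.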

Method

Overview of NeuroFlow. Stage-1 (A): NeuroVAE introduces probabilistic learning to model neural variability and constrains the latent space with visual semantics for consistent image-to-fMRI synthesis. Stage-2 (B): XFM unifies encoding and decoding processes by learning a time-dependent, reversible flow between empirical visual and neural latent distributions. Stage-3 (C): Encoding and decoding are performed within a single model at inference, where reversing the temporal direction naturally transitions between the two processes.
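
To make Stage-2 and Stage-3 concrete, the following is a minimal NumPy sketch of training-target construction and bidirectional inference. It is not the released implementation: the straight-line path, the convention that t = 0 is the visual latent and t = 1 the neural latent, the fixed-step Euler solver, and all function names (interpolate, flow_matching_loss, integrate, encode, decode, toy_velocity) are hypothetical choices made for illustration. A constant toy velocity field stands in for the learned network.

```python
import numpy as np


def interpolate(z_visual, z_neural, t):
    """Point on the assumed straight path: visual latent at t=0, neural latent at t=1."""
    return (1.0 - t) * z_visual + t * z_neural


def flow_matching_loss(velocity_fn, z_visual, z_neural, t):
    """Mean squared error between predicted velocity and the straight-path target."""
    z_t = interpolate(z_visual, z_neural, t)
    target = z_neural - z_visual  # velocity of the straight path, constant in t
    return float(np.mean((velocity_fn(z_t, t) - target) ** 2))


def integrate(velocity_fn, z, t_start, t_end, steps=50):
    """Fixed-step Euler integration of dz/dt = velocity_fn(z, t)."""
    dt = (t_end - t_start) / steps  # negative when integrating backward in time
    t = t_start
    for _ in range(steps):
        z = z + dt * velocity_fn(z, t)
        t += dt
    return z


def encode(velocity_fn, z_visual, steps=50):
    """Visual latent -> neural latent: follow the flow forward from t=0 to t=1."""
    return integrate(velocity_fn, z_visual, 0.0, 1.0, steps)


def decode(velocity_fn, z_neural, steps=50):
    """Neural latent -> visual latent: follow the same flow backward from t=1 to t=0."""
    return integrate(velocity_fn, z_neural, 1.0, 0.0, steps)


# Toy paired data: each neural latent is its visual latent plus a fixed shift.
rng = np.random.default_rng(0)
shift = np.array([1.0, -2.0, 0.5, 3.0])
z_visual = rng.normal(size=(8, 4))
z_neural = z_visual + shift


def toy_velocity(z, t):
    """Stand-in for a trained network: the exact velocity field for the toy data."""
    return np.broadcast_to(shift, z.shape)


z_neural_hat = encode(toy_velocity, z_visual)  # encoding direction
z_visual_hat = decode(toy_velocity, z_neural)  # decoding direction
```

The only difference between encode and decode is the sign of the time step, which mirrors the caption's point that reversing the temporal direction switches between the two tasks. With a learned, non-constant velocity field the Euler steps would only approximate the flow, so the round trip would be approximately rather than exactly consistent.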

Results

Qualitative visual encoding and decoding performance comparisons. Left: NeuroFlow achieves superior decoding quality in semantic fidelity and visual structure. Right: NeuroFlow suppresses irrelevant cortical activity while enhancing category-specific regions, capturing consistent activation patterns underlying neural variability to support image synthesis consistent with visual stimuli.

Analysis

Empirical visualizations. (A) Ablation study: removing key objectives leads to degraded visual fidelity and semantic coherence. (B) Flow trajectory: the encoding trajectory reveals a suppression of early visual responses and a transition toward functional regions (i.e., FFA and EBA), while the decoding trajectory evolves from an initial structural sketch, rather than Gaussian noise, into a realistic, high-fidelity image. (C-D) Brain functional analysis: category-selective fMRI activations and voxel-wise evaluation derived from raw and synthetic fMRI, computed on the whole test set, showing that NeuroFlow suppresses early visual activity and emphasizes higher-order functional regions.

Visualizations

Citation

@article{mai2026neuroflow,
  title={NeuroFlow: Toward Unified Visual Encoding and Decoding from Neural Activity},
  author={Mai, Weijian and Nan, Mu and Zhu, Yu and Cao, Jiahang and Zhang, Rui and Dai, Yuqin and Song, Chunfeng and Luo, Andrew F and Wu, Jiamin},
  journal={arXiv preprint arXiv:2604.09817},
  year={2026}
}
