Reasoning trajectories collapse into manifolds
Inference-time hidden states rapidly concentrate on compact, low-dimensional trajectories despite living inside high-dimensional representation spaces.
Reasoning in large language models is predominantly evaluated through labeled benchmarks, conflating task performance with the quality of internal inference. Here we study reasoning as an intrinsic dynamical process by examining the evolution of internal representations during inference.
We find that inference-time dynamics consistently self-organize into low-dimensional manifolds embedded within high-dimensional representation spaces. Such geometric compression is pervasive, but it is not sufficient for stable or reliable reasoning. Effective reasoning dynamics emerge within a constrained structural regime characterized by three conditions: adequate representational expressivity, spontaneous manifold compression, and preservation of non-degenerate information volume within the compressed subspace.
Based on these insights, we introduce a unified, label-free diagnostic computed solely from internal dynamics. The findings suggest that reasoning in LLMs is fundamentally governed by geometric and informational constraints, offering a complementary framework to benchmark-centric assessment.
Models with similarly low intrinsic dimensionality can exhibit very different reasoning behavior, so healthy reasoning requires additional structural constraints.
The proposed score combines world expressivity, stimulus-induced dimensionality, and information volume without using task labels or reference answers.
The paper reframes reasoning as a dynamical process unfolding in representation space during generation. Across model families, scales, and prompts, internal trajectories become low-dimensional during inference while the underlying vocabulary embedding space remains highly expressive.
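As a rough illustration of what "low-dimensional trajectories inside a high-dimensional space" can mean in practice, the sketch below estimates the effective dimensionality of a set of hidden states with the participation ratio of the covariance spectrum. This is an assumed stand-in, not necessarily the estimator used in the paper, and the data are synthetic rather than real model activations; the function name `participation_ratio` is hypothetical.

```python
import numpy as np

def participation_ratio(states: np.ndarray) -> float:
    """Effective dimensionality of a point cloud of shape (n_points, n_features).

    Computed as (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the
    covariance spectrum. Illustrative estimator only (an assumption).
    """
    centered = states - states.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the proportionality constant cancels in the ratio below.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eig = singular_values ** 2
    total = eig.sum()
    if total == 0.0:
        return 0.0
    return float(total ** 2 / (eig ** 2).sum())

rng = np.random.default_rng(0)

# Synthetic "trajectory": 200 steps confined (up to small noise) to a
# 3-dimensional subspace of a 256-dimensional space.
latent = rng.normal(size=(200, 3))
basis = rng.normal(size=(3, 256))
trajectory = latent @ basis + 1e-3 * rng.normal(size=(200, 256))

# Synthetic isotropic cloud filling the ambient space, for contrast.
cloud = rng.normal(size=(200, 256))

pr_traj = participation_ratio(trajectory)
pr_cloud = participation_ratio(cloud)
print(f"trajectory: {pr_traj:.2f}, isotropic cloud: {pr_cloud:.2f}")
```

With real models, `states` would be replaced by per-token hidden states collected at a given layer during generation; the ambient dimension stays large while the estimate for the trajectory stays small.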
Robust reasoning appears when three constraints are jointly satisfied: the model retains broad representational expressivity, inference dynamics organize into compact manifolds, and those manifolds preserve non-degenerate information volume.
Label-free reasoning health diagnostic
D_world measures representational expressivity, D_stim measures stimulus-induced manifold dimensionality, and V measures information volume preserved during inference.
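The exact definition of the information volume V is not given on this page. As a minimal sketch, one common proxy for the volume occupied by a point cloud is the regularized log-determinant of its covariance, log det(I + C / eps). The function name `logdet_volume`, the regularizer `eps`, and the synthetic data are assumptions for illustration. The example contrasts two clouds lying in the same 3-dimensional subspace, one of which has nearly collapsed along two directions.

```python
import numpy as np

def logdet_volume(states: np.ndarray, eps: float = 1e-3) -> float:
    """Regularized volume proxy log det(I + C / eps) for a point cloud.

    C is the sample covariance of `states` (n_points, n_features).
    Assumed proxy for illustration, not the paper's definition of V.
    """
    centered = states - states.mean(axis=0, keepdims=True)
    n = centered.shape[0]
    # Covariance eigenvalues from singular values; zero eigenvalues
    # contribute log(1) = 0, so the truncated spectrum is sufficient.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eig = singular_values ** 2 / max(n - 1, 1)
    return float(np.log1p(eig / eps).sum())

rng = np.random.default_rng(0)

# Orthonormal basis for a 3-dimensional subspace of a 64-dimensional space.
basis, _ = np.linalg.qr(rng.normal(size=(64, 3)))

latent = rng.normal(size=(500, 3))
# Same subspace, but variance along two of the three directions is squashed.
latent_degenerate = latent * np.array([1.0, 1e-3, 1e-3])

healthy = latent @ basis.T
degenerate = latent_degenerate @ basis.T

v_healthy = logdet_volume(healthy)
v_degenerate = logdet_volume(degenerate)
print(f"healthy: {v_healthy:.2f}, degenerate: {v_degenerate:.2f}")
```

Both clouds are confined to a compact subspace, yet the second retains far less structured variation, which is the kind of degeneracy a volume term is meant to flag.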
Layer-wise intrinsic dimensionality reveals compact reasoning trajectories across representative LLM families.
Inference trajectories concentrate on compact manifolds while static vocabulary embeddings retain high dimensionality.
Benchmark correlations show that compression by itself cannot explain reasoning quality.
Effective reasoning balances geometric constraint with non-degenerate structured variation.
The label-free diagnostic integrates expressivity, manifold compression, and information volume.
Our experiments are built on the perceptual-manifold-geometry Python package, which provides geometric analysis tools for high-dimensional data manifolds including intrinsic dimension, curvature, density, and topological structure.
@misc{ma2026reasoning,
title={Reasoning emerges from constrained inference manifolds in large language models},
author={Yanbiao Ma and Fei Luo and Linfeng Zhang and Chuangxin Zhao and Mingxuan Wang and Yinan Wu and Zhe Qian and Yang Lu and Long Chen and Zhao Cao and Xiaoshuai Hao and Ji-Rong Wen and Jungong Han},
year={2026},
eprint={2605.08142},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2605.08142}
}