[Submitted on 10 Nov 2025]

Title: Rank-1 LoRAs Encode Interpretable Reasoning Signals

Authors: Jake Ward, Paul Riechers, Adam Shai
Abstract: Reasoning models leverage inference-time compute to significantly enhance the performance of language models on difficult logical tasks, and have become a dominating paradigm in frontier LLMs. Despite their wide adoption, the mechanisms underpinning the enhanced performance of these reasoning models are not well understood. In this work, we show that the majority of new capabilities in reasoning models can be elicited by small, single-rank changes to base model parameters, with many of these changes being interpretable. Specifically, we use a rank-1 LoRA to create a minimal parameter adapter for Qwen-2.5-32B-Instruct which recovers 73-90% of reasoning-benchmark performance compared to a full parameter finetune. We find that the activations of this LoRA are as interpretable as MLP neurons, and fire for reasoning-specific behaviors. Finally, we train a sparse autoencoder on the entire activation state of this LoRA and identify fine-grained and monosemantic features. Our findings highlight that reasoning performance can arise largely from minimal changes to base model parameters, and explore what these changes affect. More broadly, our work shows that parameter-efficient training methods can be used as a targeted lens for uncovering fundamental insights about language model behavior and dynamics.
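For readers unfamiliar with the term, the following is a minimal sketch of what a rank-1 LoRA update looks like for a single linear layer, using the standard LoRA formulation in which a frozen weight matrix W is adjusted by a low-rank product. It is an illustration only, not the authors' implementation: the function names, the `scale` parameter, and the toy dimensions are assumptions, and the abstract does not say which layers were adapted or how the adapter was scaled. With rank 1, the adapter reduces to one input direction `a` and one output direction `b`, so each adapted layer produces a single scalar activation per token (`a · x`). A scalar of this kind is plausibly what the abstract compares to an MLP neuron.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def rank1_lora_forward(
    W: Matrix, a: Vector, b: Vector, x: Vector, scale: float = 1.0
) -> Tuple[Vector, float]:
    """Apply a frozen linear layer plus a rank-1 adapter.

    Computes y = W x + scale * b * (a . x) and also returns the scalar
    adapter activation (a . x). `scale` is a hypothetical stand-in for
    whatever scaling a real adapter might use.
    """
    activation = dot(a, x)          # one scalar per input vector
    base = matvec(W, x)             # output of the frozen base layer
    out = [y + scale * activation * bi for y, bi in zip(base, b)]
    return out, activation


def merged_weight(W: Matrix, a: Vector, b: Vector, scale: float = 1.0) -> Matrix:
    """Fold the adapter into the weights: W' = W + scale * b a^T."""
    return [
        [w + scale * b[i] * a[j] for j, w in enumerate(row)]
        for i, row in enumerate(W)
    ]


# Toy example with made-up numbers (not taken from the paper).
W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 2.0]   # input direction read by the adapter
b = [3.0, 4.0]   # output direction written by the adapter
x = [1.0, 1.0]

y, act = rank1_lora_forward(W, a, b, x)
y_merged = matvec(merged_weight(W, a, b), x)
```

The adapter path and the merged weight matrix give the same output, which is why a rank-1 LoRA can be described as a single-rank change to the base model parameters.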
Comments: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Mechanistic Interpretability Workshop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2511.06739 [cs.LG]
  (or arXiv:2511.06739v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2511.06739
arXiv-issued DOI via DataCite

Submission history

From: Jake Ward
[v1] Mon, 10 Nov 2025 06:00:25 UTC (4,215 KB)