CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Learning Systems and Robotics Lab, Technical University of Munich
*Equal contribution

Abstract

To teach robots complex manipulation tasks, it is now a common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers for deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for non-exemplar continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods.

Motivation

Deploying robots in dynamic real-world environments, such as homes or hospitals, requires them to continuously acquire new skills without losing previously learned capabilities. Standard fine-tuning of VLAs often leads to catastrophic forgetting, where new training overwrites critical prior knowledge. Moreover, reliance on storing past data for replay is often impractical due to privacy or storage constraints, and robots operating autonomously rarely have access to "oracle" task identifiers to tell them which skill to use. We address these challenges with a framework designed for lifelong learning that relies neither on stored exemplars nor on external task labels.

Method

Figure: Routing and expansion mechanisms.
Figure: DiT architecture.

We introduce CLARE (Continual Learning via Adapter Routing and Expansion), a framework that injects lightweight, trainable adapters into the feedforward layers of a frozen, pre-trained VLA. CLARE utilizes a dynamic expansion strategy that monitors feature statistics; it only adds new parameters when the incoming task data is sufficiently novel, preventing unnecessary capacity increase. During deployment, an autonomous routing mechanism uses autoencoder-based discriminators to analyze input features and dynamically activate the most relevant adapter for the current situation. This allows the robot to seamlessly switch between skills without needing explicit task commands.
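To make the two mechanisms concrete, the following is a minimal sketch of how a residual bottleneck adapter and autoencoder-based routing could fit together. All names (`BottleneckAdapter`, `LinearAE`, `route`) and the choice of a linear (PCA-style) autoencoder as the per-task discriminator are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class BottleneckAdapter:
    """Residual bottleneck adapter added to a frozen feedforward layer.
    The up-projection is zero-initialized, so the adapter starts as an
    identity map and cannot disturb the pre-trained representation."""
    def __init__(self, d_model, d_hidden, rng):
        self.down = rng.standard_normal((d_model, d_hidden)) * 0.02
        self.up = np.zeros((d_hidden, d_model))  # zero init => identity at start

    def __call__(self, h):
        return h + relu(h @ self.down) @ self.up

class LinearAE:
    """Per-task discriminator: a linear autoencoder (here fit in closed
    form via PCA) trained on that task's layer features. At test time,
    a feature is assumed to belong to the task whose autoencoder
    reconstructs it with the lowest error."""
    def __init__(self, feats, k=8):
        self.mu = feats.mean(axis=0)
        _, _, vt = np.linalg.svd(feats - self.mu, full_matrices=False)
        self.V = vt[:k].T  # top-k principal directions

    def error(self, h):
        z = (h - self.mu) @ self.V
        rec = self.mu + z @ self.V.T
        return float(np.mean((h - rec) ** 2))

def route(h, adapters, aes):
    """Activate the adapter whose task autoencoder fits h best."""
    errs = [ae.error(h) for ae in aes]
    best = int(np.argmin(errs))
    return adapters[best](h), best
```

Because routing depends only on reconstruction error over input features, no task identifier is needed at deployment time; the zero-initialized up-projection is one common way to ensure a freshly added adapter leaves the frozen backbone's behavior unchanged until it is trained.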

Results

Figure: Success rate curves of CLARE and five baselines on the LIBERO-Long benchmark.

We evaluated CLARE on the LIBERO benchmark using a sequence of 10 long-horizon manipulation tasks that require both language understanding and precise motor control. Our experiments demonstrate that CLARE achieves high forward transfer (FWT), i.e., strong success rates on newly learned tasks, while keeping negative backward transfer (NBT), the degradation of performance on previously learned tasks, near zero. It significantly outperforms baselines such as Sequential Fine-Tuning and LoRA, and even exceeds the performance of two methods that rely on experience replay, all while increasing the total parameter count by only approximately 2% per task.
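The metrics above can be computed from a lower-triangular success matrix recorded during the task sequence. The sketch below is one illustrative reading of FWT, NBT, and AUC for LIBERO-style continual-learning protocols; the exact formulas used in the paper may differ.

```python
import numpy as np

def continual_metrics(S):
    """S[i, j]: success rate on task j, evaluated right after training on
    task i (defined for j <= i). Illustrative metric definitions:
      FWT: mean success on each task immediately after learning it.
      NBT: mean drop from that initial success at later checkpoints.
      AUC: mean success over all (checkpoint, seen-task) pairs."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    fwt = float(np.mean([S[k, k] for k in range(n)]))
    nbt_terms = [S[j, j] - S[i, j] for i in range(n) for j in range(i)]
    nbt = float(np.mean(nbt_terms)) if nbt_terms else 0.0
    auc = float(np.mean([S[i, j] for i in range(n) for j in range(i + 1)]))
    return fwt, nbt, auc
```

Under these definitions, a method that never forgets has NBT at zero, and AUC summarizes plasticity and stability jointly over the whole sequence.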

Figure: Baseline comparison. CLARE achieves the highest overall performance, as measured by AUC, and demonstrates strong capabilities to acquire new skills without forgetting.
Figure: Ablation study for the dynamic expansion threshold γ. Increasing γ significantly reduces the number of adapters added to the model but slightly reduces the capability to learn new tasks, as shown by the small decrease in AUC and FWT. In contrast, NBT remains at around zero, indicating that the model does not exhibit catastrophic forgetting.
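One way to picture the role of the threshold γ is as a gate on layer-wise expansion: a new adapter is added at a layer only when the new task's features are insufficiently similar to every stored task prototype. The sketch below uses cosine similarity of mean features; the function name, prototype representation, and similarity measure are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

def should_expand(new_feats, stored_prototypes, gamma):
    """Decide whether to add a new adapter at this layer.
    new_feats: (n, d) features of the new task at this layer.
    stored_prototypes: list of (d,) mean-feature vectors of earlier tasks.
    gamma: similarity threshold; higher gamma => more reuse, fewer adapters."""
    mu = new_feats.mean(axis=0)
    mu = mu / (np.linalg.norm(mu) + 1e-8)
    for p in stored_prototypes:
        p = p / (np.linalg.norm(p) + 1e-8)
        if mu @ p >= gamma:  # similar enough: reuse the existing adapter
            return False
    return True  # novel features: expand with a fresh adapter
```

Raising γ makes the similarity test harder to pass in the "reuse" direction only when interpreted as above; with this convention a higher γ means fewer matches count as similar, so γ would instead be applied as a novelty threshold in practice. Either way, the trade-off in the ablation (fewer adapters vs. slightly lower FWT/AUC) follows directly from how often the gate fires.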

Highlights

Key features of CLARE include:
  • Non-Exemplar Learning: Enables VLAs to learn sequentially without storing or replaying past data, respecting privacy and storage constraints.
  • Autonomous Adapter Routing: Automatically selects the correct task-specific module during inference based on feature similarity, eliminating the need for task IDs.
  • Dynamic Expansion: Efficiently adds new parameters only when necessary, maintaining a sub-linear growth in model size.
  • Prevents Catastrophic Forgetting: Preserves pre-trained representations while learning new skills.
  • Superior Performance: Outperforms state-of-the-art continual learning baselines on the challenging LIBERO benchmark.

BibTeX

@article{clare2025,
  title={CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion},
  author={Ralf R{\"o}mer and Yi Zhang and Angela P. Schoellig},
  journal={arXiv preprint arXiv:2601.09512},
  year={2026}
}