From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies
Abstract
Diffusion policies (DPs) achieve state-of-the-art performance on complex manipulation tasks by learning from large-scale demonstration datasets, often spanning multiple embodiments and environments. However, they cannot guarantee safe behavior, so external safety mechanisms are needed. Yet these mechanisms alter actions in ways unseen during training, causing unpredictable behavior and performance degradation. To address these problems, we propose path-consistent safety filtering (PACS) for DPs. Our approach performs path-consistent braking on a trajectory computed from the sequence of generated actions. In this way, we keep execution consistent with the policy's training distribution, maintaining the learned, task-completing behavior. To enable real-time deployment and handle uncertainties, we verify safety using set-based reachability analysis. Our experimental evaluation in simulation and on three challenging real-world human-robot interaction (HRI) tasks shows that PACS (a) provides formal safety guarantees in dynamic environments, (b) preserves task success rates, and (c) outperforms reactive safety approaches, such as control barrier functions (CBFs), by up to 68% in terms of task success.
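To make "path-consistent braking" concrete, here is a minimal, hypothetical sketch (not the paper's implementation): braking only re-times the intended path, so every executed position still lies on the path the policy generated.

```python
import numpy as np

def brake_on_path(path: np.ndarray, speed_scale: float, dt: float = 0.01) -> np.ndarray:
    """Time-scale a trajectory without changing its geometry.

    `path` is an (n, d) array of positions sampled every `dt` seconds.
    With 0 < speed_scale <= 1, the same positions are traversed
    1 / speed_scale times slower; small values approach a stop. Because
    only the timing changes, the executed states stay on the path the
    policy intended, i.e., close to its training distribution.
    """
    assert 0.0 < speed_scale <= 1.0
    t_orig = np.arange(len(path)) * dt
    # Sample the original path at slowed-down times.
    t_query = np.arange(0.0, t_orig[-1] + 1e-12, dt * speed_scale)
    return np.stack(
        [np.interp(t_query, t_orig, path[:, d]) for d in range(path.shape[1])],
        axis=1,
    )
```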
Motivation
Recent advances in imitation learning with diffusion policies, including vision-language-action models (VLAs), have enabled robots to solve increasingly complex, long-horizon tasks. However, deploying DPs in dynamic environments with moving objects requires safeguarding mechanisms, as the intended policy actions may be unsafe. Reactive strategies, such as CBFs, often drive the robot into out-of-distribution (OOD) states, leading to unpredictable behavior. We argue that safety mechanisms for DPs should remain consistent with the robot's intended path to avoid OOD states and preserve high task success rates.
Method
The policy, conditioned on visual observations and proprioceptive inputs, generates action chunks that are transformed into a sequence of desired waypoints. From these waypoints, we compute a kinematically and dynamically feasible intended trajectory. PACS continuously monitors this trajectory and applies high-frequency safety filtering using reachability analysis to enforce task-specific safety constraints (e.g., collision avoidance or impact force limits).
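The following self-contained toy sketch illustrates one filtering cycle. Axis-aligned boxes stand in for the formal set representations used in set-based reachability analysis, and all names (`human_reach_box`, `swept_box`, `pacs_like_filter`) and parameter values are illustrative assumptions, not the paper's API. The key property it demonstrates is that braking selects a prefix of the intended trajectory: the robot stops on its path rather than deviating from it.

```python
import numpy as np

def human_reach_box(p_human, v_human_max, horizon):
    """Over-approximate where the human can be within `horizon` seconds:
    the current position inflated by maximum speed times time."""
    r = v_human_max * horizon
    return p_human - r, p_human + r

def swept_box(traj, margin):
    """Over-approximate the volume swept while tracking `traj`,
    inflated by a tracking/braking margin."""
    return traj.min(axis=0) - margin, traj.max(axis=0) + margin

def boxes_overlap(a, b):
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return bool(np.all(a_hi >= b_lo) and np.all(b_hi >= a_lo))

def pacs_like_filter(intended, p_human, v_human_max=1.6, dt=0.01, margin=0.05):
    """One hypothetical filtering cycle: execute the longest prefix of the
    intended trajectory that is verified collision-free. Braking means
    stopping *on* the intended path, never deviating from it."""
    n = len(intended)
    for frac in (1.0, 0.5, 0.25):
        k = max(2, int(frac * n))
        prefix = intended[:k]          # brake to a stop after k samples
        horizon = k * dt               # time window the check must cover
        if not boxes_overlap(swept_box(prefix, margin),
                             human_reach_box(p_human, v_human_max, horizon)):
            return prefix
    return intended[:1]                # nothing verified safe: stop in place
```

Re-running such a check at a high rate with fresh human measurements lets the robot resume its full intended motion as soon as a longer prefix is verified safe again.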
Results
End-effector paths for the SORTING task. The color intensity of the trajectories indicates velocity, and the intensity of the shaded grey areas visualizes the training distribution. Our safety filter slows the policy down without leaving the desired path when the human is nearby. In contrast, the CBF pushes the robot away from unsafe states, often driving the policy into OOD states from which it cannot recover.
Highlights
PACS
- is a general framework for the safe deployment of action-chunking imitation learning policies, such as DPs and VLAs,
- performs path-consistent safety interventions to avoid OOD situations and maintain high task performance,
- computes an intended trajectory from the action chunk and monitors it using reachability analysis (a minimal sketch follows this list),
- can guarantee safety in real time and achieve high task success even on challenging real-world HRI tasks, whereas reactive safety filtering with CBFs frequently drives the policy into unrecoverable states.
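As referenced in the list above, here is a minimal sketch of turning an action chunk into a time-parameterized trajectory. A simple velocity-limited retiming stands in for the paper's kinematic and dynamic feasibility criteria; `v_max` and the linear interpolation scheme are assumptions made for illustration.

```python
import numpy as np

def waypoints_to_trajectory(waypoints: np.ndarray, v_max: float = 0.5,
                            dt: float = 0.01) -> np.ndarray:
    """Retime an (n, d) array of waypoints into samples spaced `dt` apart
    such that the speed between consecutive samples never exceeds `v_max`.
    A toy stand-in for full kinematic/dynamic feasibility."""
    traj = [waypoints[0]]
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        seg_len = float(np.linalg.norm(b - a))
        # Number of dt-steps needed to traverse this segment at <= v_max.
        steps = max(1, int(np.ceil(seg_len / (v_max * dt))))
        for k in range(1, steps + 1):
            traj.append(a + (b - a) * (k / steps))
    return np.asarray(traj)
```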
BibTeX
@article{pacs2025,
  title={From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies},
  author={Ralf R{\"o}mer and Julian Balletshofer and Jakob Thumm and Marco Pavone and Angela P. Schoellig and Matthias Althoff},
  journal={arXiv preprint arXiv:2511.06385},
  year={2025}
}