From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies

¹ Department of Computer Engineering, Technical University of Munich
² Department of Aeronautics and Astronautics, Stanford University
*Equal Contribution

Abstract

Diffusion policies (DPs) achieve state-of-the-art performance on complex manipulation tasks by learning from large-scale demonstration datasets, often spanning multiple embodiments and environments. However, they cannot guarantee safe behavior on their own, so external safety mechanisms are needed. Yet these mechanisms alter actions in ways unseen during training, causing unpredictable behavior and performance degradation. To address these problems, we propose path-consistent safety filtering (PACS) for DPs. Our approach performs path-consistent braking on a trajectory computed from the sequence of generated actions. In this way, execution stays consistent with the policy's training distribution, preserving the learned, task-completing behavior. To enable real-time deployment and handle uncertainties, we verify safety using set-based reachability analysis. Our experimental evaluation in simulation and on three challenging real-world human-robot interaction (HRI) tasks shows that PACS (a) provides formal safety guarantees in dynamic environments, (b) preserves task success rates, and (c) outperforms reactive safety approaches, such as control barrier functions, by up to 68% in terms of task success.

Motivation

Recent advances in imitation learning with diffusion policies, including vision-language-action models (VLAs), have enabled robots to solve increasingly complex, long-horizon tasks. However, deploying DPs in dynamic environments with moving objects requires safeguarding mechanisms, as the intended policy actions may be unsafe. Reactive strategies, such as control barrier functions, often drive the agent into out-of-distribution (OOD) states not seen during training, leading to unpredictable behavior. We argue that safety mechanisms for DPs should remain consistent with the robot's intended path to avoid OOD states and preserve high task success rates.

Method

The policy, conditioned on visual observations and proprioceptive inputs, generates action chunks that are transformed into a sequence of desired waypoints. From these waypoints, we compute a kinematically and dynamically feasible intended trajectory. PACS continuously monitors this trajectory and applies high-frequency safety filtering using reachability analysis to enforce task-specific safety constraints (e.g., collision avoidance or impact force limits).
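
A minimal Python sketch of this monitor-and-brake loop is given below. It is illustrative only, not our implementation: fit_trajectory, reachable_sets_disjoint, and pacs_step are hypothetical helpers, linear interpolation stands in for feasible trajectory generation, and a simple distance margin replaces formal set-based reachability analysis.

import numpy as np

def fit_trajectory(waypoints, spacing=0.01):
    # Densify the waypoints into an intended path. A real system would
    # compute a kinematically and dynamically feasible trajectory here.
    segments = []
    for p, q in zip(waypoints[:-1], waypoints[1:]):
        n = max(2, int(np.linalg.norm(q - p) / spacing))
        segments.append(np.linspace(p, q, n, endpoint=False))
    segments.append(waypoints[-1][None])
    return np.concatenate(segments)

def reachable_sets_disjoint(robot_pos, obstacle_pos, margin=0.3):
    # Placeholder safety check: over-approximate both reachable sets as
    # balls and require a separation margin. PACS instead verifies this
    # with formal set-based reachability analysis.
    return np.linalg.norm(robot_pos - obstacle_pos) > margin

def pacs_step(path, idx, speed, obstacle_pos, v_max=1.0, dv=0.2):
    # Path-consistent filtering: modulate progress *along* the intended
    # path (brake or speed up); never deviate from it.
    ahead = min(idx + 1, len(path) - 1)
    if reachable_sets_disjoint(path[ahead], obstacle_pos):
        speed = min(v_max, speed + dv)  # verified safe: speed back up
    else:
        speed = max(0.0, speed - dv)    # not verifiably safe: brake
    idx = min(len(path) - 1, idx + int(round(speed / v_max)))
    return idx, speed

# Usage: follow a three-waypoint path while a (hypothetical) human
# obstacle slowly clears the workspace; the robot brakes on the path
# until safety can be verified again, then continues.
path = fit_trajectory(np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 0.0]]))
human = np.array([0.75, 0.2])
idx, speed = 0, 0.0
for _ in range(500):
    idx, speed = pacs_step(path, idx, speed, human)
    human = human + np.array([0.0, 0.002])  # human moving away
    if idx == len(path) - 1:
        break

The key property is that the filter only scales progress along the policy's intended trajectory, so every executed state stays on a path the policy itself generated and, hence, close to the training distribution.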

Examples

The videos below show example rollouts of our real-world tasks for the unshielded baseline policy, a CBF-based shield (shown for the Sorting task), and our proposed PACS method.

Sorting

Baseline

CBF

PACS

Handover

Baseline

PACS

Feeding

Baseline

PACS

Speed Test

Baseline

PACS

Results

Sorting Paths

End-effector paths for the Sorting task. The color intensity of the trajectories indicates velocity, and the intensity of the shaded grey areas visualizes the training distribution. Our safety filter slows the policy down without leaving the desired path when the human is nearby. In contrast, the control barrier function (CBF) pushes the robot away from unsafe states, often into OOD states from which the policy cannot recover.

Robomimic results
Task success rates on Robomimic over 100 rollouts. PACS achieves higher success rates than other safety filters.
Real-world results
Impact of safeguarding in our real-world experiments. PACS guarantees safe deployment and maintains high task success, resulting in a high safe success rate.
Speed comparison
Baseline comparison and speed analysis for the Sorting task. PACS achieves higher task performance than CBFs and can even perform the task faster than the nominal policy.

Highlights

PACS
  • is a general framework for the safe deployment of action-chunking-based imitation learning policies, such as DPs and VLAs,
  • performs path-consistent safety interventions to avoid OOD situations and maintain high task performance,
  • computes an intended trajectory from the action chunk and monitors it using reachability analysis,
  • can guarantee safety in real time and achieve high task success even on challenging real-world HRI tasks, whereas reactive safety filtering with CBFs frequently leads the policy into unrecoverable states.

BibTeX

@article{pacs2025,
  title={From Demonstrations to Safe Deployment: Path-Consistent Safety Filtering for Diffusion Policies},
  author={Ralf R{\"o}mer and Julian Balletshofer and Jakob Thumm and Marco Pavone and Angela P. Schoellig and Matthias Althoff},
  journal={arXiv preprint arXiv:2511.06385},
  year={2025}
}