SPACE

Highlights

Videos: Empty box (train) and Heavy box (eval), each comparing SPACE (Ours) with the Vanilla policy.

SPACE enables adapting to different execution dynamics such as object weights.

Training data: DROID only (DROID dataset: demonstrations across many labs and hardware units). Videos: SPACE (Ours) vs. Vanilla policy.

SPACE improves learning from DROID data, which is collected across multiple robots.
(No additional data was collected in our lab.)

Training data: UR5 only. Videos: SPACE (Ours) vs. Vanilla policy.

SPACE enables zero-shot embodiment transfer from UR5 to Franka.

Motivation

🔒 Discrepancies in actions across different robot setups

Control commands vary across embodiments, hardware units, and dynamics conditions

Replaying actions collected on one robot on another robot results in a different trajectory.

🤔 Why? Actions are defined as the underlying robot control commands

Different robots require different actions to achieve the same motion

Different controller implementations across embodiments lead to different control commands for the same motion. Even with an identical embodiment and controller, the same control command produces different motion across hardware units because of wear and tear and manufacturing variability.

❗ This significantly hurts learning & deployment across different robots.

Our framework: SPACE

🔑 Predict the Cartesian state delta, the actual robot movement during data collection.

Cartesian state delta policy: VLA model outputs end-effector displacement

We train the policy to predict the Cartesian state delta, i.e., the end-effector displacement. The Cartesian state delta is agnostic to robot dynamics, which vary with downstream controllers and hardware characteristics, because it describes only the motion rather than the control input. It can be obtained from any robot that provides its end-effector Cartesian pose.
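As a rough illustration of how such labels could be extracted from logged data, the sketch below differences consecutive end-effector positions. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `cartesian_state_deltas` is hypothetical, only the translational part of the pose is handled, and the page does not specify how orientation or the gripper is parameterized.

```python
import numpy as np


def cartesian_state_deltas(ee_positions):
    """Per-step end-effector displacements from a (T, D) position trajectory.

    Assumption: rows are consecutive end-effector positions logged at a
    fixed rate. Orientation is deliberately left out of this sketch.
    """
    ee_positions = np.asarray(ee_positions, dtype=float)
    if ee_positions.ndim != 2 or ee_positions.shape[0] < 2:
        raise ValueError("expected a (T, D) trajectory with T >= 2")
    # The delta at step t is the motion actually realized between t and t+1,
    # independent of which control command produced it.
    return np.diff(ee_positions, axis=0)


# Hypothetical three-step trajectory (metres).
traj = np.array([
    [0.00, 0.0, 0.000],
    [0.01, 0.0, 0.000],
    [0.02, 0.0, 0.005],
])
deltas = cartesian_state_deltas(traj)
```

Because the labels are computed from observed poses alone, the same function applies to any robot (or hand-held gripper) that logs an end-effector pose.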

🔑 Convert the predicted Cartesian state delta to robot-specific control commands

SPACE: VLA model with Action Adapter converting Cartesian state delta into robot-specific control commands

However, executing the predicted displacement on a target robot is non-trivial: naively commanding it leads to different motion depending on the robot's controller and dynamics. To address this, the Action Adapter learns a per-robot mapping from a Cartesian state delta \( \Delta p \) to the control command \( u \) that realizes it. The mapping is a linear model fitted on a calibration dataset \( \mathcal{D}_{\mathrm{cal}} \) (only 10 trajectories):

\( \min_{W_0, b_0} \sum_{(\Delta p, u) \in \mathcal{D}_{\mathrm{cal}}} \left\| W_0 \Delta p + b_0 - u \right\|_2^2. \)
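The objective above is an ordinary least-squares problem, so one way to solve it is with `numpy.linalg.lstsq` on inputs augmented by a constant column. This is a minimal sketch: the function name `fit_action_adapter` and the synthetic calibration data are assumptions for illustration, and the page does not state which solver the authors use.

```python
import numpy as np


def fit_action_adapter(delta_p, u):
    """Least-squares fit of u ~= W0 @ delta_p + b0.

    delta_p: (N, d_p) Cartesian state deltas from the calibration set.
    u:       (N, d_u) control commands that produced them.
    Returns W0 with shape (d_u, d_p) and b0 with shape (d_u,).
    """
    delta_p = np.asarray(delta_p, dtype=float)
    u = np.asarray(u, dtype=float)
    # Append a column of ones so the bias is fitted jointly with W0.
    X = np.hstack([delta_p, np.ones((delta_p.shape[0], 1))])
    theta, *_ = np.linalg.lstsq(X, u, rcond=None)  # theta: (d_p + 1, d_u)
    return theta[:-1].T, theta[-1]


# Synthetic, noise-free calibration pairs (hypothetical values).
rng = np.random.default_rng(0)
W_true = np.array([[2.0, 0.0, 0.0],
                   [0.0, 1.5, 0.0],
                   [0.1, 0.0, 3.0]])
b_true = np.array([0.0, 0.01, -0.02])
dp = rng.normal(scale=0.01, size=(200, 3))
u_cal = dp @ W_true.T + b_true

W0, b0 = fit_action_adapter(dp, u_cal)
```

On noise-free synthetic data the fit recovers `W_true` and `b_true` up to numerical precision; with real calibration trajectories it would return the best linear approximation in the squared-error sense.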

The adapter is then continuously updated online from policy rollouts using the least mean squares (LMS) algorithm with step size \( \mu \), so that it adapts to varying dynamics during deployment:

\[ \begin{aligned} e_t &= W_t \Delta p_t + b_t - u_t \quad \text{(Action Adapter error)} \\[4pt] W_{t+1} &= W_t - \mu e_t (\Delta p_t)^\top, \qquad b_{t+1} = b_t - \mu e_t \quad \text{(Online update)} \end{aligned} \]
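The update above translates almost line by line into code. The sketch below implements a single LMS step. The function name `lms_update` and the numbers in the example are hypothetical, and how \( \mu \) is chosen in practice is not stated on the page.

```python
import numpy as np


def lms_update(W, b, delta_p, u, mu):
    """One LMS step for the linear adapter u ~= W @ delta_p + b.

    Returns the updated (W, b) and the pre-update error e_t.
    """
    e = W @ delta_p + b - u                 # Action Adapter error e_t
    W_new = W - mu * np.outer(e, delta_p)   # W_{t+1} = W_t - mu * e_t * delta_p_t^T
    b_new = b - mu * e                      # b_{t+1} = b_t - mu * e_t
    return W_new, b_new, e


# Hypothetical 2-D example with unit-scale values.
W = np.eye(2)
b = np.zeros(2)
delta_p = np.array([1.0, 2.0])
u = np.array([2.0, 2.0])

W1, b1, e0 = lms_update(W, b, delta_p, u, mu=0.1)
e1 = W1 @ delta_p + b1 - u  # error on the same sample after the update
```

For a single sample, each step scales the error by \( 1 - \mu (\|\Delta p_t\|^2 + 1) \), so \( \mu \) has to be small relative to the magnitude of the inputs for the update to stay stable.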

🚀 See the Action Adapter on the job (adapting control commands to lift an unseen heavy object)!

Experiments

We compare SPACE against a policy that predicts control commands directly, referred to as Control command, which is common practice in policy learning. We use the \( \pi_{0.5} \) model.

Q1. Does SPACE improve cross-embodiment learning?

Cross-embodiment experiment setup: training on UR5+FR3, UR5 only, and Human+FR3, executed on FR3

To test the effectiveness of SPACE, we co-train on Franka Research 3 (FR3) data together with either UR5 data or UMI hand-held gripper data.

Co-training results: SPACE outperforms control-command policy across tasks

SPACE boosts co-training performance both across UR5 and FR3 robots and across the UMI hand-held gripper (human data) and the FR3 robot, whereas the conventional policy that predicts control commands performs comparatively poorly. This is because co-training on control commands suffers from dynamics discrepancies between embodiments. SPACE even enables zero-shot execution on FR3 of a policy learned solely from UR5 data.

Q2. Does SPACE improve cross-hardware learning?

Even though the embodiment and controller implementation are the same, different hardware units exhibit discrepancies in dynamics due to wear and manufacturing variability.

Different hardware units of the same robot exhibit discrepancies in dynamics

Execution on hardware different from training

For example, when a policy is deployed on a robot that was not used during data collection, the control command policy degrades due to subtle dynamics differences, while SPACE stays robust.

DROID success rate

SPACE achieves the best performance when learning from DROID data collected from multiple hardware units of the FR3 robot. This is because control commands are specific to the dynamics of the robot used for data collection, which are inconsistent across hardware units. To reduce compute, we use the subset of DROID that involves the object "marker".

Q3. Does SPACE remain robust under a dynamics shift from training time?

SPACE can also adapt to environment dynamics that differ from training. We collect training data with an empty box and place heavy metal objects in it at inference time. In this setup, the control command policy is unable to lift the heavy box and its success rate drops to 0%. Meanwhile, SPACE adapts the control command through the Action Adapter, lifts the heavy box, and achieves a 92% success rate.

Videos: Empty box (train) → Heavy box (eval), comparing SPACE (Ours) with Control command.

Object weight change during deployment

SPACE can execute the policy at a control frequency different from training. For example, we speed up policy execution by raising the control frequency to 30 Hz from the 15 Hz used during data collection. The policy that predicts control commands, in contrast, degrades at 30 Hz because of the discrepancy from training time.

SPACE executes reliably at higher control Hz while control-command policy degrades

Control Hz change during deployment (success rate & execution time)

Action Adapter gain adapts to the control Hz change

Gain change during deployment (success rate)

SPACE can also execute under controller gains that differ from training time. Multiplying the proportional gains (Kp) by 0.5x or 1.5x significantly lowers the success rate of the control command policy, while SPACE remains robust.
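To build intuition for why an online-updated adapter can absorb such a gain change, consider a toy scalar simulation. Everything in it is an assumption made for illustration rather than the experimental setup: the plant is modelled as `realized = plant_gain * u`, the adapter starts at the value calibrated for a gain of 1, the LMS update uses the realized displacement paired with the executed command, and the function name `run_toy_rollout` and the step size are hypothetical.

```python
import numpy as np


def run_toy_rollout(plant_gain, steps=2000, mu=0.3, seed=0):
    """Toy scalar rollout: the adapter tracks the inverse of a changed plant gain."""
    rng = np.random.default_rng(seed)
    W, b = 1.0, 0.0  # adapter calibrated for plant_gain == 1
    for _ in range(steps):
        desired = rng.uniform(-1.0, 1.0)   # stand-in for a predicted state delta
        u = W * desired + b                # adapter converts delta to command
        realized = plant_gain * u          # toy plant: motion actually produced
        # LMS update on the observed (realized delta, executed command) pair.
        e = W * realized + b - u
        W -= mu * e * realized
        b -= mu * e
    return W, b


# Halving the gain means twice the command is needed for the same motion.
W_half, b_half = run_toy_rollout(plant_gain=0.5)
```

In this toy, the adapter weight drifts from 1 toward `1 / plant_gain`, so the commanded value is rescaled until the realized displacement again matches the desired one. A policy that outputs fixed control commands has no such correction path.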

💡 Takeaway: By having the policy predict dynamics-agnostic Cartesian state deltas and adaptively converting them to robot-specific control commands, SPACE improves learning from different embodiments and hardware, and remains robust under varying dynamics at deployment.

BibTeX

If you find our work useful, please cite the paper!


@article{lee2026space,
  title={SPACE: Enabling Learning from Cross-Robot Data Toward Generalist Policies},
  author={Lee, Haeone and Jeon, Byeongguk and Jeong, Suchae and Kim, Jian and Lee, Kimin},
  journal={arXiv preprint arXiv:2606.24049},
  year={2026},
}

SPACE: Enabling Learning from Cross-Robot Data Toward Generalist Policies

We propose SPACE (State Prediction and Adaptive Command Execution), a framework that adopts Cartesian state delta as a universal robot action and continuously adapts to different downstream dynamics during execution.
