"A person takes a few steps forward, jumps forward with both feet , and immediately turns right upon landing"

"A person is throwing something."
"A person walks forcefully forward 4 steps."
"A person is walking fast."
"The person is walking slowly."
"A man moves erratically, like a marionette struggling to free itself from its strings."
We showcase Motion-R1's capability to generate diverse and high-quality motions for out-of-distribution prompts.
"After hearing a loud noise, a person turned around , stepped back cautiously with hands raised defensively and then slowly approached."
"A person takes a few steps forward, jumps forward with both feet , and immediately turns right upon landing"
"A person raises arms, arches back slightly, then shifts weight onto the right leg while extending the left leg backward in a poised arabesque position."
"A person jumped up happily, raised hand and spsun excitedly."
"A person is serving in badminton ."
"A person is skipping rope."
"A person is dancing ballroom dance."
"The person walks as if balancing on a tightrope."
"The person mimics swimming in mid-air, as if performing a freestyle stroke without water."
"The person walks through strong wind, leans forward and braces against resistance."
(a) Traditional end-to-end models exhibit poor generalization on out-of-distribution motions. (b) Our Decomposed CoT Data Engine enables strong generalization by structuring high-level instructions into intermediate reasoning steps. (c) Existing RL-based methods rely on expensive human annotations to train preference models for reward signals. (d) Our RL Binding mechanism achieves efficient multi-modal alignment without additional annotation cost.
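To illustrate the decomposition idea in (b), a high-level caption can be represented as an ordered list of atomic sub-motions before synthesis. The sketch below is a minimal, rule-based stand-in for the LLM-driven Decomposed CoT Data Engine: the `MotionPlan` class and `decompose` function are hypothetical names, and real decomposition in the paper is produced by a language model rather than string splitting.

```python
from dataclasses import dataclass, field


@dataclass
class MotionPlan:
    """Chain-of-thought decomposition of one high-level motion caption."""
    caption: str
    steps: list[str] = field(default_factory=list)


def decompose(caption: str) -> MotionPlan:
    # Hypothetical stand-in for the LLM-based data engine: split a
    # multi-clause instruction into ordered atomic sub-motions, which a
    # downstream motion generator would then synthesize step by step.
    clauses = [c.strip(" .") for c in caption.replace(", and", ",").split(",")]
    return MotionPlan(caption=caption, steps=[c for c in clauses if c])


plan = decompose(
    "A person takes a few steps forward, jumps forward with both feet, "
    "and immediately turns right upon landing."
)
# plan.steps is an ordered list of three atomic sub-motions.
```

Structuring the caption this way lets each intermediate step be grounded and checked individually, which is the mechanism the caption credits for stronger out-of-distribution generalization.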

We compare Motion-R1 against baselines such as MoMask and MotionLLM. As shown in the left of the figure, Motion-R1 produces smooth, well-structured sequences for simple and multi-step instructions. To evaluate generalization beyond the training distribution, we present qualitative comparisons on two types of out-of-distribution captions, shown in the middle and right of the figure.

@article{ouyang2025motion,
title={Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation},
author={Ouyang, Runqi and Li, Haoyun and Zhang, Zhenyuan and Wang, Xiaofeng and Zhu, Zheng and Huang, Guan and Wang, Xingang},
journal={arXiv preprint arXiv:2506.10353},
year={2025}
}