A Motion Synthesis Framework Without Motion Capture, Integrating Language and Visual Priors
DOI: https://doi.org/10.71465/fair289

Keywords: motion synthesis, multimodal fusion, large language model, visual prompt, skeleton modeling, unsupervised generation, motion prior

Abstract
To address the heavy dependence of human motion synthesis on motion capture data, this paper introduces a motion-capture-free motion synthesis framework that integrates language and visual priors. The method builds a multimodal representation module that maps natural-language and image prompts into a shared latent space, and a skeleton structure constraint network that improves the physical plausibility and continuity of the generated motion. Experiments are carried out on HumanML3D, UMLS-Motion, and a self-constructed multimodal instruction set. The results show that the proposed method improves motion quality (Fréchet Gesture Distance) and coherence by 9.4% and 7.8%, respectively, compared to existing methods. Transfer tests show that the method performs well under both cross-domain and zero-shot prompts. These results confirm that combining visual and language-based multimodal prompts effectively enhances the diversity and controllability of motion generation.
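The abstract does not specify the architecture, so the following is only a minimal sketch of the two ideas it names: projecting language and image prompts into one shared latent space, and constraining the generated skeleton. It assumes a PyTorch implementation with precomputed prompt embeddings, an InfoNCE-style alignment objective, and a bone-length consistency penalty as the skeleton constraint; all module names, dimensions, and loss forms are illustrative, not the authors' implementation.

```python
# Hedged sketch: shared latent space for language/image prompts + a simple
# skeleton constraint. Encoder choices, dimensions, and loss forms are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalPromptEncoder(nn.Module):
    """Maps precomputed language and image prompt embeddings into one shared latent space."""

    def __init__(self, text_dim=768, image_dim=512, latent_dim=256):
        super().__init__()
        self.text_proj = nn.Sequential(
            nn.Linear(text_dim, latent_dim), nn.GELU(), nn.Linear(latent_dim, latent_dim)
        )
        self.image_proj = nn.Sequential(
            nn.Linear(image_dim, latent_dim), nn.GELU(), nn.Linear(latent_dim, latent_dim)
        )

    def forward(self, text_emb, image_emb):
        # L2-normalise so both modalities live on the same hypersphere.
        z_text = F.normalize(self.text_proj(text_emb), dim=-1)
        z_image = F.normalize(self.image_proj(image_emb), dim=-1)
        return z_text, z_image


def alignment_loss(z_text, z_image, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling paired text/image prompts together."""
    logits = z_text @ z_image.t() / temperature
    targets = torch.arange(z_text.size(0), device=z_text.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def bone_length_penalty(joints, parents, ref_lengths):
    """One possible skeleton-structure constraint: keep bone lengths constant over a clip.

    joints: (B, T, J, 3) generated joint positions; parents: parent index per joint
    (-1 for the root); ref_lengths: (J,) reference bone lengths (root entry ignored).
    """
    penalty = joints.new_zeros(())
    for j, p in enumerate(parents):
        if p < 0:  # skip the root joint
            continue
        lengths = (joints[:, :, j] - joints[:, :, p]).norm(dim=-1)  # (B, T)
        penalty = penalty + ((lengths - ref_lengths[j]) ** 2).mean()
    return penalty


if __name__ == "__main__":
    # Toy usage with random embeddings and a 3-joint chain.
    enc = MultimodalPromptEncoder()
    z_t, z_i = enc(torch.randn(4, 768), torch.randn(4, 512))
    print("alignment loss:", alignment_loss(z_t, z_i).item())
    joints = torch.randn(4, 16, 3, 3)
    print("bone penalty:", bone_length_penalty(joints, [-1, 0, 1], torch.ones(3)).item())
```

In a full system the projection heads would sit on top of pretrained language and vision encoders, and the bone-length term would be added to the generator's training loss alongside the alignment and reconstruction objectives.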
License
Copyright (c) 2025 Emily Carter, Jason M. Bennett, Sofia Almeida, Thomas R. Walker (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.