Efficient Human Motion Reconstruction from Monocular Videos with Physical Consistency Loss

¹University of Hamburg, ²Peking University, ³Zhihu

Our method can reconstruct complex human motions from monocular videos in minutes.


Vision-only motion reconstruction from monocular videos often produces artifacts such as foot sliding and jittering. Existing physics-based methods typically either simplify the problem to foot-ground contacts alone, or reconstruct full-body contacts within a physics simulator, which requires solving a time-consuming bilevel optimization problem. To overcome these limitations, we present an efficient gradient-based method for reconstructing complex human motions (including highly dynamic and acrobatic movements) under physical constraints. Our approach reformulates human motion dynamics through a differentiable physical consistency loss within an augmented search space that accounts for both contacts and camera alignment. This transforms the motion reconstruction task into a single-level trajectory optimization problem. Experimental results demonstrate that our method can reconstruct complex human motions from real-world videos in minutes, substantially faster than previous approaches. The reconstructed motions also show enhanced physical realism compared to existing methods.
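To give a feel for what a physical consistency loss can look like, here is a minimal toy sketch. It is not the formulation used in the paper: it assumes a single 1-D point mass instead of a full-body model, treats the per-frame ground force as a free variable alongside the trajectory (a stand-in for the augmented search space), and leaves all penalty terms unweighted. The function name and parameters are hypothetical.

```python
# Toy sketch only: a 1-D point mass, not the paper's full-body formulation.
GRAVITY = 9.81  # m/s^2


def physical_consistency_loss(heights, contact_forces, mass, dt):
    """Sum of squared Newton residuals plus simple contact penalties.

    heights        -- vertical position of the mass per frame (m)
    contact_forces -- upward ground force per frame (N), a free variable
    mass           -- mass of the point (kg)
    dt             -- time between frames (s)
    """
    loss = 0.0
    for t in range(1, len(heights) - 1):
        # Central finite difference for the acceleration at frame t.
        accel = (heights[t + 1] - 2.0 * heights[t] + heights[t - 1]) / dt ** 2
        # Newton residual: m * a should equal contact force minus weight.
        residual = mass * accel - (contact_forces[t] - mass * GRAVITY)
        loss += residual ** 2
        # The ground can only push, never pull.
        loss += min(contact_forces[t], 0.0) ** 2
        # A contact force may act only when the mass touches the ground.
        loss += (contact_forces[t] * max(heights[t], 0.0)) ** 2
        # The mass must not penetrate the ground.
        loss += min(heights[t], 0.0) ** 2
    return loss


dt = 0.01
mass = 70.0
frames = 20

# Free fall from 1 m with no contact force: consistent with Newton's law.
falling = [1.0 - 0.5 * GRAVITY * (i * dt) ** 2 for i in range(frames)]
no_force = [0.0] * frames
consistent = physical_consistency_loss(falling, no_force, mass, dt)

# Hovering at 1 m with nothing holding the mass up: physically impossible.
hovering = [1.0] * frames
violating = physical_consistency_loss(hovering, no_force, mass, dt)

# Standing on the ground, supported by a force equal to the weight.
standing = physical_consistency_loss(
    [0.0] * frames, [mass * GRAVITY] * frames, mass, dt
)
```

Because every term is a smooth or piecewise-smooth function of the trajectory and the forces, a loss of this kind can be minimized jointly over both with a gradient-based optimizer, without an inner simulation loop. In this toy setting the free-fall and standing trajectories score close to zero, while the hovering trajectory is heavily penalized.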


Optimization Process

We demonstrate the reconstruction process for several complex motions. Most of the acrobatic movements are successfully reconstructed in under two minutes.


Visualizing successive iterations of the optimization process shows a clear improvement in the physical realism of the motion.


Animation rendering in Unreal Engine

The optimization results can be used directly to drive animations in Unreal Engine.

Multi-camera setup

We also support multi-camera setups. The keypoints are collected with FreeMocap, an open-source motion capture tool.


  author    = {Cong, Lin and Ruppel, Philipp and Wang, Yizhou and Pan, Xiang and Hendrich, Norman and Zhang, Jianwei},
  title     = {Efficient Human Motion Reconstruction from Monocular Videos with Physical Consistency Loss},
  journal   = {SIGGRAPH Asia},
  year      = {2023},