HGG: Learning Efficient and Generalizable Human Representation with Human Gaussian Model

Yifan Liu¹,*, Shengjun Zhang¹,*, Chensheng Dai¹, Yang Chen², Hao Liu³, Chen Li³, Yueqi Duan¹,✝

¹Tsinghua University, ²Nanyang Technological University, ³Tencent
*Indicates equal contribution

ICCV 2025

Human Gaussian Graph can generate high-quality, generalizable, and animatable human Gaussian representations from monocular videos.

Abstract

Modeling animatable human avatars from videos is a long-standing and challenging problem. While conventional methods require per-instance optimization, recent feed-forward methods have been proposed to generate 3D Gaussians with a learnable network. However, these methods predict Gaussians for each frame independently, without fully capturing the relations of Gaussians from different timestamps. To address this, we propose Human Gaussian Graph to model the connection between predicted Gaussians and human SMPL mesh, so that we can leverage information from all frames to recover an animatable human representation. Specifically, the Human Gaussian Graph contains dual layers where Gaussians are the first layer nodes and mesh vertices serve as the second layer nodes. Based on this structure, we further propose the intra-node operation to aggregate various Gaussians connected to one mesh vertex, and inter-node operation to support message passing among mesh node neighbors. Experimental results on novel view synthesis and novel pose animation demonstrate the efficiency and generalization of our method.

SOTA Rendering Quality with Remarkably Fast Run-time Performance

(a) Qualitative results: HGG delivers high-fidelity results for both novel view synthesis and novel pose animation. (b) Performance comparison: HGG achieves the highest PSNR in both the single-view (yellow) and multi-view (blue) settings, with superior computational efficiency.

Experimental Results

Qualitative comparison of our method against GART, ExAvatar, LGM, and GPS-Gaussian on the MVHumanNet dataset.

More qualitative results on novel pose animation.

More qualitative results on novel view synthesis.

Method Overview

Given an input human video, our goal is to build high-fidelity animatable Gaussian representations at inference time, without per-instance optimization. We first establish frame-wise Gaussian representations through a feed-forward 3DGS network. Then, we construct a Human Gaussian Graph (HGG) to model the relations between the Gaussians predicted from multiple frames and the SMPL mesh. We introduce two complementary operations on the HGG: the intra-node operation, which extracts temporal features across multiple timesteps, and the inter-node operation, which facilitates robust local message passing between topologically adjacent nodes. Finally, the Gaussians are updated into SMPL-aligned Gaussians through the HGG framework, enabling novel pose animation.
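To make the dual-layer structure concrete, the following is a minimal NumPy sketch of the two operations on a toy mesh. It is an illustration under stated assumptions, not the paper's implementation: the nearest-vertex rule for connecting Gaussians to mesh vertices, mean pooling as the intra-node aggregation, and neighbor averaging as the inter-node message passing are all simplifications chosen here, standing in for the learned operations. The function names (`assign_to_vertices`, `intra_node`, `mesh_adjacency`, `inter_node`) and the four-vertex mesh are hypothetical.

```python
import numpy as np

def assign_to_vertices(gaussian_xyz, vertex_xyz):
    """Connect each Gaussian to its nearest mesh vertex (assumed rule)."""
    # Pairwise squared distances, shape (num_gaussians, num_vertices).
    diff = gaussian_xyz[:, None, :] - vertex_xyz[None, :, :]
    return (diff ** 2).sum(axis=-1).argmin(axis=1)

def intra_node(gaussian_feat, owner, num_vertices):
    """Mean-pool the features of all Gaussians connected to each vertex."""
    summed = np.zeros((num_vertices, gaussian_feat.shape[1]))
    np.add.at(summed, owner, gaussian_feat)  # unbuffered scatter-add
    counts = np.bincount(owner, minlength=num_vertices)
    # Vertices with no connected Gaussian keep a zero feature.
    return summed / np.maximum(counts, 1)[:, None]

def mesh_adjacency(faces, num_vertices):
    """Symmetric vertex adjacency matrix built from triangle faces."""
    adj = np.zeros((num_vertices, num_vertices))
    for a, b in ((0, 1), (1, 2), (2, 0)):
        adj[faces[:, a], faces[:, b]] = 1.0
        adj[faces[:, b], faces[:, a]] = 1.0
    return adj

def inter_node(vertex_feat, adj):
    """One message-passing round: average each vertex with its neighbors."""
    a_hat = adj + np.eye(adj.shape[0])  # include the vertex itself
    return (a_hat @ vertex_feat) / a_hat.sum(axis=1, keepdims=True)

# Toy mesh: a unit square split into two triangles (hypothetical data).
rest_vertices = np.array([[0., 0., 0.], [1., 0., 0.], [1., 1., 0.], [0., 1., 0.]])
faces = np.array([[0, 1, 2], [0, 2, 3]])

# Two frames, each with its own posed mesh and per-frame predicted Gaussians.
frames = [
    dict(vertices=rest_vertices,
         xyz=np.array([[0.1, 0.0, 0.0], [0.9, 0.1, 0.0]]),
         feat=np.array([[1.0], [3.0]])),
    dict(vertices=rest_vertices + np.array([0.0, 0.0, 1.0]),
         xyz=np.array([[0.0, 0.1, 1.0], [1.0, 0.9, 1.0]]),
         feat=np.array([[3.0], [5.0]])),
]

# First layer -> second layer: link Gaussians from every frame to vertices.
owner = np.concatenate([assign_to_vertices(f["xyz"], f["vertices"]) for f in frames])
feats = np.concatenate([f["feat"] for f in frames])

vertex_feat = intra_node(feats, owner, num_vertices=len(rest_vertices))
smoothed = inter_node(vertex_feat, mesh_adjacency(faces, len(rest_vertices)))
print(vertex_feat.ravel(), smoothed.ravel())
```

Because the Gaussians are assigned against each frame's posed mesh but pooled by vertex index, information from different timesteps lands on the same node regardless of pose. Vertex 3 has no connected Gaussian in this toy example and only receives a feature through the inter-node step, which illustrates why message passing among mesh neighbors is useful for sparsely observed regions.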
