Face Tracking Using Diffusion Model Generated Data
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Computer Vision
Department
Computer Vision
First Advisor
Dr. Hao Li
Second Advisor
Dr. Martin Takac
Abstract
Face tracking is crucial for creating photorealistic digital humans in high-end productions and virtual reality applications. Face tracking setups generally fall into two categories: multi-view and monocular. This thesis focuses on monocular face tracking, which uses only RGB images as input. Our approach tackles two limitations of current face tracking technologies: the labor-intensive process of constructing training datasets and the inaccuracies associated with manual data labeling. By fine-tuning ControlNet on a pretrained Stable Diffusion model, we propose a method to generate synthetic face datasets that significantly boosts the efficiency and accuracy of face tracking systems. We then use ResNet50 to predict probabilistic facial landmarks, trained with a Gaussian negative log-likelihood loss to account for uncertainty. The predicted landmarks, along with their associated uncertainties, serve as inputs for optimizing the 3D geometry of the head through a FLAME model fitting process. Our diffusion-model-generated dataset reduces the reliance on manual data collection and potentially decreases the biases associated with traditional methods. The landmark prediction and 3D fitting results demonstrate that diffusion-model-based datasets can achieve state-of-the-art performance in 3D face tracking by closely mimicking real-world conditions. Moreover, we show that proper data augmentation further improves the accuracy of the 3D geometry and the handling of extreme poses. Including uncertainty in landmark predictions also yields improved accuracy and a better representation of diverse expressions, regardless of the dataset used.
Our method not only enhances the realism and accuracy of 3D face reconstruction but also proposes a scalable solution to the data constraints during training for landmark prediction, making face tracking more accessible and cost-effective.
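The Gaussian negative log-likelihood loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes each landmark is predicted as a 2D mean with a single isotropic log-variance; the function name, the array shapes, and this parameterization are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def gaussian_nll(pred_mean, pred_log_var, target):
    """Mean negative log-likelihood of landmarks under isotropic 2D Gaussians.

    pred_mean:    (N, 2) predicted landmark positions (hypothetical layout)
    pred_log_var: (N,)   predicted log(sigma^2) per landmark
    target:       (N, 2) ground-truth landmark positions
    """
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_log_var = np.asarray(pred_log_var, dtype=float)
    target = np.asarray(target, dtype=float)

    # Squared Euclidean error per landmark, shape (N,)
    sq_err = np.sum((target - pred_mean) ** 2, axis=-1)

    # For a 2D isotropic Gaussian:
    #   -log p = log(2*pi) + log(sigma^2) + ||error||^2 / (2 * sigma^2)
    nll = np.log(2.0 * np.pi) + pred_log_var + 0.5 * sq_err * np.exp(-pred_log_var)
    return float(np.mean(nll))
```

Under this assumed form, a landmark with a large error incurs a smaller loss when its predicted variance is also large, so the network is encouraged to report high uncertainty where it is likely to be wrong. Such uncertainties could then be used to down-weight unreliable landmarks during model fitting.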
Recommended Citation
Y. Zhang, "Face Tracking Using Diffusion Model Generated Data," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc degree in Computer Vision
Advisors: Dr. Hao Li, Dr. Martin Takac
Online access available for MBZUAI patrons