1 Introduction
3D face models have been widely applied in various fields (e.g. biometrics, movies, video games). In particular, reconstructing a 3D face model with only a single camera, without attaching landmarks or projecting laser dots or structured light patterns onto the face, is one of the most popular and challenging tasks in computer vision and computer graphics. For example, Maejima proposed a generic-model approach that can quickly reconstruct a 3D face model from a single 2D photograph using a deformable face model [Maejima et al. 2008]. However, since it assumes a frontal face image as input, this method cannot accurately express the geometry of individual facial parts, such as the height of the nose and the contour of the cheeks.
In this paper, we propose a 3D face reconstruction method with a hybrid approach that combines a Structure-from-Motion (SfM) approach based on the “Factorization Method”, which estimates accurate 3D point depth information, with a generic-model approach based on a “Deformable Face Model”, which keeps an appropriate local face shape. Unlike other methods, ours requires no manual operations from image capturing to 3D face model output. Moreover, our method executes quickly by using the feature tracking technique proposed by [Irie et al. 2011] and the 3D facial geometry estimation technique proposed by [Maejima et al. 2008]. Using our method, the user can quickly create a 3D face model simply by shaking his or her face freely in front of a single video camera, as in Figure 1(a).
2 3D Reconstruction
Input Image Sequence: First, we capture an image sequence in which the user rotates his or her face freely and gradually in front of a camera. Our method does not constrain how the user rotates the face, so we can obtain information about the user's face observed from multiple viewing directions. However, the user needs to keep an expressionless face throughout the video sequence, which provides a rigid surface for reconstruction. Figure 1(a) shows part of the input image sequence.
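A minimal capture sketch is shown below, assuming OpenCV; the camera index, frame count, and the 600x600-pixel, 30 fps settings (taken from Section 4) are illustrative rather than the authors' exact configuration.

import cv2

# Open the default camera and request the resolution/frame rate used in Section 4.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 600)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 600)
cap.set(cv2.CAP_PROP_FPS, 30)

# Grab a few seconds of the user freely rotating an expressionless face.
frames = []
while len(frames) < 150:  # roughly 5 seconds at 30 fps
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()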
Feature Detection: Next, 30 facial feature points are automatically detected in all frames of the image sequence using the technique of [Irie et al. 2011]. Figure 1(b) shows examples of the feature detection results.
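The detector of [Irie et al. 2011] is not publicly available; as a stand-in, the sketch below uses dlib's 68-landmark predictor to illustrate the same per-frame detection step (the original method's 30-point layout differs, and the model file name is the standard dlib release).

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(gray_frame):
    # Detect the face region, then localize the landmarks inside it.
    faces = detector(gray_frame, 1)
    if not faces:
        return None
    shape = predictor(gray_frame, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float64)

# Applying detect_landmarks to every frame yields the feature tracks
# consumed by the factorization step below.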
Factorization Method: To reconstruct 3D points, we adopt the “Factorization Method”, which simultaneously estimates camera motion and the object's 3D shape and has the advantage that the solution is relatively robust [Tomasi and Kanade 1992]. Figure 1(c) shows the sparse 3D points estimated by the factorization method.
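A minimal sketch of the rank-3 factorization follows, assuming an orthographic camera and feature tracks with no missing points; the metric upgrade that resolves the remaining affine ambiguity (enforcing orthonormal camera rows) is omitted for brevity.

import numpy as np

def factorize(tracks):
    """tracks: (F, P, 2) array -- P feature points tracked over F frames.
    Returns sparse 3D points of shape (P, 3), up to an affine ambiguity."""
    F, P, _ = tracks.shape
    # Stack the 2F x P measurement matrix: x-coordinates on top, y-coordinates below.
    W = np.vstack([tracks[:, :, 0], tracks[:, :, 1]])
    # Register each frame's measurements to the centroid of its points.
    W = W - W.mean(axis=1, keepdims=True)
    # Under orthography the registered matrix has rank 3: W ~ M @ S,
    # with M the camera motion (2F x 3) and S the shape (3 x P).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])
    S = np.sqrt(s[:3])[:, None] * Vt[:3]
    return S.T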
3 Model-based Fitting
Deformable Face Model: We construct the “Deformable Face Model (DFM)” by calculating Principal Components (PCs) over the vertices of the face models in our database, which includes 1153 male/female, young/elderly 3D face models. The DFM can express a wide range of 3D facial geometries by controlling the magnitude of each Principal Component.
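A minimal PCA sketch of such a deformable model is given below, under the assumption that the database meshes are in dense vertex correspondence; the `meshes` array and the number of retained components are placeholders, since the 1153-model database itself is not detailed here.

import numpy as np

def build_dfm(meshes, k=40):
    """meshes: (N, 3V) array, each row a flattened face mesh in correspondence.
    Returns the mean shape, the first k principal components, and their scales."""
    mean = meshes.mean(axis=0)
    X = meshes - mean
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    pcs = Vt[:k]                               # (k, 3V) principal components
    sigmas = s[:k] / np.sqrt(len(meshes) - 1)  # per-component standard deviations
    return mean, pcs, sigmas

def deform(mean, pcs, coeffs):
    # A face geometry is the mean shape plus a weighted sum of the PCs.
    return mean + coeffs @ pcs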
Optimal Deformation: To estimate a dense 3D facial shape from the sparse 3D points, we apply the 3D facial geometry estimation technique proposed by [Maejima et al. 2008]. The 3D shape is estimated by minimizing an energy function designed as a combination of a fitness term and a face likelihood term. The fitness term measures the sum of squared norms between each sparse 3D point and its corresponding vertex of the DFM. The likelihood term computes the likelihood of the current PCs and, using a Gaussian Mixture Model learnt from the database, restricts the deformation from producing an unnatural face. Finally, by deforming the DFM with the estimated optimal PCs and mapping the texture of the frontal face image onto the estimated 3D shape, we acquire the complete 3D face model.
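The exact energy of [Maejima et al. 2008] is not reproduced here; the sketch below shows one plausible form, combining the fitness term with a GMM face-likelihood prior over the PC coefficients. The weighting `lam`, the scikit-learn GMM, and the assumption that the sparse points have already been aligned to the model's coordinate frame are ours, not the authors'.

import numpy as np
from scipy.optimize import minimize
from sklearn.mixture import GaussianMixture  # prior fitted to the database's PC coefficients

def fit_coeffs(sparse_pts, mean, pcs, corr_idx, gmm, lam=1.0):
    """sparse_pts: (P, 3) points from factorization, aligned to the model frame.
    corr_idx: indices of the DFM vertices corresponding to each sparse point.
    gmm: GaussianMixture trained on the database's PC coefficient vectors."""
    def energy(c):
        verts = (mean + c @ pcs).reshape(-1, 3)
        fitness = np.sum((verts[corr_idx] - sparse_pts) ** 2)  # fitness term
        likelihood = -gmm.score_samples(c[None, :])[0]         # face likelihood term
        return fitness + lam * likelihood
    c0 = np.zeros(pcs.shape[0])
    return minimize(energy, c0, method="L-BFGS-B").x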
4 Results and Conclusion
Following the procedure described above, we generate a 3D face model on an Intel Core i5-750 (2.66 GHz); the input sequences are shot with a single consumer video camera recording at 30 fps with a resolution of 600×600 pixels. The average computation time to complete the 3D face model after capturing the image sequence is less than 10 seconds. Figure 1(d) shows a result of the reconstructed 3D face models. In consequence, we can quickly create a 3D face model without any manual operations, and our method succeeds in expressing more plausible facial geometry than the previous method (comparison results are shown in the supplemental materials). As future work, we need to evaluate the accuracy of the 3D face models reconstructed by our method. In addition, since our goal is to employ the reconstructed 3D face models for face identification, we need to extend our database by collecting more varied types of face models in order to reconstruct 3D faces across a variety of ethnicities.
References
IRIE, A., ET AL. 2011. Accuracy Improvements of Facial Contour Detection by Hierarchical Fitting using Regression. In 17th SSII 2011, Poster IS1-06.
MAEJIMA, A., AND MORISHIMA, S. 2008. Fast Plausible 3D Face Generation from a Single Photograph. In ACM SIGGRAPH ASIA 2008 Posters.
TOMASI, C., AND KANADE, T. 1992. Shape and Motion from Image Streams under Orthography: a Factorization Method. International Journal of Computer Vision 9, 137–154.