
3D reconstruction is used in state-of-the-art facial recognition as well.[1] Essentially, you reconstruct the face in 3D, rotate the 3D model to a frontal pose, project it back into two dimensions, and then feed it through a deep CNN. Because this gives you very good alignment, you can use tricks like not sharing weights across the entire image: each section of the input is known to correspond to a specific part of the face, and so it can learn unique parameters well suited to that region.
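The unshared-weights idea can be sketched as a "locally connected" layer: structurally like a convolution, but with a separate filter per output position. This is a minimal, hypothetical NumPy version for a single-channel 2-D input, not the actual architecture from the cited paper:

```python
import numpy as np

def locally_connected(x, weights, biases, patch):
    """Locally connected layer: like a convolution over `x`, except
    each output position (i, j) applies its own filter weights[i, j]
    instead of a single shared kernel."""
    h, w = x.shape
    ph, pw = patch
    out_h, out_w = h - ph + 1, w - pw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weights[i, j] is the filter unique to this image region,
            # which only makes sense because alignment guarantees the
            # same facial part always lands in the same region.
            out[i, j] = np.sum(x[i:i+ph, j:j+pw] * weights[i, j]) + biases[i, j]
    return out
```

The cost is many more parameters than a shared-weight convolution (one filter per position), which is only affordable because frontalization keeps the input layout consistent.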

The paper claims that it takes about 105 seconds to render a single frame, so one second of 30 fps video would take roughly 52 minutes to render. I would have to read the paper in more depth to see what savings can be had by sharing information across frames. (The paper also doesn't mention GPU acceleration.)
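The arithmetic behind that estimate:

```python
seconds_per_frame = 105                 # figure quoted from the paper
fps = 30
per_video_second = seconds_per_frame * fps   # 3150 s of rendering per 1 s of video
minutes = per_video_second / 60
print(minutes)  # 52.5
```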

[1] https://www.facebook.com/publications/546316888800776/



Roughly: they use a model of the face, render it, compare the source image with the rendered image, estimate changes to the model's position, orientation, and deformation that will yield a better match, and then repeat this until the result is good enough. While you can probably exploit temporal coherence between frames, the process is inherently pretty expensive due to its iterative nature. But it may also be relatively easy to parallelize because of this.
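That render-compare-update loop is a form of analysis by synthesis. A toy sketch with a scalar parameter and a made-up renderer (a real system would render a 3-D morphable face model and estimate pose and deformation updates; everything here is a hypothetical stand-in):

```python
def render(p):
    # Stand-in "renderer": maps a model parameter to an observation.
    return p * p

def fit(observed, p, steps=200, lr=0.05):
    """Iteratively refine parameter p so render(p) matches `observed`."""
    for _ in range(steps):
        err = observed - render(p)   # compare source vs. rendered
        if abs(err) < 1e-9:          # good enough: stop
            break
        # Estimate an update that reduces the mismatch
        # (a gradient step on 0.5 * err**2), then repeat.
        p += lr * err * 2 * p
    return p
```

Each iteration depends on the previous one, which is what makes the process expensive; but the per-pixel comparison and the rendering inside a single iteration parallelize well.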



