One thing computer vision is still missing is producing a depth map from a single 2D image. A person can look at a photograph and describe it as a 3D scene, and software should be able to do the same. This will be important for many fields.
That's a good start. I was thinking you could generate practically unlimited training data with a game engine. You'd have the actual 3D model for every single frame, so each rendered image comes with its true depth.
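To make the game-engine idea concrete, here is a minimal sketch. It does not use a real engine. Instead, a tiny hand-written orthographic renderer draws random spheres, which is enough to show how known geometry gives a perfectly labeled depth map for every rendered frame. All the names (`render_sphere_scene`, `make_dataset`, `FAR`) and the scene parameters are made up for illustration.

```python
import math
import random

FAR = 10.0  # depth assigned to background pixels (assumed far plane)


def render_sphere_scene(rng, size=32, n_spheres=3):
    """Render random spheres with an orthographic camera looking down +z.

    Returns (image, depth): two size x size nested lists of floats.
    Because the geometry is known, depth is exact ground truth.
    """
    image = [[0.0] * size for _ in range(size)]
    depth = [[FAR] * size for _ in range(size)]
    for _ in range(n_spheres):
        # Random sphere: centre (cx, cy, cz), radius r, base brightness.
        cx, cy = rng.uniform(-0.6, 0.6), rng.uniform(-0.6, 0.6)
        cz = rng.uniform(2.0, 8.0)
        r = rng.uniform(0.2, 0.5)
        shade = rng.uniform(0.3, 1.0)
        for row in range(size):
            y = (row + 0.5) / size * 2.0 - 1.0  # pixel centre in [-1, 1]
            for col in range(size):
                x = (col + 0.5) / size * 2.0 - 1.0
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                if d2 >= r * r:
                    continue  # this pixel's ray misses the sphere
                z = cz - math.sqrt(r * r - d2)  # front surface of the sphere
                if z < depth[row][col]:  # keep the nearest surface only
                    depth[row][col] = z
                    # Crude shading: darker toward the sphere's edge.
                    image[row][col] = shade * (1.0 - 0.5 * d2 / (r * r))
    return image, depth


def make_dataset(n, seed=0, size=32):
    """Generate n (image, depth) training pairs, reproducible from the seed."""
    rng = random.Random(seed)
    return [render_sphere_scene(rng, size=size) for _ in range(n)]
```

Calling `make_dataset(1000)` gives a thousand labeled pairs, and changing the seed gives a thousand more. With a real engine you would presumably read the depth buffer instead of computing `z` by hand, but the principle is the same.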