It's not just the computational cost of real-time visual inference, there are some fundamental hurdles to be overcome we've attained truly human level computer vision.
It's hard to imagine a pure vision system ever really matching the performance of a fused sensor system.
It's hard to imagine a pure vision system ever really matching the performance of a fused sensor system.
Why infer when you can measure?