Metal fatigue (and other wear and tear) might be a reason; our muscles are self-healing and regenerating, while mechanical gimbals are not. It would likely be cheaper and less error-prone to simply use more sensors, fixed in place or with much narrower ranges of motion, to achieve depth perception. Training a computer to understand a simulacrum of eye movement is an entirely different challenge, I think.
Of course cars can move their dozen or so cameras hundreds of times a second. But good luck getting a DNN to process all of that on top of the system already handling tasks like identifying traffic lights and pedestrians.
The head of Tesla AI has already stated that even with the new Nvidia hardware they struggle to meet the computational requirements.
Those are two largely unrelated problems. Reconstructing geometry from multiple views is a separate (and very well understood) problem from the analysis needed to understand what those shapes mean.
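To illustrate just how well understood the geometry half is: two calibrated views of the same point pin down its 3D position by linear triangulation (the classic DLT method). The camera intrinsics, baseline, and test point below are all made up for the sketch; only NumPy is assumed.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D image points."""
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # null space of A is the solution
    X = Vt[-1]
    return X[:3] / X[3]              # homogeneous -> Euclidean

# Hypothetical stereo rig: two identical pinhole cameras, 0.5 m baseline.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0], [0]])])

X_true = np.array([0.2, -0.1, 4.0])  # a point 4 m in front of the rig
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
print(np.allclose(X_est, X_true))    # the 3D point is recovered exactly
```

Recovering the 3D point is this easy; deciding whether that point belongs to a pedestrian or a plastic bag is the open problem.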
"Doesn't" isn't "can't". There's no reason why car-mounted hardware can't move just as much as eyes do.