The problem with self-driving is that it is based on data, but the environment may change. See e.g. the case where a Tesla mistook a firetruck for the road.
So if fashion changes, for example, pedestrians may suddenly look like road to the network too.
Another problem is that state-of-the-art classification networks have an accuracy in the 90% range. Given that a car must make hundreds of decisions in a single ride, even 99% per-decision accuracy compounds into an unacceptably high per-ride error rate.
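The compounding argument can be made concrete: if each of n decisions is independent and correct with probability p, the chance of at least one error in a ride is 1 - p^n. A minimal sketch (the decision counts are illustrative, and real decisions are not truly independent):

```python
# Probability of at least one misclassification over a ride,
# assuming n independent decisions with per-decision accuracy p.
# The independence assumption is a simplification.
def ride_error_rate(p: float, n: int) -> float:
    return 1 - p ** n

# Even 99% per-decision accuracy degrades quickly over many decisions:
for n in (100, 300, 1000):
    print(f"n={n}: P(at least one error) = {ride_error_rate(0.99, n):.3f}")
```

At 100 decisions the ride-level error probability is already well over half, which is the point the comment is making.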
> state-of-the-art classification networks have an accuracy in the 90% range.
If you're referring to ImageNet SOTA, it has 20000 different classes, including 120 different dog breeds [1]. This is a vastly different task from reliably detecting pedestrians. Tesla can actively curate a dataset of hard examples (from their fleet), whereas ImageNet is fixed, sometimes with low-quality labels and as few as a couple of hundred examples per class. Tesla can also pick a point on the ROC curve that gives higher recall at the cost of more false positives (which is important for VRUs specifically). Another big factor is that Tesla is using video, not still images, which makes predictions even more robust.
And that's just for pedestrians. Tesla is also using a general ViDAR (visual LiDAR) which is trained to detect obstacles that do not have a specific class. The ViDAR again operates on image sequences, not a single image, and can thus pick out structure from motion.
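Picking an operating point on the ROC curve amounts to choosing a score threshold: lowering it raises recall (fewer missed pedestrians) at the cost of more false positives. A toy sketch with made-up detector scores and labels (not Tesla's actual pipeline):

```python
# Trade recall against false-positive rate by sweeping the score threshold.
# Scores and labels below are invented for illustration.
def recall_and_fpr(scores, labels, threshold):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)
    return tp / (tp + fn), fp / (fp + tn)

scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]   # detector confidences
labels = [True, True, False, True, False, False]  # True = real pedestrian

for t in (0.9, 0.5, 0.2):
    r, f = recall_and_fpr(scores, labels, t)
    print(f"threshold={t}: recall={r:.2f}, fpr={f:.2f}")
```

For a safety-critical class like pedestrians you would sit at a low threshold: accepting phantom braking in exchange for not missing a real person.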