I don't know the specifics of what happened here, but in my experience with automatic configuration generation you need a way to validate the config, and that validator can have bugs like any other software.
When a bad config gets past the validator, either the software loading the configuration detects the problem or the monitoring system detects that something's not right. The last working configuration is then reapplied automatically and the non-working one is discarded.
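To make that loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the `ConfigManager` class, the `validator` and `health_check` functions, the `port` setting); it only illustrates the validate, apply, check, roll back idea and is not how any particular system does it.

```python
from typing import Callable

Config = dict


class ConfigManager:
    """Hypothetical helper: keeps the last known-good config and rolls back on failure."""

    def __init__(self, initial: Config,
                 validate: Callable[[Config], bool],
                 healthy: Callable[[Config], bool]) -> None:
        self.active = initial        # assumed to be known-good at start
        self.last_good = initial
        self._validate = validate    # pre-deployment validator (may itself have bugs)
        self._healthy = healthy      # stand-in for the loader or monitoring check

    def apply(self, candidate: Config) -> bool:
        """Return True if the candidate stays active, False if rejected or rolled back."""
        if not self._validate(candidate):
            return False             # rejected before it is ever loaded
        self.active = candidate
        if not self._healthy(candidate):
            self.active = self.last_good   # roll back and discard the candidate
            return False
        self.last_good = candidate
        return True


def validator(cfg: Config) -> bool:
    # Deliberately buggy: checks that "port" exists but not that it is in range.
    return "port" in cfg


def health_check(cfg: Config) -> bool:
    # Plays the role of monitoring: an out-of-range port only shows up at runtime.
    return 0 < cfg["port"] < 65536


mgr = ConfigManager({"port": 8080}, validator, health_check)
applied = mgr.apply({"port": 70000})   # passes the validator, fails the health check
```

In this sketch the bad config slips through the validator, and only the health check after loading triggers the rollback, so `mgr.active` ends up back at the last working configuration.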
By the looks of it, I would say their monitoring detected the problem but the reliability team needed some minutes to realise it was a configuration problem. A classic case is a network appliance (e.g. a firewall or a switch) that is misbehaving, but nobody knows the configuration is the cause, so it gets replaced by a fallback appliance that... oh, has the same problem, because it has the same configuration.
Altogether, 25 minutes seems like a lot, but when you're troubleshooting and you know an important part of your infrastructure is down, time flies!