Right. The small issues are minor outages causing customer inconvenience and/or lost revenue. The big issues are major pwnage and loss of everything because you were afraid of causing small issues while fixing things.
That makes a lot of sense. By allowing teams to fix things quickly, even if they sometimes break things, management creates an environment where people continuously learn how to handle fires. That experience prepares them both to fight bigger fires and to prevent them.
But the problem is that there are incentives to keep pushing the middle ground closer towards eventual system collapse.
It's never the right time to see if the backup generators can take the building load. But during real emergencies, it's amazing how common it is for the backup generators to not work for one reason or another.
"It's never the right time to see if the backup generators can take the building load."
How about a test outside normal working hours?
You can measure the building's normal power draw beforehand and then plug in enough equipment to draw roughly the same load.
But yes, it is more convenient not to do it, continue business as usual, and hope for the best.
My point is that in most cases you can test and fix critical systems, and also fix the problems your fixes create, if you make it a priority and plan accordingly. I did not say it is necessarily easy, though.