The fact that you write three nested loops in your program does not mean that the compiler/interpreter cannot do all these nifty things when running the code.
Then you're talking about the wrong language. CPython (and perhaps Python in general) is not the right choice if you want such ridiculous compiler-level optimizations. If you want the language's for loops to be almost on par with the hyper-optimized implementations used in numpy and the layers below it, then you should use something like C++ and hand-roll those for loops yourself.
The reason Python is so amazing in this regard is that all of that work has already been done for you at a lower level, and you just need to use it. Sure, could the numpy/scipy interfaces into lower-level code be more closely aligned with plain for-loop implementations of the algorithms they represent? Perhaps.
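To make the comparison concrete, here is a rough sketch of the same matrix product written both ways. The function name `matmul_loops` and the array sizes are made up for illustration. The triple loop is executed step by step by the interpreter, while `a @ b` hands the work to compiled code inside numpy (typically a BLAS routine, depending on how numpy was built).

```python
import numpy as np


def matmul_loops(a, b):
    """Naive triple-loop matrix product on lists of lists (hypothetical helper)."""
    n, k, m = len(a), len(a[0]), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                # Every iteration here is interpreted bytecode in CPython.
                out[i][j] += a[i][p] * b[p][j]
    return out


# Arbitrary small sizes, chosen only so the loop version finishes quickly.
rng = np.random.default_rng(0)
a = rng.random((30, 20))
b = rng.random((20, 10))

slow = np.array(matmul_loops(a.tolist(), b.tolist()))
fast = a @ b  # same product, computed by numpy's compiled implementation

# The two results agree up to floating-point rounding.
print(np.allclose(slow, fast))
```

If you time both versions with something like `timeit`, the numpy call is usually much faster on larger inputs, but the exact gap depends on your machine and numpy build, so measure it yourself rather than trusting a number from a forum post.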