The problem is that a simpler decoder doesn't compensate for the extra instruction cache needed to achieve the same hit rates and level of performance, and that is bad for power efficiency since L1 cache needs to run at full core speed and in modern CPUs there's vastly more transistor area in the cache than the decoder. The increased memory traffic from lower hit rates also doesn't help. This article shows that effect quite clearly:
The x86s have 32K of L1 icache, the ARMs 32K or 16K, and the MIPS Loongson has 64K. Also, the Loongson does not support MIPS16 whereas the ARMs all support Thumb. If you look at the total energy consumed, the MIPS is noticeably worse than x86 or ARM:
In fact, the cache takes so much power that Intel engineers have found it profitable to turn off parts of the cache when in low-power modes; this feature is called Dynamic Cache Sizing and appears in the later Atom series.
> that is bad for power efficiency since L1 cache needs to run at full core speed and in modern CPUs there's vastly more transistor area in the cache than the decoder
It's not that simple. Dynamic power depends on the toggle rate of the flip-flops and the electrical capacitance of the fan-out wires and gates, not on the number of transistors. In a cache, very few storage elements change their state in every cycle, while the decoder performs a lot of work in every cycle.
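To put rough numbers on that, here is a minimal sketch using the usual first-order CMOS switching-power estimate, P = alpha * C * V^2 * f. The activity factors, capacitances, voltage and clock below are made up purely for illustration; they are not measurements of any real cache or decoder.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order CMOS switching power: P = alpha * C * V^2 * f (watts)."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical operating point, chosen only to make the arithmetic easy.
VDD = 1.0      # volts
FREQ = 2e9     # hertz

# Big structure, but only a small fraction of its nodes toggle per cycle.
cache_power = dynamic_power(activity=0.01, capacitance_f=10e-9,
                            vdd=VDD, freq_hz=FREQ)

# 20x less switched capacitance, but busy on every cycle.
decoder_power = dynamic_power(activity=0.5, capacitance_f=0.5e-9,
                              vdd=VDD, freq_hz=FREQ)

print(f"cache:   {cache_power:.2f} W")
print(f"decoder: {decoder_power:.2f} W")
```

With these invented inputs the smaller block comes out ahead in dynamic power, which is the only point of the sketch: area alone doesn't settle the question. Leakage, which does scale with transistor count, is deliberately left out.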
It's even more complicated than that, since a cache doesn't have to hold encoded instructions; it can store decoded instructions instead, and a few of the caches on a modern x86 CPU actually do that. For example, there's a loop cache after the decoders, so that small loops never have to be decoded more than once.
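As a toy model of that idea (not how any particular x86 core implements it), a decoded-instruction cache can be thought of as memoizing the decoder's output by instruction address. The names `DecodedLoopCache` and `toy_decode` are hypothetical, the instruction bytes are arbitrary, and a real structure would have a fixed capacity and an eviction policy, which this sketch omits.

```python
class DecodedLoopCache:
    """Toy model: remember decode results keyed by instruction address."""

    def __init__(self, decode_fn):
        self._decode_fn = decode_fn
        self._decoded = {}       # address -> decoded form
        self.decode_count = 0    # how many times the real decoder ran

    def fetch(self, address, raw_bytes):
        # Only invoke the decoder the first time an address is seen.
        if address not in self._decoded:
            self.decode_count += 1
            self._decoded[address] = self._decode_fn(raw_bytes)
        return self._decoded[address]


def toy_decode(raw_bytes):
    # Hypothetical stand-in for an expensive variable-length decode.
    return ("uop", raw_bytes.hex())


# A three-instruction loop body: address -> arbitrary raw bytes.
loop_body = {0x100: b"\x01\xd8", 0x102: b"\xff\xc9", 0x104: b"\x75\xfa"}

cache = DecodedLoopCache(toy_decode)
for _ in range(1000):                      # execute the loop 1000 times
    for addr, raw in loop_body.items():
        cache.fetch(addr, raw)

print(cache.decode_count)
```

The loop runs 3000 instruction fetches, but the decoder is only invoked once per distinct address, so the decode cost is paid on the first iteration and skipped afterwards.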
And the MIPS is built on a 90nm process vs. the 32nm of the Sandy Bridge they tested. While that is relevant to what you can buy, it says nothing about the intrinsic properties of the design.
Intel has had a massive advantage in fabrication for a long time.
http://www.extremetech.com/extreme/188396-the-final-isa-show...
http://www.extremetech.com/wp-content/uploads/2014/08/Averag...