If you identify high performance by the performance review system, and people who perform highly according to that system are more likely to be promoted than those who don't, then of course over the long term high performers are the ones who get promoted. And it might well be merely "eventually" consistent if a few dings early on (or later on!) don't destroy your chances to, over time, be considered a high performer by that system. It would be very hard for such a system to be anything but consistent, since the consistency is just self-consistency.
The relevant question isn't "do promotion decisions get made consistently with the tools used to rank performance" but "do the tools used to rank performance adequately track potential, and does the environment generally adequately make potential actual?" A system in which lots of people are dissatisfied and bored but some luck into positions for which they're suited, excel, and are promoted is, indeed, consistent, but it's also pretty wasteful.
I don't identify high performance by the performance review system. I identify high performance by "Who would I like to have on my team if I were to found a startup?" I've found a fairly high correlation between these people and the people who get promoted.
To be fair, Google has a lot of very good engineers, so of course there are a lot of great engineers who are getting promoted. I'm pretty sure you could find an equal number of good engineers who don't get promoted. The rank-and-file programmers are just very good, with almost no deadweight.
Google's promotion process isn't, as far as I can tell, that broken. What's broken is the policy of making political-success (or, "perf") scores part of the transfer process. It's mean-spirited and creates an autocorrelation in project quality that many people never overcome.
Google would be a real company with a culture actually worth caring about if the executives manned up and did the following:
1. Go to open allocation. When you have that much fucking cash, you can invest in employee autonomy. No excuses. Do it. Learn from Valve, because you're not a cultural leader anymore, Google. http://michaelochurch.wordpress.com/2012/09/03/tech-companie...
2. Get rid of the "calibration" nonsense. It's stupid, and it goes against the idea of a peer-review driven company because that bus is driven by managers only. Fire the B-student management consultants who came up with it. Get rid of the 5% firing rate, too. (I know that Google rarely actually fires people, instead humiliating them with those insipid PIPs and transfer blocks. No real difference. Firing people with a real severance package is a lot more decent than wasting their time with kangaroo-court PIPs and tearing up their careers slowly.) Firing should be saved for real problem employees, rather than used as a threat that turns a no-fault lack of fit into a problem employee. This tactic of firing some set percentage (who tend to just be unlucky) to keep people "on their toes", without a business need such as a cash-crisis layoff, is mean-spirited thuggery that doesn't belong in this century.