I took the awesome 6502 emulator from http://6502asm.com/ and converted it to accumulate the instructions from each basic block, turn them into functions, cache them, and let the JIT do its magic. While I never benchmarked it, I'd say it got between a 5x and 20x speedup on most things. It breaks certain self-modifying code (I never bothered to add cache invalidation), but overall it works well.
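For anyone curious what that looks like, here is a rough sketch of the idea in Python rather than the emulator's actual code. The instruction set is a made-up toy (the op names borrow 6502 mnemonics but this is not a real 6502 decoder), and `compile_block`, `block_cache`, and `run` are names I invented for illustration. Each block is every instruction from an entry pc up to and including the next branch, emitted as source and compiled once. As in the comment above, there is no cache invalidation, so self-modifying code would break it.

```python
# Toy instruction set (hypothetical, NOT real 6502 opcodes):
#   ("lda", n)  A = n
#   ("adc", n)  A = (A + n) & 0xFF
#   ("dex",)    X = (X - 1) & 0xFF
#   ("bne", t)  jump to t if X != 0
#   ("brk",)    halt
PROGRAM = [
    ("lda", 0),   # 0
    ("adc", 3),   # 1  loop head
    ("dex",),     # 2
    ("bne", 1),   # 3
    ("brk",),     # 4
]

block_cache = {}  # entry pc -> compiled function; never invalidated


def compile_block(program, pc):
    """Accumulate instructions up to and including the next branch
    and compile them into one Python function."""
    lines = ["def block(s):"]
    while True:
        op = program[pc]
        if op[0] == "lda":
            lines.append(f"    s['a'] = {op[1] & 0xFF}")
        elif op[0] == "adc":
            lines.append(f"    s['a'] = (s['a'] + {op[1]}) & 0xFF")
        elif op[0] == "dex":
            lines.append("    s['x'] = (s['x'] - 1) & 0xFF")
        elif op[0] == "bne":
            # The block ends here; it returns the next pc to run.
            lines.append(f"    return {op[1]} if s['x'] != 0 else {pc + 1}")
            break
        elif op[0] == "brk":
            lines.append("    return None")
            break
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


def run(program, state):
    """Dispatch loop: look up (or compile) the block at pc, then call it."""
    pc = 0
    while pc is not None:
        fn = block_cache.get(pc)
        if fn is None:
            fn = block_cache[pc] = compile_block(program, pc)
        pc = fn(state)
    return state
```

Calling `run(PROGRAM, {"a": 0, "x": 5})` compiles three blocks (entry points 0, 1, and 4) and reuses the one at pc 1 for every later loop iteration.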
Am I misunderstanding or does his "tracing" stop at the basic block level? While basic blocks are completely predictable traces through the program, they are usually very short. You can get much more benefit by finding long common traces through multiple basic blocks and compiling the whole thing as a unit, providing tests and/or fixup code in case you find you actually needed to branch somewhere else in the middle of a trace. This is what I think most compiler people would call a "tracing JIT."
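To make the distinction concrete, here is a minimal sketch of what I mean, assuming a toy instruction set (`jz_a` and `jnz_x` are invented ops, and `step`, `record_trace`, and `compile_trace` are hypothetical names, not anything from the linked post). It records one pass through a loop that spans two conditional branches, then compiles the whole pass as a single unit, turning each branch into a guard that bails out to the interpreter at the right pc if it goes the other way.

```python
PROGRAM = [
    ("adc", 3),     # 0  loop head
    ("jz_a", 4),    # 1  jump if A == 0 (rarely taken)
    ("dex",),       # 2
    ("jnz_x", 0),   # 3  jump if X != 0 (loop back edge)
    ("brk",),       # 4
]

# Python source for each conditional branch's "taken" condition.
COND = {"jz_a": "s['a'] == 0", "jnz_x": "s['x'] != 0"}


def step(program, pc, s):
    """Interpret one instruction; return the next pc, or None on halt."""
    op = program[pc]
    if op[0] == "adc":
        s["a"] = (s["a"] + op[1]) & 0xFF
    elif op[0] == "dex":
        s["x"] = (s["x"] - 1) & 0xFF
    elif op[0] == "jz_a":
        return op[1] if s["a"] == 0 else pc + 1
    elif op[0] == "jnz_x":
        return op[1] if s["x"] != 0 else pc + 1
    elif op[0] == "brk":
        return None
    return pc + 1


def record_trace(program, head, s, limit=64):
    """Interpret from `head`, logging (pc, op, next_pc) for each step, until
    control returns to `head`, the program halts, or `limit` is reached."""
    trace, pc = [], head
    while True:
        nxt = step(program, pc, s)
        trace.append((pc, program[pc], nxt))
        pc = nxt
        if pc is None or pc == head or len(trace) >= limit:
            return trace, pc


def compile_trace(trace):
    """Compile a closed trace into one looping function with guards."""
    lines = ["def trace_fn(s):", "    while True:"]
    for pc, op, nxt in trace:
        if op[0] == "adc":
            lines.append(f"        s['a'] = (s['a'] + {op[1]}) & 0xFF")
        elif op[0] == "dex":
            lines.append("        s['x'] = (s['x'] - 1) & 0xFF")
        elif op[0] in COND:
            taken = nxt == op[1]
            # Guard: side-exit if the branch disagrees with the recording.
            test = f"not ({COND[op[0]]})" if taken else COND[op[0]]
            exit_pc = pc + 1 if taken else op[1]
            lines.append(f"        if {test}: return {exit_pc}")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["trace_fn"]


def run(program, s):
    trace, pc = record_trace(program, 0, s)
    if pc == 0:                      # the loop closed, so compile and run it
        pc = compile_trace(trace)(s)
    while pc is not None:            # finish in the interpreter after a side exit
        pc = step(program, pc, s)
    return s
```

The compiled function here covers instructions 0 through 3 in one go, where a per-basic-block scheme would have split it at the `jz_a`. A real tracing JIT would also decide when a loop is hot enough to record and would attach new traces at frequently taken side exits; none of that is shown.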
Hm, that's a good point. For some reason I assumed it would trace through unconditional branches, but the post I linked to does indeed say "I terminate the trace after any branch instruction", not "any conditional branch instruction". That seems like quite a missed opportunity; I wonder why the code has that restriction.
In the context of a JIT, you have a tradeoff to make: do you exhaust all possibilities for optimization, or do you do as little as possible to get something running? That's why you have things like HotSpot, which initially does as little as possible to get things rolling, then optimizes heavily as it finds bottlenecks. This is no easy task.
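A very rough sketch of that tradeoff, and definitely not how HotSpot is actually implemented: interpret each block while it is cold, count how often it is entered, and only pay for compilation once a counter crosses a threshold. The toy instruction set, the `HOT_THRESHOLD` value, and all function names here are assumptions made up for the example.

```python
HOT_THRESHOLD = 3  # hypothetical value; real VMs tune this carefully

PROGRAM = [
    ("adc", 3),     # 0  loop head
    ("dex",),       # 1
    ("jnz_x", 0),   # 2  jump if X != 0
    ("brk",),       # 3
]


def interpret_block(program, pc, s):
    """Cold path: interpret up to and including the next branch."""
    while True:
        op = program[pc]
        if op[0] == "adc":
            s["a"] = (s["a"] + op[1]) & 0xFF
        elif op[0] == "dex":
            s["x"] = (s["x"] - 1) & 0xFF
        elif op[0] == "jnz_x":
            return op[1] if s["x"] != 0 else pc + 1
        elif op[0] == "brk":
            return None
        pc += 1


def compile_block(program, pc):
    """Hot path: emit the same block as one Python function."""
    lines = ["def block(s):"]
    while True:
        op = program[pc]
        if op[0] == "adc":
            lines.append(f"    s['a'] = (s['a'] + {op[1]}) & 0xFF")
        elif op[0] == "dex":
            lines.append("    s['x'] = (s['x'] - 1) & 0xFF")
        elif op[0] == "jnz_x":
            lines.append(f"    return {op[1]} if s['x'] != 0 else {pc + 1}")
            break
        elif op[0] == "brk":
            lines.append("    return None")
            break
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


def run(program, s):
    counts, compiled = {}, {}
    stats = {"interpreted": 0, "compiled": 0}
    pc = 0
    while pc is not None:
        fn = compiled.get(pc)
        if fn is None:
            counts[pc] = counts.get(pc, 0) + 1
            if counts[pc] >= HOT_THRESHOLD:   # block just became hot
                fn = compiled[pc] = compile_block(program, pc)
        if fn is not None:
            stats["compiled"] += 1
            pc = fn(s)
        else:
            stats["interpreted"] += 1
            pc = interpret_block(program, pc, s)
    return s, stats
```

With `x` starting at 5, the loop body runs twice in the interpreter, gets compiled on its third entry, and the halting block at pc 3 is never compiled at all because it only runs once.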
http://weblogs.mozillazine.org/roc/archives/2010/11/implemen...