Hacker News

No, it's the SIMD loop thing.

The claim that k runs quickly because its functions fit in the CPU cache is, as far as I can tell, an off-hand comment that Arthur Whitney made once which has been repeated far more than it deserves. It's false—instruction cache behavior doesn't contribute significantly to k's advantage over other languages—for a few reasons: inner loops in array languages are 3–5 orders of magnitude smaller than the CPU cache, the loops that compiled languages produce also fit in cache, and instruction caching doesn't matter all that much for performance anyway. Despite spending plenty of time looking for it, I've never been able to measure an impact of code size on performance. Even data caching doesn't have that much of an effect: current versions of Dyalog APL almost always allocate new arrays from uncached memory (this is mostly fixed in the next version), and it's still one of the fastest array languages around. Unless you're using SIMD, code with linear access patterns can't even keep up with main memory, and the cache has no effect.

Why is using SIMD cheating? SIMD loops are the only way to get the full performance out of a CPU, and fast compilers do try to produce them. If it turns out that array-based interpreters are a better way to convert programmer intentions to SIMD loops than scalar compilers (and it certainly seems that way) then the array languages are legitimately faster. I suppose SIMD is considered "non-portable" because you can't use it from C, but that's an artificial restriction coming from historical programming language design decisions. The most important vector instructions are the same in any modern vector ISA. How is using standard CPU features cheating? They're not even that much newer than double-precision float support.

(I'm an implementor for Dyalog APL.)



My comment was that taking advantage of SIMD isn't cheating; "cheating" was a joke in both cases.

Though I disagree with your comment on the cache: it's very blatantly better for interpreters. Whitney isn't the only believer in that; Moore is too, and Moore is more competent than just about anybody. If you look at the performance of k and Moore-written Forths compared to Dyalog APL, it does seem like they have a point.


If you're talking about parsing speed, Dyalog is slow because it has a much more complicated grammar, and because it stores the execution stack in the workspace in order to make stack overflows impossible. If you're claiming k or Forth is faster for large array processing, I'd like to see some benchmarks. Do you have a citation for Moore on the instruction cache?



