
One place branchless techniques come in handy is SIMD (or simulated SIMD, where you pack several small ints into a longer one). A branchless algorithm that's 50% slower per element can still pay for itself if you can process four elements at a time.
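To make the "simulated SIMD" idea concrete, here is a minimal sketch in C, assuming four unsigned 8-bit lanes packed into one `uint32_t`. The function name `swar_add8` is made up for illustration. It adds all four lanes with wraparound in a handful of ALU ops and no branches, which is the kind of trade described above: more work than a single add, but four elements per pass.

```c
#include <stdint.h>

/* Hypothetical helper: add four packed 8-bit lanes, each wrapping mod 256.
 * No carry is allowed to leak from one lane into the next. */
uint32_t swar_add8(uint32_t a, uint32_t b)
{
    /* Add only the low 7 bits of every lane. The top bit of each lane is
     * cleared first, so a carry out of bit 6 lands in bit 7 of the same
     * lane and can never reach the neighbouring lane. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Fold the original top bits back in with XOR: top bit of the sum is
     * a7 ^ b7 ^ (carry from bit 6), and the carry is already sitting in
     * bit 7 of `low`. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

For example, `swar_add8(0x01FF7F80, 0x01010101)` gives `0x02008081`: the `0xFF` lane wraps to `0x00` without disturbing the lane above it. Whether this beats a plain loop (or real SIMD instructions) on a given machine is something to measure, not assume.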

Agreed that performance testing is absolutely necessary, preferably with something that can tell you not only exactly where the processor is spending its time, but also whether it's spending that time computing or waiting on memory, whether branches are being predicted well, and so on.


