You're not understanding what I'm saying. It's of course true that some O(N) algorithms are slower than other O(N) algorithms. But the statement you quote doesn't imply the contrary. Obviously walking an array can be thousands of times faster than walking a linked list. That doesn't change the fact that walking an array of size 3*N is 3 times slower than walking an array of size N, which is (roughly) what big-O notation means and what the quote you're objecting to correctly claims.
> If you've got to do a pointer indirection for each N rather than a linear read that can end up dominating the non-constant factors in a lot of cases.
Huh? A pointer indirection and cache miss for each element is already a non-constant cost (you pay it N times), so I don't understand what it means for that to "end up dominating the non-constant factors".
Take the linked-list case, for example. Say your allocator is a small-block allocator (SBA), which most good malloc implementations use for linked-list-style allocations.
Since the SBA allocates in chunks, your cache locality (which drives your "constant-factor" performance) isn't uniform any more: it changes in jumps and starts as you cross chunk boundaries. And if another class allocates in between your linked-list allocations (really common), that breaks up the chunk locality, and now your real-world O(N) performance no longer scales linearly.
The point I was trying (and failing) to drive home is that it matters how you walk your N, not just what the N is.
If you've got to do a pointer indirection for each N rather than a linear read that can end up dominating the non-constant factors in a lot of cases.