JS engines actually are optimized to make that usage pattern fast.
Small, short-lived objects with known key ordering (monomorphism) are not a major cost in JS because the GC design is generational. The smallest, youngest generation of objects can be collected quickly with an incremental GC, because the performance assumption is that most of the items in the youngest generation will already be garbage. That lets collection be optimized by first finding the live objects in the gen0 pool, copying them out, then throwing away the old gen0 pool memory wholesale and replacing it with a fresh chunk.
The allocation of each object still has overhead, though, even if they all live side by side: you pay memory overhead per value. A Uint8Array is tailor-made for an array of bytes and carries only a constant overhead. Plus, the garbage collector never has to peer inside a Uint8Array instance.
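To make the contrast concrete, here's a small sketch: the same 1 KiB of data stored as a single typed array versus as one boxed wrapper object per byte (the sizes and names here are arbitrary illustrations, not anything from the engine):

```javascript
const SIZE = 1024;

// One allocation, one contiguous buffer; the GC never traces inside it.
const packed = new Uint8Array(SIZE);
for (let i = 0; i < SIZE; i++) packed[i] = i & 0xff;

// One heap object per byte: each wrapper has its own object header and
// pointer, and every single one is work for the garbage collector.
const boxed = [];
for (let i = 0; i < SIZE; i++) boxed.push({ value: i & 0xff, done: false });

console.log(packed.byteLength); // 1024 bytes of payload, one object
console.log(boxed.length);      // 1024 separate heap objects
```

Same logical data either way; the difference is purely in how many allocations the GC has to track.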
What happens when I push an extremely high throughput of data and the scheduler pauses garbage collection because my process is taking too many interrupts from incoming network events? (That's a common way network data gets handed off to an application on many Linux distros.)
Are there any concerns that the extra per-object overhead will make the application even more vulnerable to out-of-memory errors while it holds off on GC to process one big stream (or several streams at once)?
I'm mostly curious; maybe this isn't a problem for JS engines, but I have sometimes seen GC get held off on high-throughput systems in Go, C#, and Java, which causes a lot of headaches.
Yeah I don't think that's generally a problem for JS engines because of the incremental garbage collector.
If you keep all your memory usage patterns collectible by the incremental collector, you won't experience noticeable hangups, because the incremental collector doesn't stop the world. This was already pretty important for JS, since full collections would (and do) show up as hiccups in the responsiveness of the UI.
Interesting, thanks for the info; I'll do some reading on what you're describing. I agree you're right about JS having UI hiccups due to everything being scheduled on a single main thread.
Makes a lot of sense, cool that the garbage collector can run independently of the call stack and function scheduler.
OP doesn’t know what he’s talking about. Creating an object per byte is insane if you care about performance. It’ll be fine if you do 1000 objects once, or if this isn’t particularly performance-sensitive. But the GC running concurrently doesn’t change anything about that. Not to mention he’s wrong: the scavenger phase for the young generation (which is typically where byte arrays processed like this end up) is stop-the-world. Certain phases of old-generation collection are concurrent, but notably finalization (deleting all the dead objects) is also stop-the-world, as is compaction (rearranging where the objects live).
This whole approach carries orders of magnitude more overhead, and the GC can’t do anything about it because you’d still be allocating the object, setting it up, etc. Your only hope is the JIT seeing through this kind of insanity and rewriting the code to elide those objects, but that’s not something I’m aware of any AOT optimizer doing, let alone a JIT engine that has to balance time spent generating code against fully optimal output.
Don’t take my word for it: write a simple benchmark to illustrate the problem. You can also see throughout the comment thread that OP is completely combative with people who clearly know something and point out problems with his reasoning.
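In that spirit, here's a minimal micro-benchmark sketch (the names, sizes, and structure are my own choices, and absolute numbers will vary wildly by engine and machine): sum a million bytes read directly from a Uint8Array versus routed through a per-byte wrapper object.

```javascript
const N = 1_000_000;
const bytes = new Uint8Array(N).map((_, i) => i & 0xff);

// Read each byte straight out of the typed array.
function sumDirect(buf) {
  let total = 0;
  for (let i = 0; i < buf.length; i++) total += buf[i];
  return total;
}

// Same computation, but with one { value, done } allocation per byte.
function sumBoxed(buf) {
  let total = 0;
  for (let i = 0; i < buf.length; i++) {
    const box = { value: buf[i], done: false }; // one allocation per byte
    total += box.value;
  }
  return total;
}

console.time("direct");
const a = sumDirect(bytes);
console.timeEnd("direct");

console.time("boxed");
const b = sumBoxed(bytes);
console.timeEnd("boxed");

console.log(a === b); // same answer, very different allocation behavior
```

Note that a JIT may well elide the `box` allocation in a loop this simple via escape analysis, which is exactly why a benchmark closer to your real pipeline is more convincing than a toy one.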
Even if you stop the world while you sweep the infant generation, the whole point of the infant generation is that it's tiny. Most of the memory in use is in the other generations and isn't swept at all; the churn is limited to the infant generation. That's why in real usage I'd put GC overhead at around 15% (and why collections are spaced regularly and finish quickly enough not to be noticeable).
I've been long on JS but never heard anything like this. Could you back it up somehow, or at least point to a source for the _around 15%_ figure?
Also, when you say _quick enough to not be noticeable_, what situation are you referring to? I thought GC overhead stacks up until it eventually affects UI responsiveness under continuous IO or rendering loads. I recently did some perf work for exactly such cases, and cutting down the number of objects did make things better, and the console definitely showed GC improvements. You're making me nervous enough to go back and check again.
It's not blazingly fast, no, but it's also not as much overhead as people imagine when they picture what the same thing would cost with malloc. TC39 knew all this when they picked { value, done } as the API for iteration, and they still picked it, so I'm not really introducing new risk; I'm trusting that they knew what they were doing when they designed string iterators.
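For anyone who hasn't poked at it directly, this is the protocol in question: every `next()` call on a string iterator hands back a fresh `{ value, done }` result object.

```javascript
// Grab the iterator that for...of and spread would use under the hood.
const it = "hi"[Symbol.iterator]();

const first = it.next();
console.log(first);            // value "h", done false
const second = it.next();
console.log(second);           // value "i", done false
console.log(it.next());        // value undefined, done true

// Each call allocated a distinct result object:
console.log(first !== second); // true
```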
At the moment the consensus seems to be that these language features haven't been worth investing much optimization effort in, because they aren't widely used on perf-critical paths. So there's a chicken-and-egg problem, but one that gives me some hope that these APIs will actually get faster as their usage becomes more common and important, which it should if we adopt one of the proposed solutions to the current DevX problems.
That's just how string iterators work in JavaScript: one result object for every character (for ASCII, one per byte). For now it's fast enough: https://v8.dev/blog/trash-talk. I'd put the GC overhead at around 10-15%, even with 20+ objects per byte once you add up all the stages of a real text-processing pipeline. It's that cheap because the objects are all short-lived, so they spend their whole lives in the "tiny short-lived objects" memory pool, which is super easy to GC incrementally.
In the future it should be entirely possible for engines to optimize even more aggressively: they should be able to skip allocating the result object when the producer of values is a generator function and the consumer is a for loop.
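That's this shape of code: a generator feeding a `for...of` loop. Today each step still conceptually allocates a `{ value, done }` object; the hope above is that an engine which can see both ends could elide it. (This example is just an illustration of the pattern, not a claim about what any engine currently does.)

```javascript
// Producer: a generator yielding one value per byte.
function* eachByte(buf) {
  for (let i = 0; i < buf.length; i++) yield buf[i];
}

// Consumer: a plain for...of loop over the generator.
let sum = 0;
for (const b of eachByte(new Uint8Array([1, 2, 3]))) {
  sum += b; // each iteration went through a { value, done } result object
}
console.log(sum); // 6
```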
I agree with your post, but in practice couldn't you get that efficiency back by setting T = Uint8Array? That is, write your stream to send and receive arrays of bytes.
My reference point is a noob experience with Go, where I was losing a lot of efficiency to channel overhead from sending millions of small items. Sending batches of ~1000 instead cut that down to a negligible amount. It's a little less ergonomic to work with (it adds a nesting level to your loop).
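Translated into JS terms, the batching idea looks roughly like this (a sketch with arbitrary names and a made-up batch size, not any particular stream API): hand the consumer Uint8Array chunks of ~1000 instead of one value per step, so the per-step overhead is paid once per chunk.

```javascript
const BATCH = 1000;

// Yield views into the buffer, ~1000 bytes at a time.
function* batched(bytes) {
  for (let start = 0; start < bytes.length; start += BATCH) {
    yield bytes.subarray(start, start + BATCH); // one object per chunk
  }
}

const data = new Uint8Array(2500).fill(7);
let total = 0;
let chunks = 0;
for (const chunk of batched(data)) {
  chunks++;
  for (const b of chunk) total += b; // the extra nesting level
}
console.log(chunks); // 3 chunks instead of 2500 per-byte steps
console.log(total);  // 2500 * 7 = 17500
```

`subarray` returns a view over the same backing buffer, so the batching itself costs three tiny allocations rather than a copy per chunk.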
While I understand the logic, that's a terrible idea.
* The overhead is massive: every 1 KiB now turns into 1024 objects, with terrible locality.
* Raw byte APIs (network, fs, etc.) fundamentally operate on byte arrays anyway.
In the most respectful way possible...this idea would only be appealing to someone who's not used to optimizing systems for efficiency.