With the recent barrage of AI-slop 'speedup' posts, the first thing I always do to see if a post is worth a read is Ctrl+F for "benchmark" and check whether the benchmark makes any fucking sense.
99% of the time (such as in this article), it doesn't. What do you mean 'cloneBare + findCommit + checkout: ~10x win'? Does that mean running those commands back to back results in a 10x win over the original? Does that mean there's a specific function that calls these 3 operations, and that's the improvement of the overall function? What's the baseline we're talking about, and is it relevant at all?
Those questions are partially answered on the much better benchmark page[1], but for some reason they're using the CLI instead of the git library for comparisons.
The reason is that bun actually tested both: the git CLI and libgit2. Across the board, the C library was ~3x slower than just spawning git CLI processes.
Under the hood, bun calls these operations when doing a `bun install`, and these are the places where the integration gives the biggest boost. The more git deps a project includes, the more these gains pile up.
However, the results land closer to 1x parity once you account for network time (i.e. the round trip to GitHub).
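A back-of-the-envelope sketch of why a big per-operation win can wash out end to end (all numbers made up for illustration, not from bun's benchmarks):

```python
# Hypothetical timings, purely illustrative: a 10x faster local git
# operation barely moves the total once a network round trip dominates.
network_rtt_ms = 300.0        # fetch over the wire (fixed cost either way)
local_ops_before_ms = 50.0    # cloneBare + findCommit + checkout, old path
local_ops_after_ms = 5.0      # same operations, ~10x faster

total_before = network_rtt_ms + local_ops_before_ms
total_after = network_rtt_ms + local_ops_after_ms

print(f"local speedup: {local_ops_before_ms / local_ops_after_ms:.1f}x")
print(f"end-to-end speedup: {total_before / total_after:.2f}x")
```

With these made-up numbers the local operations are 10x faster, but the end-to-end install is only about 1.15x faster, which is why the gains mostly show up when many git deps amortize the fixed network cost.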
As other people mentioned this is obviously not something I would want in my notebook... but I can still appreciate the cool tech!
I can also definitely see this kind of thing being used in things like budget outdoor displays, especially if the UI is made to accommodate the lack of accuracy and the camera is positioned to the side (since these displays are usually vertical).
It's difficult to capture reflections across a large screen while also dealing with outdoor lighting, glare, and moisture. And the touchscreen usually isn't what makes outdoor signage expensive; IP65 sealing, temperature control, and a secure housing are, and all of those would still apply here.
This looks like a neat option for retrofitting, and I suspect it'd work for some non-screen glass applications too. A combined IR/visible-light solution would be interesting too, since I suspect those are complementary (IR touch has issues with radiant light, while this wouldn't; this would have issues with low/no light, while IR wouldn't).
That final summary benchmark means nothing. It lists a 'baseline' value for the 'Full-stream total' of the Rust implementation, then says `serde-wasm-bindgen` is '+9-29% slower', but it never gives us that baseline value, because clearly the only benchmark run against the Rust codebase was the per-call one.
Then it mentions:
"End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."
But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.
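For illustration, here's how quoting two different baselines in one summary can mislead (hypothetical timings, not the article's real numbers):

```python
# Hypothetical timings for the same streaming task, three implementations.
# The point: a "+X% slower" and an "Nx faster" figure in the same summary
# can be measured against different baselines, which hides the real gap.
naive_ts_ms = 100.0   # naive TS baseline
rust_wasm_ms = 33.0   # hand-rolled Rust/wasm path
serde_wb_ms = 40.0    # serde-wasm-bindgen path

# "3.0x lower total cost" -- measured against the naive TS baseline:
print(f"{naive_ts_ms / rust_wasm_ms:.1f}x vs naive TS")

# "+21% slower" -- measured against the Rust path, a *different* baseline:
print(f"+{(serde_wb_ms / rust_wasm_ms - 1) * 100:.0f}% vs Rust")
```

Without the absolute baseline numbers, the reader can't tell that the two percentages describe comparisons against two different implementations.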
I really think the guy just prompted claude to "get this shit fast and then publish a blog post".
This. It’s so annoying to read these types of blogs now, where the writer clearly didn’t put in the effort to understand things fully, or at least review the blog their LLM wrote. Who is this useful for?
The article as a whole makes no sense. They are generating UI with an LLM. How fast the UI appears to the user is going to be completely dictated by the speed of the LLM, not the speed of the serialisation.
as an author of the blog - ouch
did a little bit more than prompt claude but a lot of claude prompting was definitely involved
I understand your frustration with AI writing though. We are a small team, and given our roadmap it was either use LLMs to help collate all the internal benchmark result files into a blog post or never write it, so we chose the former. This was a genuinely surprising and counterintuitive result for us, which is why we wanted to share it. Happy to clarify any of the numbers if helpful.
It follows the same reasoning as when someone purposefully copies code from a codebase into another where the license doesn't allow.
Yes it might be the only viable solution, and most likely no one will ever know you copied it, but if you get found out most maintainers will not merge your PR.
I think most people wouldn't call proof-reading 'assistance'. As in, if I ask a colleague to review my PR, I wouldn't say he assisted me.
I've been throwing my PR diffs at Claude over the last few weeks. It spits out a lot of useless or straight-up wrong stuff, but sometimes among the insanity it manages to catch a typo or two that a human missed, and between letting a bug pass or spending an extra 10 minutes per PR going through the nothingburgers Claude throws at me, I'd rather lose the 10 minutes.
Overall it feels like unless your game is a linear single-player game, it will fall under multiple of the site's labelled 'dark patterns'. Here are some really bad ones:
Infinite Treadmill - Impossible to win or complete the game.
Variable Rewards - Unpredictable or random rewards are more addictive than a predictable schedule.
Can't Pause or Save - The game does not allow you to stop playing whenever you want.
Grinding - Being required to perform repetitive and tedious tasks to advance.
Competition - The game makes you compete against other players.
Especially for online games, these aspects are actually quite core to long term play. I am pretty casual as far as time invested goes, but many online games have to cater to both me and the Die Hards who play their games 10x more than other players.
To the die hard players, the infinite grind is a feature, treadmills help them reach whatever insane goals the developers have to keep cooking up so that they're satisfied.
Watching Arc Raiders evolve recently is a great example. It's trying to cater to casual players. It's going well now, but the die hards are going to ruin that experience, I can promise. Then the die hards will be all that remain, and the developers will have to cater to them.
The difference between a casual player and a die hard can be 30 hours played in a year versus 5,000. Some people play like it's their job.
What % of the time do you think the die-hard gamers live a healthy lifestyle? I’m thinking it’s higher than my knee jerk reaction, like 40-50%, but an important consideration. Of course people define “healthy” differently too, but obesity and mental health crises have objectively grown and correlate with rising technology use.
This is a hard one for me to ballpark because all my gaming friends are gym rats, so my view is probably skewed. Some of them game hours every day and make time for gym and nutrition. Unsure of their other health needs though.
I think it's much harder to be healthy when 6+ hours of your day is gaming, especially if you also have work. It takes time to eat healthy, exercise, get out into the sun, and be social. At some point you must be making compromises to game 6 hours a day.
If Chess were a mobile game, you'd be forced to watch an ad after every three moves, the bishop would be on a timer and to use it again you'd have to wait 10 moves or pay using in-game currency, the knight would only be available if you bought the DLC, and the game wouldn't run unless you granted it access to all of your contacts.
I have mixed feelings on this assessment. I definitely agree that some of these labels could be better ("can't pause or save" and "competition" are missing a lot of nuance), but some you mentioned feel reasonable on the part of the site creator (for example, "variable rewards", which is to say different reward outputs for the same performance/input, are a pretty classic Skinner box and unnecessary as a core feature to make most games work).
I'd also like to question the idea that multiplayer games are being treated inherently "unfairly" here, or that these features aren't worth acknowledging as a dark pattern just because they're core to certain genres. I like Minecraft, and there's variable drops and achievements and grinding and multiplayer and a bunch of other "dark patterns". I also like to straight up gamble occasionally, and I'm not a gambling addict as of the writing of this comment. It's more the awareness of things that can psychologically hook you that's important, and then you can do what you want with that (or for parents, they can attempt to restrict applications as they find appropriate).
The game I was the most addicted to was Age of Empires II, but I don't blame Microsoft for this: they just created an awesome game.
Competition + "can't pause", these two can really make you disconnected from real life if you're competitive, but it's also fun and somewhat useful to know how much you can push yourself and how far you can go on the ladder.
My advice is to force yourself to stop playing after each single match, but that's hard when you're in a losing streak, because you want to win at least one match.
Paul Morphy had to become the best chess player before he understood that chess was a waste of time. He said it's important to know the game well, but there's a limit.
Yes, this site reads like it's written by someone who doesn't enjoy games or understand the concept of gaming. There is nothing dark about most of these concepts individually. The harm comes from combination and/or excessive usage. The dose makes the poison.
Still, learning about them and being aware of them isn't bad. But I'm curious how much the phrasing pushes readers' mindsets in the wrong direction.
The website does label some relatively harmless elements as ‘dark patterns’, but out of your ‘really bad ones’, I don’t see ‘Competition’ as being a dark pattern.
Competition is a fundamental part of Play. Humans (and other animals) are social creatures and learn via playing and competing with others.
Can people play games by themselves? Yes.
Is competitive play bad or a dark pattern? Not at all.
The point I'm trying to make is this: the LLM output is a set of activations. Those are not "hidden" in any way: that is the plain result of running the LLM. Displaying the word "Blue" based on the LLM output is a separate step, one that the inference server performs, completely outside the scope of the LLM.
However, what's unclear to me from the paper is whether it's enough to get these activations from the final output layer, or whether you actually need internal activations from a hidden layer deeper in the LLM, which does require analyzing the internal state of the LLM.
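To make the distinction concrete, here's a toy two-layer network (pure NumPy, not a real LLM) where the final logits the server samples from and the hidden-layer activations are clearly separate read-out points:

```python
import numpy as np

# Toy two-layer "model" to illustrate the distinction: the final-layer
# logits are what the inference server already has (it decodes "Blue"
# from them), while the hidden activations live one step earlier and
# require instrumenting the model internals to read out.
rng = np.random.default_rng(0)
x = rng.normal(size=4)             # input embedding (toy)
W_hidden = rng.normal(size=(4, 4)) # hidden-layer weights
W_out = rng.normal(size=(4, 3))    # output head: 3 "tokens" in a toy vocab

hidden = np.tanh(x @ W_hidden)     # internal activations (a hidden layer)
logits = hidden @ W_out            # final output the server samples from

token = int(np.argmax(logits))     # greedy decode: the visible token
print(token, hidden.shape, logits.shape)
```

If the paper's method only needs the final logits, the inference server already exposes them; if it needs `hidden`, you have to hook into the model's internals.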
[1] https://github.com/hdresearch/ziggit/blob/5d3deb361f03d4aefe...