wild_pointer's comments

I wonder how much of it is due to the model being familiar with the game or parts of it, whether from training on the game itself or from reading/watching walkthroughs online.


There was a well-publicised "Claude plays Pokémon" stream where Claude failed to complete Pokémon Blue in spectacular fashion, despite weeks of trying. I think only a very gullible person would assume that makers of future LLMs didn't specifically bake this into their training, as they do for popular benchmarks or for penguins riding a bike.


If they game the pelican benchmark, it’d be pretty obvious.

Just try other random, non-realistic things like “a giraffe walking a tightrope”, “a car sitting at a cafe eating a pizza”, etc.

If the results are dramatically different, then they gamed it. If they are similar in quality, then they probably didn’t.


While it is true that model makers are increasingly trying to game benchmarks, it's also true that benchmark-chasing is lowering model quality. GPT 5, 5.1 and 5.2 have been panned by almost every class of user, despite being benchmark monsters. In fact, the more OpenAI tries to benchmark-max, the worse their models seem to get.


Hm? 5.1 Thinking is much better than 4o or o3. Just don't use the instant model.


5.2 is a solid model and I'm actually impressed with M365 copilot when using it.


> as they do for popular benchmarks or for penguins riding a bike.

Citation?


9/10 for originality. 2/10 for usefulness.

Not bashing, that's how good ideas are found. But not this time IMO :)


I disagree.


OK.


In the era of LLM-generated content, such a high-quality writeup is a breath of fresh air. Well done!


This guy is doing something else completely. In his words:

> In my testing, it's between 1.2x and 4x slower than Yolo-C. It uses between 2x and 3x more memory. Others have observed higher overheads in certain tests (I've heard of some things being 8x slower). How much this matters depends on your perspective. Imagine running your desktop environment on a 4x slower computer with 3x less memory. You've probably done exactly this and you probably survived the experience. So the catch is: Fil-C is for folks who want the security benefits badly enough.

(from https://news.ycombinator.com/item?id=46090332)

We're talking about a lack of fat pointers here, and switching to GC and having a 4x slower computer experience is not required for that.


I am actually not talking about the lack of fat pointers. That is almost entirely orthogonal to my point. I am talking about the fact that what would be the syntax for passing an array by value was repurposed for automatically decaying into a pointer. This results in a massive and unnecessary syntactic wart.

The fact that the correct type signature, a pointer to fixed-size array, exists and that you can create a struct containing a fixed-size array member and pass that in by value completely invalidates any possible argument for having special semantics for fixed-size array parameters. Automatic decay should have died when it became possible to pass structs by value. Its continued existence still results in people writing objectively inferior function signatures (though part of this is the absurdity of C type declarations making the objectively correct type a pain to write or use, another one of the worst actual design mistakes).

Fat pointers or argument-aware non-fixed size array parameters are a separate valuable feature, but it is at least understandable for them to not have been included at the time.
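
To make the three signatures being compared concrete, here is a minimal sketch. The names `KEY_LEN`, `fill_decayed`, `fill_exact`, `key_box` and `fill_copy` are hypothetical and exist only for this illustration; they are not from the article.

```c
#include <assert.h>
#include <string.h>

enum { KEY_LEN = 4 };  /* hypothetical size, chosen only for this example */

/* Array-syntax parameter: the type is adjusted to `unsigned char *key`.
   The length in the brackets is not enforced, and the caller's storage
   is written through the pointer. */
void fill_decayed(unsigned char key[KEY_LEN]) {
    memset(key, 'x', KEY_LEN);
}

/* Pointer to fixed-size array: the length is part of the type. */
void fill_exact(unsigned char (*key)[KEY_LEN]) {
    assert(sizeof *key == KEY_LEN);  /* sizeof sees the whole array */
    memset(*key, 'y', sizeof *key);
}

/* Array wrapped in a struct: passed by value, so the callee gets a copy. */
struct key_box { unsigned char bytes[KEY_LEN]; };

struct key_box fill_copy(struct key_box k) {
    memset(k.bytes, 'z', sizeof k.bytes);
    return k;  /* the caller's original struct is left untouched */
}
```

With `unsigned char raw[KEY_LEN]`, the calls are `fill_decayed(raw)` and `fill_exact(&raw)`. Passing the address of an array of a different length to `fill_exact` is a constraint violation that compilers diagnose, whereas `fill_decayed` accepts any `unsigned char *` without complaint.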


> The fact that the correct type signature, a pointer to fixed-size array, exists and that you can create a struct containing a fixed-size array member and pass that in by value completely invalidates any possible argument for having special semantics for fixed-size array parameters.

That's not entirely accurate: "fixed-size" array parameters (unlike pointers to arrays or arrays in structs) actually say that the array must be at least that size, not exactly that size, which makes them way more flexible (e.g. you don't need a buffer of an exact size, it can be larger). The examples from the article are neat but fairly specific because cryptographic functions always work with pre-defined array sizes, unlike most algorithms.
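
A small sketch of the "at least that size" reading, assuming it refers to the C99 `static` form of array parameters (a plain `int v[4]` parameter is just `int *v` and promises nothing). The function name `sum_first4` is made up for the example.

```c
#include <stddef.h>

/* C99: `static 4` tells the compiler the caller provides at least 4
   readable elements. A longer buffer is fine; a shorter one is
   undefined behavior, which some compilers warn about at the call site. */
int sum_first4(const int v[static 4]) {
    int total = 0;
    for (size_t i = 0; i < 4; i++) {
        total += v[i];
    }
    return total;
}
```

An 8-element array can be passed directly to `sum_first4`, which a pointer-to-array parameter of type `const int (*)[4]` would reject.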

Incidentally, that was one of the main complaints about Pascal back in the day (see section 2.1 of [1]): it originally had only fixed-size arrays and strings, with no way for a function to accept a "generic array" or a "generic string" with size unknown at compile time.

[1] https://www.cs.virginia.edu/~evans/cs655/readings/bwk-on-pas...


depending upon how one has structured the code, a less painful way to write the same is:

    #include <string.h>  /* memset */

    typedef char array[5];

    void do_something(array *a) {
        /* a points to the whole array, so sizeof *a is 5 */
        enum { a_Size = sizeof *a };
        memset(*a, 'x', a_Size);
    }

it rather depends upon how painful it will be to create a bunch of typedefs.

Beyond a certain point, if there are too many arrays of the same size with different purposes, my inclination is to wrap the array in a struct, and pass that around (either by pointer or value depending upon circumstances.)

The existence of the decaying form is, if I recall correctly, a backward compatibility thing from either B or NB, simply because in one or the other pointers were written in the (current) array syntax form.


It stems from B, because it didn't have either pointers or arrays on the type level. Declaring an array allocated the storage, but the variable itself was still a word-typed pointer to said array. In fact, you could even reassign it!

   foo(a) {
      return(&a[1]);
   }
   bar() { 
      auto a[10];
      a = foo(a);
   }

The decaying system made it mostly work with minimal changes in C.


This is not about performance.


You didn't read this, did you? https://alexgaynor.net/2019/apr/21/modern-c++-wont-save-us/

It's not a pointer.


Hey! That wasn't easy!


This is malware, the project is on github


Indeed, some people are :)


Impressive work! Also waiting for fine-grained caching:

https://discourse.llvm.org/t/rfc-add-an-llvm-cas-library-and...


That is dated Feb 2022. Do you know if anything came of it?


Quite a few patches have landed. A couple features using this have already shipped in Apple’s downstream clang.


LLM-created software might

