
The exploit has been fixed, and this video shows how easy it was to perform: https://www.youtube.com/watch?v=QPl_BJoBaVA

TL;DW: Steam accepted blank recovery codes for password resets, enabling passwords to be changed without needing access to the recovery email account.

I hope someone is writing a new test case.



> Steam accepted blank recovery codes for password resets

Sample buggy code (that I just made up):

(user_token is user supplied token, token is the correct token)

  for(i = 0; i < strlen(user_token); i++) {
    if(user_token[i] != token[i]) return false;
  }
  return true;
If user_token is blank this will falsely return true. This is also subject to truncation attacks.

I'm trying to think of a bug where blank fails, but truncated versions do not.
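Plugging values into that loop makes both failure modes concrete (check is just the loop above given a name):

```c
#include <stdbool.h>
#include <string.h>

/* The buggy loop from above, wrapped in a function for illustration. */
bool check(const char *user_token, const char *token)
{
    size_t i;
    for (i = 0; i < strlen(user_token); i++) {
        if (user_token[i] != token[i]) return false;
    }
    return true;
}

/* Against token "steam":
 *   check("", "steam")     -> true  (the blank-code exploit)
 *   check("s", "steam")    -> true  (any prefix passes too)
 *   check("stea", "steam") -> true
 *   check("x", "steam")    -> false
 */
```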


Another not-so-obvious bug with your example buggy code: it's also vulnerable to timing attacks.

http://codahale.com/a-lesson-in-timing-attacks/


True, but I don't think Steam allows you enough tries to make it viable.


Also not a bug, but putting strlen() inside the loop condition makes my eyes twitch


A "goto fail" style bug could allow for blankness but not truncatedness. Basic idea:

    bool failed = false;
    if(!check_params()) {
        failed = true;
        goto fail;
    }
    if(some_other_check()) {
        failed = true;
        goto fail;
    }
    if(token_length == 0) {
        goto fail; // oops, forgot to set failed
    }
    ...check the token...
    fail:
    if(failed) {
        return auth_failure;
    }
I have no idea if this is anything like what actually happened, of course.


Yeah, strcmp() exists for a reason!


But in this case you'd want to use a constant-time string comparison function, which, afaik, strcmp() is not.


Is such a timing attack really possible over the internet? I'd expect the code to expire after a few failed attempts anyway.

But I guess you're right, better safe than sorry.


Surprisingly... yes, it is possible. See the link in my other comment: https://news.ycombinator.com/item?id=9953496


Good point. Is there a library of constant-time functions for security-sensitive applications to replace the standard ones in libc? (Seems like the kind of thing that the OpenBSD people might have written.)

strcmp() is at least better than a buggy equivalent that doesn’t actually work :)
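For what it's worth, OpenBSD does ship timingsafe_bcmp(3) and timingsafe_memcmp(3), and OpenSSL provides CRYPTO_memcmp(). A minimal portable sketch of the same accumulator idea (timingsafe_bcmp_sketch is a made-up name):

```c
#include <stddef.h>

/* Sketch of the timingsafe_bcmp(3) idea: returns 0 iff the buffers are
 * equal, touching every byte regardless of where the first mismatch is.
 * (As discussed elsewhere in the thread, the compiler is still free to
 * undo the accumulator trick, so this is a best-effort sketch.) */
int timingsafe_bcmp_sketch(const void *b1, const void *b2, size_t n)
{
    const unsigned char *p1 = b1, *p2 = b2;
    unsigned char acc = 0;
    size_t i;

    for (i = 0; i < n; i++)
        acc |= p1[i] ^ p2[i];
    return acc != 0;  /* 0 on match, nonzero on mismatch */
}
```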


The idea is to use constant time hashing functions, and then compare the hashes instead of comparing the strings themselves. AFAIK there is no implementation of the C stdlib that makes sure no timing attacks can be performed.


We're comparing one-time, fixed-length codes. Is it necessary?


Probably not. Ideally you'd implement this endpoint to invalidate the code after a few failures, so a timing attack would be impossible to carry out. But it's a good idea to use one anyway, in case it turns out that you're wrong, or that other code is also broken somehow. It costs almost nothing to do so.


Try it out here: https://jsfiddle.net/valgaze/wpL2La7d/

If this is in fact how things worked, what a great showstopper bug


Looks like 's' is also considered valid in this example


And so is any part of the full word (s, st, ste, stea, steam)


That's one possibility. Another possibility is they were merely comparing the submitted nonce to the nonce on file. If a nonce had never been generated for a user, the database may have contained an empty string. An empty string in the db would match an empty string submitted by an attacker.
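A minimal sketch of that failure mode (names hypothetical; stored_nonce would come from the users table):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: if the DB stores "" for users who never had a
 * reset nonce generated, a naive equality check matches an
 * attacker-submitted empty string. */
bool nonce_matches(const char *submitted, const char *stored_nonce)
{
    return strcmp(submitted, stored_nonce) == 0;
}

/* nonce_matches("", "") is true: blank submission, no nonce on file. */
```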


No idea or visibility into Steam's backend. I hope to god they are not using C, for this very reason. Great for games, terrible for SaaS.


If you wrote it like that this bug would happen in every language. There is nothing in there that is worse because of C.


Yeah, but in other (more high-level) languages you simply wouldn't compare strings byte by byte.


Neither would you do it in C.


Their webapp is (AFAIK) all PHP; not sure what the backend looks like.


I worked on a big name project that had this same problem, which we found because users had begun exploiting it. The problem was caused when a user would intentionally prevent the recovery_code from submitting with the rest of the form, which resulted in recovery_code being nil, which resulted in "SELECT something FROM users WHERE recovery_code IS NULL AND email = targeted_email" hitting the database. This wouldn't work if you'd requested a password reset code and not yet used it, but everyone else was vulnerable. Sounds like something similar happened here.

EDIT: Updated explanation of problem because I originally explained it wrong (required omission of the value on submission, not just a blank value).


Yet another example of why allowing nullable values anywhere is a bad idea.


How did the string go from "" to nil automagically?


I explained it incorrectly. It actually wasn't as simple as leaving the recovery code blank, because as you note, that would result in an empty string. What they did was delete the recovery_code element from the form entirely and then submit. It was a Rails project, so when the application requested params[:recovery_code], it would get back nil, since Ruby gives nil when you request a key that doesn't exist in a hash. If you left the box blank, you wouldn't hit the bug, because "" != NULL.


It wasn't the case here, but Oracle effectively treats empty string and null as the same value.


In a similar vein, MS SQL Server ignores trailing spaces when performing string comparisons, which can lead to odd bugs where the data layer considers things to be equal when other layers don't.

Similarly most RDBMSs are case insensitive by default while most programming languages are not, which again can lead to problems where different layers in an application disagree about string equality.

It isn't at all unlikely that bugs in naive code (caused by people not being conscious and careful of these sorts of differences) can allow attackers to cause useful information to leak.


Bad database constraints obviously.


For this to happen, wouldn't it mean that recovery codes never expired?


No, the recovery codes did expire. I don't remember exactly how the system was checking expirations, but it was obviously not in a way that precluded this attack, though I agree there are several possible ways to check expirations that would've done so. Most likely, the application ignored expiration dates when set to NULL instead of complaining about the invalidity.

Anyone who did not have an active password reset token could have their password reset, because the query selected users based on their username/email and on the user not having an active reset token. The token was set back to NULL upon a successful password reset, so it wasn't just users who had never reset a password, but any user who had redeemed their most recent reset request (which, obviously, included practically all users).


This would not happen with normal use of a modern framework.


Not sure how you're defining "modern", but it did happen with Ruby on Rails 2.x. I bet I could recreate it in a Rails 4 app.

Yes, there are things that could've been done to prevent this bug from happening, like using Rails's validation mechanisms to reject any POST that didn't contain the field, but that's not really the point. The point is that simple, subtle bugs can and do sneak into production codebases, and more-exhaustive-than-usual testing is needed for anything used to control account access like password resets.


Rails isn't modern. I assume though that at least it does support validation, which is exactly the point. Building a bunch of tests in order to avoid college-level development practices would be a waste. Tests are valuable, but as an additional level of protection on top of good - or just regular - practices.


What are some examples of a modern framework?


Laravel and Yii are my favorites in PHP. Both borrowed concepts from Rails, but brought in the maturity of older PHP frameworks.


> maturity of older PHP frameworks.

I mean, Laravel is somewhat good. But there wasn't a really mature PHP framework in the past. Especially not as mature as Rails or Django.


> Rails isn't modern.

You are using a drastically different version of modern than normal English usage. Where's that Scotsman when you need him?


It's insane how easy this was to exploit. What were the developers thinking?? Also, for how long was this exploitable? If it was a long time, it would show that we tend to overestimate the security of our websites... (a huge number of people took a long time to find a super easy bug)


There is absolutely a mess of little oversights like this in every non-trivial software project (I just fixed one not 30 minutes ago). The effects are sometimes very serious, as in this case. Cyber-warfare is scary precisely because of this. There is nothing we can do about it IMO, though things like SELinux/AppArmor and even good filesystem-level permissions help a little bit.


There is something you can do about it, but it's expensive and different from what you are used to. Safer languages is a good place to start.

Input validation can be made to be machine-verified, and anit-patterns can be learned and avoided. Static analysis tools can raise red and yellow flags.

Finally, frameworks are being built for handling security-critical things like account validation, crypto key handling, and related tasks that are important to get right.

As the space continues to evolve, things like this will become less common, and someone who is aware of the trade and is diligent will be less likely to make mistakes.


> anit-patterns can be learned

a spell checker didn't save you from a typo. not nitpicking, just using this example to illustrate a point: for an app to be secure, developers have to be on the ball 100% of the time, while an attacker just has to be lucky once.

the odds are so heavily in the attacker's favor that only a few of the biggest players can afford to pay for a competitive level of security. It's a race: security is effective not when it's perfect, but when it costs more to attack a site than you'd gain from it.


> a spell checker didn't save you from a typo

Believe this is where parent's preference for high-quality, easy-to-use libraries handling common security operations comes from.

It's a lot easier to make a typo (or logical mistake) in 250 lines of your own code than in (hopefully) <250 lines of a library invocation.

Honestly, library functionality like this in any given language should be thought of as an analog to infrastructure spending. We're finally seeing reinvestment from end-users and value capturers (Facebook, Google, MS, etc) back down the chain to the OSS projects they depend on.

Language guiders (and in some cases the support businesses who are attached to languages) should be equally serious about this. If your language doesn't have high-quality, easy-to-use libraries to mitigate attack surfaces, that's a fundamental weakness in your language eco-system...

(Aka the "Perl+CPAN is better than a lot of more advanced languages, because code" argument)


Or even when the gain/cost ratio for attacking YOUR site is greater than the one for other sites. (You don't have to outrun the bear, you have to outrun the other people.)


> Safer languages is a good place to start.

Could (and should) a safer language pre-empt that you should first check strlen(str) != 0 before doing strcmp? I know of no language which does this.

> Input validation can be made to be machine-verified

Likewise, can you elaborate on this? How does the machine know what type of input this is? If input is tagged by the developer, what's to stop the developer from forgetting to tag the input correctly?


> Could (and should) a safer language pre-empt that you should first check strlen(str) != 0 before doing strcmp? I know of no language which does this.

In some languages, you can define a string type that only contains strings of one or more characters. And then you'd need to "convert" a regular string into one of those by using some function that forced you to handle the strlen(str) == 0 case.

I believe you can define that in Idris, for example.


Except that strlen(str) != 0 doesn't really do much to fix the issue. At best, it means you have to guess the first character.

If you are comparing byte by byte the correct code is:

  valid = true;
  for(i = 0; i < strlen(user_token); i++) {
    if(strlen(token) < i || user_token[i] != token[i]) valid = false;
  }
  if(strlen(token) != strlen(user_token)) valid = false;
  return valid;


Let's make it somewhat more readable:

    int token_length = strlen(token);
    int input_length = strlen(input);
    int match_count = 0; // for constant-time comparison...
    
    // Just get this out of the way now; no need to dawdle
    // (because the length of a token is generally known)
    if (token_length != input_length) {
        return false;
    }
    
    // No need to worry about mismatched buffer lengths at this point
    for(int i=0; i<token_length; i++) {
        if (token[i] == input[i]) {
            match_count++;
        }
    }
    
    // if the same number of characters match, voila!
    return (match_count == token_length);
It takes far too long to understand the intent and verify the correctness of lines like:

    if(strlen(token) < i || user_token[i] != token[i]) valid = false;


And then your compiler will optimize it to this:

    int token_length = strlen(token);
    
    // Just get this out of the way now; no need to dawdle
    // (because the length of a token is generally known)
    if (token_length != strlen(input)) {
        return false;
    }
    
    // No need to worry about mismatched buffer lengths at this point
    for(int i=0; i<token_length; i++) {
        if (token[i] != input[i]) {
            return false;
        }
    }
    return true;
And all of a sudden the timing attack is back.

You cannot write a constant-time function in portable C / C++, and trying to do so is a fool's game.


I was willing to concede your point, but I tested first. Using clang to compile, I did some profiling on my function optimized to various levels (-O0...-O3, -Os, -Ofast) with various inputs. The variation in timing is noise (I get the same variation on the same test inputs in different runs of the program.) I'd be curious to know whether you can provide a combination of compiler and compiler options that do indeed produce noticeable variations in timing as a function of the inputs.

"You cannot write a constant-time function in portable C / C++..." perhaps you mean that we can't outsmart compiler optimizations. Without optimizations ("Debug" configuration), the code runs exactly as written.

Ultimately, I'm going to prefer code with obvious intent regardless of how the compiler will optimize the thing. We'll just have to find a way around this timing attack business.


I would be very surprised if a compiler on a modern high-performance processor actually did this optimization - as deep pipelines and speculative execution mean that the additional conditional branch per loop may hurt more than it helps.

That being said, on a simple architecture - namely a shallow pipeline - this sort of thing is an "obvious" optimization.

And w.r.t. "without optimizations" - there is no such thing at the C / C++ level. There are computer architectures where optimizations pretty much have to be done (many architectures with compile-time scheduling, for instance).

For instance, if you have a dataflow machine it may end up running this in non-constant time. And I'll note that modern x86 processors are looking more and more like dataflow machines.


Why can't you just set a bit for each non-match and then bitwise-or them all together, returning false if non-zero? I don't think a compiler can optimize that away.


It's the exact same optimization either way.

One relies on the fact that adding a positive number to a number > 0 never makes it 0 again. The other relies on the fact that ORing a bit into a non-zero number never makes it zero again.


I don't understand. I thought the timing attack was because the logic short circuits when it finds a non-match. How does the compiler short circuit the bitwise-or?

At first I was thinking you could write the bits out to a separate word, but I think this suffices:

  uint result = 0;
  for (i = 0; i < length; i++)
    result |= notmatch (i);
  return result == 0;
You would need value range analysis for the use of result in addition to the insertion of an extra test and branch to short circuit the loop, plus a model of what |= does to values. Do any compilers actually do this?

But, actually I'm kind of confused about your first point. What optimizations did your compiler use to eliminate the match_count++ and hoist the return into the loop?


As I've said elsewhere, current compilers are unlikely to do so. Current processors have too deep a pipeline to make it worth it.

But on an architecture with a short pipeline it can be worth it for a compiler to do so.

And it's simple, in theory if nothing else. Note that once result has a bit set, result cannot ever return to zero. Hence, if result has a bit set, one can return false immediately. And all of a sudden you lose the constant-time part.

As for how a current compiler could do so? Look at what happens if/when the loop is unrolled.

That current compilers don't tend to do this sort of optimization makes it only more dangerous - because people do not go back and re-examine old code nearly as often as they should. Much less check the assembly every time they update the compiler.


Let's assume you're right. Why can't you choose characters to compare randomly, until you've compared them all?


You can still do a timing attack against that. Suppose the password is "password". You do a timing attack to find the correct length, then you start trying "aaaaaaaa", "bbbbbbbb", "cccccccc", ... The ones that take less time on average are the ones that contain a letter from the password. You can go from there. Ever played mastermind?

It's a mitigation but it doesn't prevent the attack.

Not to mention: how exactly will you check that you've compared them all?


Ah, mastermind. I think you mean the ones that take more time on average contain a letter from the password, not less. Anyway, good point.

I was thinking you could maintain an array of flags to indicate whether you've compared a certain position before, and a count of all the compared positions so far.

Alright, here are my other ideas:

1) Properly chosen 8-character passwords are pretty strong, right? So why not copy all of the 8-bit chars into a u64 and compare that directly? You can treat longer passwords as a series of 8-char passwords. Assumes a machine that won't short circuit on u64_a == u64_b. Less effective for 32-bit. The compiler won't undo this since it's an optimization (1x aligned 64-bit compare is more efficient than 8x 8-bit compares, seven of which are unaligned).

2) Introduce a random delay after the byte-wise comparison is done that is up to 10x the length of the comparison. The comparison variance gets lost in delay variance. I know, a mitigation, but it's effective. Combine with random selection of characters for more effectiveness.

3) Use a perfect hash of the password. You don't need to compare keys after a perfect hash.

Thanks for humoring me.
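Idea (1) can be sketched like this (compare8 is a made-up name; it assumes a machine whose 64-bit compare doesn't short-circuit):

```c
#include <stdint.h>
#include <string.h>

/* Compare exactly 8 bytes as one uint64_t. memcpy sidesteps alignment
 * issues; whether the word compare is constant-time is an assumption
 * about the hardware, not a guarantee. */
int compare8(const char a[8], const char b[8])
{
    uint64_t wa, wb;
    memcpy(&wa, a, 8);
    memcpy(&wb, b, 8);
    return wa == wb;
}
```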


Whoops, yep, that is incorrect. Should be more time, as you said.

1) That causes massive problems for people (like me) who use passphrases. I use diceware passwords for anything I don't use a password manager for, so things like "corn umpire khaki dow heave hiatt sis steal". That's ~103 bits of entropy (massive overkill for most things), and yet is a whole lot easier to remember than, say, "FlyaJdqJW6kvyUQeE", which is ~101 bits of entropy (17 random upper/lower/digit characters).

It also assumes that the machine doesn't short-circuit, as you say.

And, again, you're assuming that the compiler doesn't undo it. Just because it won't be undone by optimizations on current machines doesn't mean it won't be undone by optimizations on future ones. On a RISC architecture, for instance, a 64-bit compare may not even exist - it may be implemented by 8-bit compares. Or 16, or 32. Whatever.

2) Ever heard of the German tank problem? That will not drown out the comparison, not at all. I can defeat your example (10x noise over comparison time) with >90% accuracy with 100 samples. (Just take samples, and take the mean of the sample minimum and maximum.)

3) Without leaking other people's passwords every time you generate a password for someone?

I find this sort of conversation intriguing. I want things to be more secure, I just don't know of a good way to do so.


A perfect hash is a function in the mathematical sense in that there is a unique output for each input. So, hash the stored password ahead of time, and hash the challenge password after receiving it, and then compare the two hashes. You don't need collision detection, since a match can only come from two identical keys. You don't generate passwords for people, although you could, as long as you don't reveal your hash function. There are useless perfect hashes, like "password + 1", and there are much better ones that produce near-random output.


I know what a perfect hash is.

But a) pigeonhole principle (you need an output at least as long as the longest string you'll encounter), and b) I was under the impression that perfect hash schemes require you to know the set of strings encountered ahead-of-time.

Could you give me an example of a perfect hash function that is suitable and secure enough for password use? I do not know of any.


You're right, there's a hard limit on password length. That might be fatal if you're using diceware, I don't know if you could accommodate passwords that big. As for the set of strings, just assume it's every string possible, so 256^8 strings for an 8 byte password. That's only 16384 petabytes.

All of the perfect hash functions I've found require knowing all of the keys ahead of time. You might not be able to fit 16384 PB on your hard drive or in memory. (Hey, I don't know. You could work at LLNL.) But I think that if you knew you might use any key in the space, and you didn't insist on a minimal perfect hash, i.e. one where there is a 1:1 mapping from hashes back to keys, that you could write such a function without having all of the keys.

If you did this, you'd also need to prove you were generating a unique and effectively random hash. If I was a crypto academic, that might be an interesting line of research. I'd be kind of surprised if nobody ever tried this though, and there's probably a decent amount of literature to pore over. I guess most people use perfect hashes for hashtables and not as a defense against timing attacks.


Congratulations. You've just described the standard method of password hashing. (Well, you've missed the common extensions of salt / pepper, but meh.)

If you pick a cryptographically secure hash that's long enough, you don't need to worry about collisions. The birthday problem dictates that you start getting collisions after ~sqrt(D) random entries. So if you pick a 128+ bit hash you should be fine on that front (128 bits -> ~2^64 entries before a collision on average, which shouldn't be a problem.)

However, as I've mentioned elsewhere, all this does is punt the constant-time problem off to the hash function.


Wow, maybe I have a future in cryptography!

Anyway, why exactly isn't the hash function constant-time? I don't understand this, the hashes I've played with for hashtables are just a bunch of bit shifts. Is it only message length?


Maybe you do...

The hashes generally used for things like hash tables don't need to be cryptographically secure, and as such can be substantially simpler. When you start getting into cryptographically secure hashes things get complex enough that you start ending up with timing variations if you're not very careful. Especially once you start talking about table lookups / etc (although that's more common with encryption algorithms than straight hash algorithms).

And the compiler - with C and C++ at least - is free to optimize in ways that introduce timing variations. Which is what sparked this conversation in the first place. So even if your source code doesn't take data-dependent time, the compiler may make the compiled output take data-dependent time. And there's no way around that in pure C / C++.


Yes you can - take the hash of the input and token, and compare those. Now you're trying to cause a sha256 collision. Good luck!


You still have to perform the byte-by-byte comparison and this is where TheLoneWolfling alleges the optimizations kick in and give away secrets. However, from my testing, the variations in timing are unpredictable (I get variations on compare speed on different runs of the same program with the same input) and small and would be swallowed up by variations in network latency.


You could probably do that relatively safely with bit operations, but I expect to be proved wrong.

Something like this should work:

    /* Swap char for uint<register_size> for speed */
    #define CMP_TYPE char

    /* Assuming char* sha256(char* input); */
    CMP_TYPE* input_hash=(CMP_TYPE*) sha256(input);
    CMP_TYPE* token_hash=(CMP_TYPE*) sha256(token); /* Precomputed? */
    /* SHA-256 digest is 32 bytes, so max_pos=32 for char */
    int max_pos=32 / sizeof(CMP_TYPE);

    char cmp=0;
    for (int i=0;i<max_pos;i++) {
        cmp = cmp | (token_hash[i] ^ input_hash[i]);
    }
    return (cmp != 0);
The reason for the CMP_TYPE #define is so that you can optimise the comparison by replacing char with uint32 or uint64.


Again, the compiler is free to optimize that to data-dependent time. It's relatively unlikely to do so on current machines (the time saved is unlikely to be worth the potential of a pipeline stall), but it's free to do so.

For instance, it's legal for the compiler to insert a strcmp fallthrough (`if input == token: return true`, or rather the strcmp equivalent) at the start, I'm pretty sure.


I have a bunch of other replies in this thread that touch on this, but suffice to say it's not just the comparison that matters.

For one thing, you have to assume that the SHA function is data-independent time (which, again, good luck doing in C / C++).

For another thing, noise in timing attacks doesn't prevent them. Even at levels of noise that seemingly obscure everything. And it's a very bad thing to rely on network latency being unpredictable enough.


Since an optimizing compiler is free to go to town on functions such as string comparison, are these functions typically written in assembler per-architecture? I.e. one for x86 and x86-64, another for MIPS, RISC, etc?


Except that that still leaves you open to a dictionary attack.

Or rather, it accelerates a dictionary attack from O(sizeof(dictionary)) tries to O(length(hash)) tries.

(Let's suppose you are comparing by hex digits. You try 16 dictionary "words" whose hashes start with 0...f. One of those should take slightly longer. You then try 16 dictionary "words" whose hashes start with the digit found previously with different second digits. (Skip any digits that you don't have a word for.) One of those should take slightly longer... Repeat until found.)

This can be mitigated with a salt - if and only if you manage to prevent the user from figuring out how your salting scheme works.


a) You're just punting the timing attack off to the SHA algorithm.

b) There is no SHA algorithm in the C / C++ standard library (that I know of), muchless a guaranteed constant-time (or rather, data-independent) one.

c) The compiler is well within its rights to insert "busy_loop_for_ms(input_char);" anywhere it wishes. It is unlikely to do so, but it is allowed to. As I said: C and C++ don't have any notion of constant or variable time.


What if you mark token, input, and match_count as volatile?


In practice, it'd probably work.

In theory? The compiler is still free to add data-dependent delays.

For an "evil" example, a compiler could do this:

    ...

    for(int i=0; i<token_length; i++) {
        char temp = input[i];
        if (token[i] == temp) {
            match_count++;
        }
        wait_for_ns(temp);
    }
It wouldn't make sense for a compiler to do so - but it is allowed.


That's a good point. However, even if you wrote the routine in assembly, the CPU could pull a similar trick. At some point you have to give up and trust the tool not to do something too catastrophically stupid. Whether that point is at the CPU or the C compiler (with sufficiently paranoid C code) or elsewhere, I'm not sure.


Yep. A classic example is multiplication - which isn't constant-time on modern machines but was on many older ones.

Currently, CMOV is looking like it could become the same thing. It currently takes data-independent time. But is not guaranteed to do so, and very well may not in the future.

The distinction is that hardware tends to change slower than software. You generally have multiple compiler versions per hardware version (possible exception of embedded processors, but even then...).


In golang, you wouldn't need to write a for loop, you just compare the strings directly.

    if userToken != token {
        return false
    }
    return true

The code is simpler, and the edge case is covered.


What do you think this does under the hood? You're basically saying, "why don't you use a correct string comparison function instead of a broken one?" which isn't a particularly valuable insight and has nothing to do with Go.

Also, unless you're familiar with how Go implements string comparison you don't know if it's exploitable in other ways like timing attacks.

Valve's failure was in testing, not language choice.

Also, why not just return userToken == token?


Oh right, you could just do a one-liner.

Expecting the Go builtin functions to be correct is a pretty safe bet: they're used by thousands of people and built to stop problems like buffer overflows, versus something you wrote yourself.

Using a popular library in a language with unsafe builtin types would achieve a similar effect, as long as the library was code reviewed, etc, but the point is that safer languages don't require you to take that extra step. Use them idiomatically and entire classes of exploits will not be possible through your code.

As for timing attacks, if you are making cryptographically sensitive code you really should be using a crypto library and not anything you made yourself.


What about timing attacks?


There's plenty that can be done about it, but it imposes a cost that most software shops don't want to pay today, despite the enormous costs of these breaches.

Formal code reviews. Longer pipelines between checkin and deployment. Penetration testing during pre-release. Better QA, both manual and automated. Better security policies in general. And so on.

Making a change to security-related code should trigger a process that represents an avalanche of attention. There should be a shit-ton of scrutiny on any such change. The fact that something this simple wasn't caught is indicative not that "nothing can be done" but rather that most security-related code today is handled in an amateurish way by teams who barely care about the consequences of their actions.


In this particular case, the advice for avoiding it is simpler.

When designing test cases, choose the extreme values: no text, too much text, etc.
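A minimal sketch of those edge-case tests (check_token is a hypothetical stand-in for the reset endpoint's check):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the reset endpoint's token check. */
bool check_token(const char *user_token, const char *token)
{
    return strlen(user_token) == strlen(token)
        && memcmp(user_token, token, strlen(token)) == 0;
}

/* The extreme-value cases a test suite should cover, assuming the
 * valid token is "4F7C9":
 *   check_token("", "4F7C9")          -> false  (blank: the Steam bug)
 *   check_token("4F7", "4F7C9")       -> false  (truncated)
 *   check_token("4F7C9AAAA", "4F7C9") -> false  (too long)
 *   check_token("4F7C9", "4F7C9")     -> true
 */
```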


This is precisely the sort of error that Fuzz testing should have uncovered.


When you have such a huge application (Steam), how can you afford to fuzz everything? And to set up fuzzing for every part of your application?


They were thinking about how to make it work instead of how to make it break. Classic mistake (e.g. [1]).

1: https://www.youtube.com/watch?v=vKA4w2O61Xo


Watching that, it amazes me how many people don't get that. Not long ago there was an article on HN about this very experiment (from NY Times, IIRC).

Now I can imagine, if you're not a software tester at heart, that you'd maybe try to "win" on the first try or two, out of habit. I'm not a professional tester, just a coder, and reading the aforementioned article, I also first tried 3/6/12 and then some arbitrary increasing sequence, before catching myself and realizing that if I want to figure out the rule I need some negative examples.

Is this something you need to be a programmer to realize?

Or maybe you need to have a "scientific" mind or something?

I'm going to try this on some of my friends, see if I can figure it out.

I expect a big part of it to be some psychological barrier against "failing" or getting "no" as an answer, especially if you're being put on the spot about some math-related puzzle. Many people feel uncomfortable with maths and perhaps fear giving a "wrong" answer and appearing "stupid".

But "fear of maths" isn't what I'd be testing for, it's "willingness/realization to try an experiment with an (expected) negative outcome".

So I'll make sure to formulate the problem in an appropriate manner, just like the guy in the video did, that the goal is to figure out the rule, not to finish the sequence. Try to make them comfortable, but not as far as saying "It's okay to ask me about a sequence that does not follow my rule", because that would obviously bias the experiment.

Though I wonder now, there's some really interesting studies to be done (probably many already have been done), what if the puzzle is about some other rule for a sequence of three, that doesn't have anything to do with numbers?

Say the rule is "objects of increasing size", but the (similarly misleading) example is bicycle / car / train.


I would think the phenomenon is basically equivalent to confirmation bias. A hypothesis is formed immediately upon seeing the question. Then, only further information which confirms their hypothesis is tested.

Testing the negative is something that should be fairly routine in a scientific-testing mindset. For instance, medical experiments testing both placebos (negative) and medicine (positive) and looking for differences.


I think it's an implicit assumption that a big complicated rule will be very picky in what it accepts, meaning hits are rare and will reveal more information than misses. They think all the complications are on the input side, so they're trying to pick interesting inputs instead of interesting outputs.

Come to think of it, I might be making this mistake but for non-boolean functions. I try to always have tests that pass in interesting values, but maybe I should consciously also try to have tests that get back interesting values. For example, when testing a classifier I should make sure all the classifications are possible. Or, when testing some mathematical function implementation, I should have tests with inputs that cause the result to be exactly -1, 0, 1, and 2 (if applicable).


Writing such exploitable code is entirely forgivable. However, I would expect such a critical module to be thoroughly unit tested. I am going to venture that this was probably not the case.


Not exactly sure how forgivable this is. But yes, unit testing and writing a few negative scenarios would have caught this problem early on.



