Hacker News

mountainriver · on Feb 2, 2023

Static typing is not just about code performance, it’s absolutely about correctness as well. I would really love to see that study, as it flies in the face of all my experience.

ReflectedImage · on Feb 2, 2023

https://games.greggman.com/game/dynamic-typing-static-typing...

lostmsu · on Feb 2, 2023

> has on average 2.5x the number of bugs

This claim is not supported by the linked article.

In fact, the main claim of the linked article (only 2% of bugs are type errors) also does not make any sense. It is based on the assumption that typing errors only ever cause TypeError, AttributeError, or NameError in Python, which is, ironically, false, because Python is a dynamic language.

To give a concrete example, I went to GitHub, opened the top Python repository (TensorFlow), and searched for the first closed pull request that explicitly mentioned ValueError and did not mention any of the above. It turned out to be this one: https://github.com/tensorflow/tensorflow/pull/47017 This issue is an obvious type error: in statically typed languages the value of a float cannot be nil.

You could probably repeat that experiment with other closed pull requests, but I bet you will find that the rate is closer to 50%, if not 90%.
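
A minimal sketch of that point, using made-up helpers (`area`, `repeat_price`) rather than anything from TensorFlow: in both calls the caller passes the wrong kind of value, yet one surfaces as a ValueError and the other raises nothing at all.

```python
def area(size):
    # Hypothetical helper: expects a (width, height) tuple of numbers.
    width, height = size
    return width * height

def repeat_price(price, quantity):
    # Hypothetical helper: expects two numbers.
    return price * quantity

def error_name(func, *args):
    # Return the name of the exception raised by func(*args), or None.
    try:
        func(*args)
    except Exception as exc:
        return type(exc).__name__
    return None

# A str where a tuple was expected: unpacking fails with ValueError.
value_error = error_name(area, "10x20")
# A str where a number was expected: silent string repetition.
silent = repeat_price("3", 2)
print(value_error, repr(silent))
```

Neither outcome would be counted as a type error by a tally of TypeError, AttributeError, and NameError.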

ReflectedImage · on Feb 2, 2023

The article says that dynamically typed code and statically typed code have a similar number of bugs per line.

It also says that a software feature only needs 1/2.5 lines if you are using dynamic typing.

That means dynamic typing reduces bugs by 2.5x compared to static typing per software feature.
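
Taken at face value, the arithmetic behind that is just bugs per feature = bugs per line × lines per feature. A sketch with made-up numbers (the 0.01 and 1000 are placeholders, not figures from the article); it only holds if both premises hold:

```python
# Made-up numbers, purely to show the arithmetic of the argument.
bugs_per_line = 0.01                  # premise 1: same rate for both styles
lines_static = 1000                   # hypothetical feature size, static typing
lines_dynamic = lines_static / 2.5    # premise 2: 1/2.5 the lines

bugs_static = bugs_per_line * lines_static
bugs_dynamic = bugs_per_line * lines_dynamic
ratio = bugs_static / bugs_dynamic
print(bugs_static, bugs_dynamic, ratio)
```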

lostmsu · on Feb 2, 2023

None of these are anywhere in the linked article: "2.5" "similar"

ReflectedImage · on Feb 2, 2023

There is more than one study on the topic. But you can just take a look at the graphs.

Take a look at the "Programming effort" graph and the "Program length" graph.

ReflectedImage · on Feb 2, 2023

That's not even an error, it's a usability improvement.

lostmsu · on Feb 2, 2023

Firstly, usability bugs like this are still bugs.

Secondly, think how that issue surfaced in the first place. Somebody used TensorFlow in their Python code, and their Python code produced unexpected result and/or crashed. Judging by the fix, it is highly unlikely to have produced one of the errors from the above list. As a ML practitioner I can also tell you it likely means somebody had to spend a significant amount of time to get to the cause once they saw the output, because numerical miscalculations are very hard to debug in the first place.

ReflectedImage · on Feb 2, 2023

That's not a bug that static typing would catch, as it is not a type bug.

lostmsu · on Feb 2, 2023

What do you mean it is not a type bug? You've seen the fix with your own eyes. In the static language that fix would never have been needed, because the consumer would never have made the mistake that caused the miscalculation in the first place. Of course it is a type bug!

This is along the lines of

  def is_valid_email(email): return '@' in str(email)
  
  class Person:
    ...
    def __str__(self): return f"{self.name} <{self.email}>"
  
  print(is_valid_email(Person(name='Peter @ Work', email=None)))
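
A runnable version of the same idea, with a constructor filled in as an assumption and type hints added to show what a checker such as mypy would be told. Python itself ignores the hints, so the call still succeeds at runtime; the point is only that a static checker would reject passing a Person where a str is declared.

```python
from typing import Optional

def is_valid_email(email: str) -> bool:
    # str() accepts any object, so this never raises at runtime.
    return '@' in str(email)

class Person:
    # Assumed constructor; the sketch above elides it.
    def __init__(self, name: str, email: Optional[str]) -> None:
        self.name = name
        self.email = email

    def __str__(self) -> str:
        return f"{self.name} <{self.email}>"

peter = Person(name='Peter @ Work', email=None)
# A static checker would flag this call (Person is not str).
# At runtime it passes because '@' appears in the name.
result = is_valid_email(peter)
print(str(peter), result)
```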

ReflectedImage · on Feb 2, 2023

You can't get a NaN from a None value in Python. (Duck typing still has rules)

The error you have posted isn't a Python error.

Looking a bit deeper, the error occurs in C++ code. C++ is a statically typed language, last time I checked.

zelphirkalt · on Feb 2, 2023

Not that I am a huge static or dynamic typing fan, but I think languages with very strict type systems suggest otherwise. Take Haskell or Rust, for example, where once it compiles, it usually works, unless you introduced a logic mistake. Python with its typing might never get there, but that is not the point.

Toniglandyl · on Feb 1, 2023

Do you have a reference to a study that supports your point?

jchw · on Feb 2, 2023

This is nearly impossible to precisely study:

- Because correlation != causation, for one thing: 2.5x the number of bugs in static "style" code bases (not sure what that means) does not mean that static type systems lead to more bugs. It could be that users of dynamically typed languages find themselves needing robust automated tests more often, because they can't rely on static typings. This would suggest that in a similarly productive language, your effort would go much further with types than without, even if this measurement is accurate.

- Because measuring bugs is hard: you don't actually know how many bugs a given piece of code contains, you only know how many you found. Statically typed code contains more information about the intent of the code. Not only does this make it easier for machines to analyze, it makes it easier for humans to analyze, too. Ask anyone who does security audits whether they prefer code that has strict static type information or not.

- Because not all bugs are the same; a reputable study would need to go to great pains to classify the different kinds of bugs and determine their severity as well as determine why they happened. Counting bugs as just a number and then comparing them is like counting SLOC. It doesn't tell you anything on its own!

> Static typing has nothing to do with code correctness, it's purely about code performance.

Uhhh... Says who? Type erasure systems offer no performance benefits at all, since the types are removed before runtime.
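
Python's type hints are a close analogue. A small sketch (the `double` function is hypothetical): the annotations are kept as metadata, but the interpreter neither checks them nor uses them to run the code any faster.

```python
def double(x: int) -> int:
    # The annotation says int, but nothing enforces it at runtime.
    return x * 2

hints = double.__annotations__   # hints survive only as metadata
as_int = double(21)
as_str = double("ab")            # runs fine despite violating the hint
print(hints, as_int, as_str)
```

Any benefit from the hints here comes from an external checker reading them before the program runs.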

There are contradicting studies that suggest static typing prevents bugs, but there's little point in trying to fight them. None of the studies I've seen feel robust or complete enough, and on top of that, what would really be helpful is a meta-analysis of multiple more robust studies so we can get an idea of the true takeaways.

Until then, there will be some subjectivity.

There's no nice way to say this: I don't trust people who treat issues like this as black and white. To me, it's a sign of immaturity as an engineer. I do of course believe that static type checking is a net win overall and that the evidence to the contrary likely has other factors (like, for example, relying on relatively primitive type checkers that catch fewer errors, require more handholding, and generally have trouble handling idiomatic code) but like I said, YMMV: everyone has their failure and success stories. To me, if I can find a bug before I even save the file, versus needing to wait until QA catches it, or worse, it lights prod on fire, it's probably a win. As far as this "static code style" business goes, I don't understand it. My code does not look that different when it's JS vs TS. For Python, MyPy still doesn't seem sufficient, so I would not be surprised if strictly using things MyPy supports would make code worse. For us, switching to Go had more benefits than JUST types; of course we wound up having better performance and memory usage too, and the error handling practices, while verbose, definitely put error handling and edge cases front and center.

ReflectedImage · on Feb 2, 2023

[flagged]

jchw · on Feb 2, 2023

> Yeah but typing related bugs which is what static type checkers catch only account for 3% of bugs.

You're stating this as if it's fact and also universal, but I strongly doubt it. Even defining "type-related bug" is hard, because there's a ton of bugs that are not type-related but are much easier to avoid if you're using a good static type checker. For example, stringly-typed values, or switch statement exhaustiveness bugs. Some type checkers, including TypeScript, can eliminate almost all null reference errors when used with strict settings. Then to top it off, types add a bunch of static analysis capabilities you could not otherwise have: for example, you can find accidental dead code because you have an impossible condition that can be proven impossible by types, or more exactly find incorrect usages of library functions, and so forth. There's quite a range of problems and almost none of them are explicitly related to strong typing.
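
The same ideas can be sketched in Python with type hints, assuming a checker such as mypy or pyright is run over the code. All names here (`Status`, `can_log_in`, `display_name`, `unreachable`) are made up for illustration.

```python
from enum import Enum
from typing import NoReturn, Optional

class Status(Enum):
    # An enum instead of bare strings like "active" / "suspended".
    ACTIVE = "active"
    SUSPENDED = "suspended"

def unreachable(value: NoReturn) -> NoReturn:
    # A checker should accept a call to this only when every case is handled.
    raise AssertionError(f"unhandled case: {value!r}")

def can_log_in(status: Status) -> bool:
    if status is Status.ACTIVE:
        return True
    if status is Status.SUSPENDED:
        return False
    # Adding a new Status member without handling it above
    # is meant to turn this line into a checker error.
    unreachable(status)

def display_name(nickname: Optional[str]) -> str:
    # A strict checker requires the None case to be handled before .upper().
    if nickname is None:
        return "ANONYMOUS"
    return nickname.upper()

print(can_log_in(Status.ACTIVE), display_name(None))
```

None of these would show up as a TypeError in an untyped codebase; they would show up as a wrong branch taken or a crash on a missing value.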

And let's say somehow, this is a universally correct number. That doesn't mean it's the reality faced in front of you. What if you already know the majority of your bugs, as logged today, could be prevented by a type checker?

> You are always going to get more bang for buck by simply writing more unit tests.

Good test coverage is a huge, non-trivial investment for any reasonably large, reasonably complicated codebase. Making all of your code testable in fact impacts how you write code, sometimes in ways that are actually similar to the differences you might make to account for static type systems too.

That said, you should do it! You should do both. Tests catch bugs that type checkers can't, but type checkers offer more than just that, and they catch them faster; typically before you hit save these days.

> Static typing is done primarily for performance and that requires a statically typed language.

I dunno why you're repeating this, people are objectively using static typing for other things. That makes your statement wrong on its face, because what you're saying is not a matter of opinion, you're suggesting it's "primarily done for performance" and it just simply isn't. Given that TypeScript is one of the most popular programming languages right now, I'd argue this "primarily" categorization is just wrong.

But I think it goes deeper. Like, why does C do static typing? It does not actually need to much. In fact, in an early version of C, the type system was significantly weaker. Struct fields were global: you could access any struct field on any pointer. This is convenient and it doesn't disallow any correct code, but alas nobody is wishing for that to make a return.

I won't even bother getting into functional programming languages, but they exploit types far harder than any of the languages we've been discussing; it's just simply nothing to do with performance.

Maybe C has been trending in the "safer" direction for performance? No. Stakeholders have clearly been improving types specifically for the purpose of preventing classes of bugs observed in the real world. Literally just recently, a great article dropped about array types in Linux, for example:

https://people.kernel.org/kees/bounded-flexible-arrays-in-c

C++ and Rust use types for a lot of things. Some of them are about performance, but it's not that simple either. Rust lifetimes are definitely about fast correctness, but they absolutely enforce "correctness" that you would not get with a global interpreter lock, because the principle of never having multiple mutable aliases simply makes sense given the machine model we have today.

So then TypeScript. Let's assume in aggregate it somehow prevents absolutely no bugs despite the obvious fact that it prevents entire classes of bugs that are common in idiomatic JS. Well, guess what? It still offers a ton. For example, it makes refactoring significantly easier. It's not even a discussion: refactoring old code is much easier when it's statically typed, even if it's ossified and weird and full of Chesterton fences. It's night and day. TypeScript also offers excellent code intelligence: very accurate autocomplete, inline documentation, automated refactoring options, and yes, more static analysis options than just type checking; being able to statically analyze the code deeper opens up an explosion of linters and checkers for all sorts of things, like checking for improper usage of React hooks or common lodash mistakes.

> Hacking it into a scripting language is plain looking for trouble.

On the contrary, I wouldn't catch myself writing JS without TypeScript anymore. I consider the investment very small compared to the benefit.

In the real world, I often do not get the chance to start new projects from scratch, but instead wind up walking into existing messes. The last JS project I walked into wound up having thousands of errors per day that could've been prevented by static type checking, caused by hundreds of different bugs. And that's just what we could observe by adding metrics: there were still more bugs that were uncommon enough to not be caught initially. When you're dealing with an old ossified codebase like that, you often wind up with more bugs going in than out. The only reliable way I've seen to reverse that trend is by incrementally adding more static analysis. Hope, after all, is not a strategy.

I don't believe there is an objective truth here, but I have strong doubts about the points presented here. The argument for static type checking is strong: it simply prevents classes of bugs, end of story. That's a very simple story and does not require any gymnastics to explain. The argument against static type checking feels very hand-wave-y at best, especially given that we keep landing on this weird point about how it is "primarily" used for performance, which I don't think is true. I think you are mistaking the fact that dynamically typed languages are slow, due to the inability of the compiler to optimize, for statically typed languages choosing to be static simply for performance. I don't think it's even true for C.

ReflectedImage · on Feb 2, 2023

3% is the right order of magnitude, and if it were 10% more than that, it wouldn't make any difference. Static typing would still be twice as bad as dynamic typing when it comes to code correctness.

"What if you already know the majority of your bugs, as logged today, could be prevented by a type checker?"

You would be a very bad programmer, since the vast majority of bugs are logic bugs.

"why does C do static typing" Code performance. It needs them to effectively map types to the registers available on the machine it's being compiled for.

Rust lifetimes are about getting C like performance without the bugs introduced by using the techniques required for C like performance. You could remove Rust lifetimes and get all the same safety, it's just the programs would run slower due to garbage collection.

"So then TypeScript. Let's assume" TypeScript is an obviously bad example, since JavaScript is a very bad language that was hacked together by browser makers over several decades. You could make a dynamically typed equivalent of TypeScript and it would have the same advantages.

"The argument for static type checking is strong" It's incredibly weak: it prevents a class of bugs that only occur very rarely in dynamically typed languages (3%), at the cost of both increasing your development time by 2.5x and increasing the total number of bugs in your program by 2.5x.

You wanting to believe in static typing doesn't stop it sucking in the context of scripting languages.

jchw · on Feb 2, 2023

> You would be a very bad programmer, since the vast majority of bugs are logic bugs.

I don't work alone. Most of the time, I work on projects that predate my involvement, and are old enough to have accumulated cruft.

But I think I found the disconnect. This right here is the key:

> since the vast majority of bugs are logic bugs

You might think that there is no way a type checker could prevent a logic bug. You're wrong. It is clear to me now that when you say 3%, you are talking not about the type of bugs that static type checkers are capable of preventing, but only errors that are explicitly related to types. Because if you model your code with types as you go, types do prevent all kinds of logic errors. When you write code with strong typing, it can prevent impossible state machine transitions, ensure that you handle all cases of a set of possibilities, ensure that you follow API contract obligations, and more.
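
A rough sketch of the state machine case in Python with type hints, again assuming a checker such as mypy is run over it. The document workflow (`Draft`, `Submitted`, `Published`) and the example.com URL are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Draft:
    text: str

@dataclass(frozen=True)
class Submitted:
    text: str

@dataclass(frozen=True)
class Published:
    text: str
    url: str

def submit(doc: Draft) -> Submitted:
    # The only way to obtain a Submitted value is from a Draft.
    return Submitted(doc.text)

def publish(doc: Submitted, url: str) -> Published:
    # publish(Draft(...), ...) would be reported by the checker:
    # skipping the review step is a logic error expressed as a type error.
    return Published(doc.text, url)

published = publish(submit(Draft("hello")), url="https://example.com/hello")
print(published)
```

Skipping a step is a logic bug, not a TypeError, but with each state as its own type the checker can report it before the code runs.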

Exactly how much a given type system is capable of inferring and decoding varies. TypeScript is definitely the state of the art for type erasure systems built on script languages, and what it is capable of encoding is pretty impressive.

Maybe, maybe, the amount of bugs that you can attribute to typing issues is 3% if we're only regarding Python TypeErrors; but, a proper type checker goes far beyond that.

ReflectedImage · on Feb 2, 2023

"a proper type checker goes far beyond that"

But if it can't catch 60% of all errors, and it can't, then it's worse than dynamic typing.

This isn't complicated. If you are excluding performance, then static typing is inferior to dynamic typing. It's that simple. No ifs, no buts.

I do write statically typed code but that's for performance critical code. Since I've got good experience of both, I understand the differences very well.

I assure you that using static typing in a scripting language like Python is a mind-numbingly stupid thing to do.