Hacker News
Selecting a programming language can be a form of premature optimization (snarky.ca)
90 points by genericlemon24 on Nov 15, 2021 | 309 comments


This blog post assumes Python is more productive than other languages. People like to make this claim about not only Python, but many other dynamic programming languages, especially Ruby, PHP and Lisp... but there's very little evidence to support that... programmers tend to be most productive in whatever language they know best. If they know both Python and Java equally well, I would bet they would be almost exactly as productive in either.


I used to work at a C++ for prod, Python for prototype place. Time and time again, a 1-2kloc Python prototype, the bulk of which was written in a week, would take months to translate to C++, at times longer than the whole process of research+prototype.

I think it was 2 things: one, it is genuinely harder to write a valid C++ program. The language imposes non-trivial constraints on the programmer, for better and worse, and you have to navigate them for the thing to even compile. But two, C++ invites pedantry. I heard endless discussions about "ooh, template meta-programming", "struct or class", "but is this idempotent", "you should be using move semantics", "why on earth is this a unique_ptr".

I'm not saying C++ is worse, or that its trade-offs are not worth it, but for sure, from my experience, translating an algorithm into code is done far more quickly in Python than in C++. YMMV

Ah and also, this was definitely not a question of deficient C++ coders. The hiring standards for C++ were so high we were always short of devs. Meanwhile, Python prototypes were usually written by part-time ex-academia Python dabblers.


I think a large part of the difference is prototype vs prod. I have written prototypes in C++ in a few weeks that still took months to go to production. It is just that production-ready code requires significantly more ceremony, more config files, more complex error handling, code reviews and of course feature creep.

I still agree that Python can be much more efficient to write than C++. I just wish it wasn't so unbearably slow.


I would mostly agree, though not entirely. Continuing with the anecdata, at times I had enough of waiting and put my prototype into prod, with significant hardening. I can't say I always got it right, but usually 3-5 days of very focused poking made the thing at least reliably run at the desired cadence, with rare but loud and graceful failures. For sure though, there is a tax for doing it all "properly".

Even then though, I have the impression that Python just makes it easier to write it all more quickly. It just doesn't invite you (as much) down rabbit holes of doing things even more properly.


> But two, C++ invites pedantry. I heard endless discussions about "ooh, template meta-programming", "struct or class", "but is this idempotent", "you should be using move semantics", "why on earth is this a unique_ptr".

Interestingly, I've consistently had the opposite experience. In C++, when I ask, "how do I do X" there will be a few different options, but pretty much any of them will do. As long as you avoid things that are definitely UB, you're fine. In Python, I'll find endless religious arguments about which is the most "Pythonic" method. Whenever I try to just hack out Python code to just get things done, I always feel like I'm being judged, because usually the quick-and-dirty method is far from the most Pythonic.


I think it all depends on who does your code reviews. I agree with GP that c++ has religious adherents to the one-true-undocumented-in-their-head standard who will make writing it a nightmare. In python the philosophy is supposed to be - there is an obvious way to do things and that’s the right way. However this does fail from time to time.

Ruby on the other hand embraces many different syntaxes to do the same thing, but they’re all accepted as correct in the traditional view.


> But two, C++ invites pedantry. I heard endless discussions about "ooh, template meta-programming", "struct or class", "but is this idempotent", "you should be using move semantics", "why on earth is this a unique_ptr".

I wonder what the result would have been in e.g. Go or Rust, where that level of conversation doesn't exist. True, others would have taken place instead, but at a higher level nearer the design, where they should have been happening anyway.


> Time and time again, a 1-2kloc Python prototype, the bulk of which was written in a week, would take months to translate to C++, at times longer than the whole process of research+prototype.

I've heard reports like this for many years.

My pet theory is that the discrepancy here is ~10% static types and ~90% memory management. Having a GC (or refcounting or whatever) fully baked into the language so that you don't have to think at all about how values are passed around and stored is a monumental productivity boost. Possibly the biggest win in software engineering productivity in the history of the field.


Absolutely. I recently translated Microsoft's example code for using the Windows crypto API from C++ to Python, and huge chunks of the code melted away as it was mainly concerned with freeing buffers. Exceptions also helped.


> Python prototypes were usually written by part-time ex-academia Python dabblers

This seems like a monumental level of operational waste. I'd love to hear more about this particular setup


Well, part-time programmers. Full-time employees, doing mostly semi-academic stuff.


There should be one-- and preferably only one --obvious way to do it.

-PEP 20 -- The Zen of Python


Although that way may not be obvious at first unless you're Dutch.

—The next line

This ZoP line is by far the most misused cliché of all things Python. How is it determined that "prototyping in Python and rewriting in C++" is not the obvious "one" way? Because you don't like it? A Zen is intentionally self-contradicting to allow introspection, not to judge others :)


Hold my beer.

-- Django


Amateurs.

-- Perl


Slow kids

-- assembler


Pot, kettle, black.

-- fpga

Back at you.

-hdl


I wonder how your C++ would've gone if you had a C++ library of Python-like data types. Something like a 'variant' type that could hold arbitrary integers, strings, and dicts or lists of those (or anything else you commonly used).

You would initially trade some of the C++ performance, but the code shouldn't be more complicated or difficult to write or compile than Python. Then, you'd only need to change the performance sensitive parts to use real C++ arrays or structs.


This is something I do, except the slow parts are in Python and the fast parts are in C++, with pybind11 connecting the two worlds.


I thought the C++ tendency was to optimize performance sensitive parts with C or assembly.


C++ is C. We used to optimize with assembly but nowadays with a processor's superpipelined architecture and the compiler's global register optimizations and so forth your inlined assembly is more likely to throw a monkey wrench into the whole works and make everything run slower. It's a better use of your time to look for a better algorithm.


Not really. C++ is not C, nor faster than C. e.g. TCO in C++ is awful due to ctor/dtor (could be mitigated by some compiler hints if available).


Isn't that what Python is? Except C and not C++.


No, because you wouldn't have to pay the virtual machine overhead (among others). Another possibility would be using Cython - write the prototype in pure Python, then Cythonize the parts which need to run more quickly.


There are some new competitors for Cython. The article mentions Numba, which I've been pleased with, and mypyc, which I haven't tried yet.
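
For anyone who hasn't used Numba, a minimal sketch of the decorator-based usage (the function and data here are invented for illustration; Numba compiles the loop to machine code on first call):

    import numpy as np
    from numba import njit

    @njit  # JIT-compile to native code; subsequent calls skip the interpreter
    def total(values):
        s = 0.0
        for v in values:
            s += v
        return s

    print(total(np.arange(1_000_000, dtype=np.float64)))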


I've been programming for nearly 40 years now and I've used all the popular languages in that time (popular for microcomputers) and I've mastered a few. I just started Python for the first time last year. Am I a Python expert? No. But I can tell you right now I can get a project done faster in Python than using any language in which I'm an expert. It's that productive.

Does that mean Python has all the performance you would ever need? No - we know it doesn't, but we also know that you rarely need that kind of performance and when you do it's usually localized to a very narrow portion of your code. Take that portion of code and implement in C and optimize to your heart's content. Even with that you'll still get your project done much faster.

Python is the tool of choice for those developers who just want to get stuff done.


I've used Python for 15 years and I'm far more productive with Clojure. Don't generalize.


I agree with you on Clojure, but I'm not going to be able to get my team to adopt Clojure - unless you're going to tell me how you convinced your team to adopt Clojure! :)


What features of clojure do you feel make you more productive?

I’m primarily a python user, but I spent time writing code in Ocaml to understand the potential benefits, but… (I would love to have my mind changed)… it feels like many of the features people touted about Ocaml have already made their way into python…?

1. Immutability / referential transparency seem more like nice-to-haves for a codebase, rather than the real reason people tout FP…?

2. Sum/product types and pattern matching are being added soon to python

3. Mypy is starting to gradually enable python typing, although I assume it’s still a work in progress

4. Python allows us to use map/reduce, and to pass functions around as arguments…?

I want to understand the potential benefits of FP more, but my experience with Ocaml hasn’t shown me great improvements yet. Open to having my mind changed.
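
For readers following along, a rough sketch of what points 2-4 look like in recent Python (structural pattern matching shipped in 3.10; the Circle/Square types are invented for illustration, and a checker like mypy is assumed for the hints):

    from dataclasses import dataclass
    from functools import reduce

    @dataclass
    class Circle:
        radius: float

    @dataclass
    class Square:
        side: float

    Shape = Circle | Square  # a sum type expressed as a union (3.10+ syntax)

    def area(shape: Shape) -> float:
        match shape:  # structural pattern matching
            case Circle(radius=r):
                return 3.14159 * r * r
            case Square(side=s):
                return s * s
            case _:
                raise TypeError(f"not a shape: {shape!r}")

    shapes = [Circle(1.0), Square(2.0)]
    print(reduce(lambda acc, a: acc + a, map(area, shapes), 0.0))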


In Ocaml, these things are all first class citizens and well integrated features.

In python, most of these "features" are done half right and duct-taped on at the last second as an afterthought.


Presenting a specific experience refuting an earlier generalization is reasonable here, I think.


I mean it is not just mentioning a specific experience to refute a generalization when there is also this: "Python is the tool of choice for those developers who just want to get stuff done."


Admittedly that last bit was a bit inflammatory, but I've gotten tired of the constant arguing over languages where I work. I'm on a bit of a rogue team that's just adopted Python and is getting stuff done. Other people are just now starting to pay attention because we're getting so much done and now work is headed our way as a result. Wait until they find out we're not using any of the languages they're bickering about and are using lowly Python! Still, as awesome as my team is I think they'd draw the line at Clojure.


I love Clojure. As with Python, Clojure comes with batteries included. Its dynamic typing and REPL make development really quick, and the JVM runtime allows it to achieve better performance than Python in many cases. You can even go as far as compiling to a native executable with GraalVM.


I spent 7 years working with C#, I like C# and I think what they’ve done in the past few years is amazing.

I’m much more productive in Python. Not as in “I feel” more productive, but as in measurably more productive.

Where Python falls short of something like .Net and C# is that I probably wouldn’t have been more productive in Python if I didn’t have 7 years experience with a rather strict environment, and as far as using Python on major projects, well let’s just say that there is a reason people use TypeScript instead of JavaScript and Python doesn’t fall into that category.

But for most programming and for most minor systems or services, Python is just wildly good.


Same here. I do game development in C# and Python is my go-to for sketching POCs for complex mechanics, despite never formally learning the language. It could easily be the difference between spending the whole day vs two hours on the same issue.

> there is a reason people use TypeScript instead of JavaScript and Python doesn’t fall into that category.

I thought the dynamic vs static typing debate was pointless until I joined a large Python project. Now I pretty much consider anyone that can keep up with one superhuman.


I think part of the problem is that many people who are used to static type analysis get lazy with naming and interface design. Toss one of them into your team and they'll "prove" the need for static type analysis.


Possibly. IME it’s quick Python POCs that didn’t bother with good practices that turn into critical production services without anybody going back and applying good coding standards.


That's a theme, for sure. "We don't have enough time to refactor!" is generally less true than "We need to slow down to go faster."


I also spent many years working with C# before working with Python professionally. I've also spent a lot of time with Lua, though not for paid work.

The second point matches my experience with dynamic languages as well. To use them well really takes a certain level of discipline, but it pays off. It's why I'm a bit skeptical of Python as a beginner's language, which it has sometimes been touted as.

About large projects, presently I am working with Python on a somewhat large project (currently 53k sloc, I guess large is relative). It's been great, though we are pretty strict about almost everything being type-hinted.


53k SLOC is a tiny project. 5 senior developers at 1k SLOC per day will generate 53k SLOC in two weeks, just one sprint.


How many people are writing 1k SLOC/day? And doing so consistently? If you're banging out that much code, I'd be severely concerned for the quality of the code itself as well as overall design.


Senior developers (really senior, not 3-year seniors) can trivially produce that amount of code, because they have written almost the same code a dozen times already.


My point isn't about who _can_ do it but who _does_. If anyone on my team was writing anywhere near 1k LOC/day, I'd think it was a serious problem. If the whole team was pumping out 1k LOC/day/person, I have to imagine I'd be getting as far away from the entire team/org as possible.

Maybe not the best example, but I used to work with a guy who had pretty large scripts that he was writing for one of our projects. It turns out it was mostly copy-paste, so it sort of got the job done -- except he fixed bugs in one place but not in the other. The whole thing was a huge mess to understand and maintain. And he was a senior engineer at the time (now principal, to my great amazement).


Developers on fixed-price contracts. Fast work + low number of bugs = good margin. It's exhausting, but, IMHO, it's better to work hard for 6 months and then relax for a few months than to slowly push a project for 2 years with red eyes and a headache.


Why are they re-writing the same code instead of re-using it? Either convert to a library, or copy&tweak the existing code.

Why are they writing the same code again instead of writing different code? Why aren't they learning about new APIs for topics they aren't so experienced in? What makes them "senior developers" and not "senior transcriptionists"?

Who is tasked with digging into 10 year old code, written by an ex-employee, to find a subtle bug? Who is identifying and fixing performance issues?

How are these "senior developers" able to write 1KLoC/day plus documentation and developer tests? I find test code and documentation each take as much time as writing the code in the first place.

Why aren't some of these senior developers producing negative LoC? https://www.folklore.org/StoryView.py?story=Negative_2000_Li...


Because the previous code is owned by the previous client.

For example, 1M+ LoC, including build system, CI, test cases, and built-in documentation, in 6 months is about 7kLoC per day; divided by 5 developers, it's about 1400 LoC per day per developer.

I have completed 30+ projects already; I have 20+ years of experience. A few years ago I was able to close up to 20 tickets per two-week sprint, when I worked with same-level senior developers and a dedicated PM, product owner, and QA team, at a large outsourcing company on fixed-price projects.

It's easy when tickets are properly sized and described by the PM, when the PO is responsive and easy to reach, and when QA covers your back for complex test cases. A 6-month project + 1-2 months to recover after, and then another such project.


Are your fixed-price estimates based on LoC?! Else, why aren't you embedding that knowledge in a corporate library ("contractors tools") which you license to all your future customers? Twenty years of playing one song over and over, even done well, doesn't make someone a concert musician.

Those numbers are ridiculous, and I say this having 25+ years of professional experience.

The typical industry numbers are under 100 LoC/day.

Eg, slide 20 of https://www.slideshare.net/ddskier/calculating-the-cost-of-m... ("A world-class developer (e.g. Facebook or Google senior engineer) will write 50 LOC per day")

"Improving Speed and Productivity of Software Development: A Global Survey of Software Developers" at https://uweb.engr.arizona.edu/~ece473/readings/9-Improving%2... (Fig 6) has Lines-of-Code per Total Man Months at about 1,750, so about 81 lines per day (assuming 21.62 work days per month).

"A Practical Approach to Software Metrics" at https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=819938&... says "Many industry rules of thumb describe programmer productivity in terms of lines of code, for example 350 [noncomment source statements] per engineering-month of effort." That's 16 lines per days.

Going the other way, in the COCOMO estimator at https://strs.grc.nasa.gov/repository/forms/cocomo-calculatio... your 1 MLoC for an "organic" project estimates 3390 person months. Not the 30 months you mentioned.

That said, it's really easy to distort LoC measurement. And I mean beyond the "put in 1,000 lines of the form 'a=1'" cheat.

For example, LoC traditionally refers to source code, not test, CI, etc. that you use. Why did you use that non-standard definition?

Test code, for example, may contain a lot of data records and autogenerated code. This doesn't require the same work as a source line of code.

Consider SQLite. https://sqlite.org/testing.html says the library is 143.4 KSLOC, while the test suite is 91911.0 KSLOC - nearly 100 million lines of test code! These tests were not all written by hand. At that ratio of test code to source code you would have about 1,400 lines of source code.

And it's possible to mis-measure source code. A few years ago I added about 400,000 lines of code to my project in a few weeks. These were auto-generated when I replaced a very confusing set of C preprocessor directives with a homebrew template system to compile specialized versions of functions across my parameter space.

I then added some Cython projects, which generates C files about 50x larger than the original pyx files.

Counting auto-generated code makes LoC a worthless measure.

You also wrote you have a QA team. I assume their test cases are in your repo, and counted in your 1M+ LoC count, but you didn't include their time.

The post-release defect rate per KLoC is about 7.47 (average) and 4.3 (median). See An Overview of Software Defect Density: A Scoping Study at https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6462687... .

If you write 1KLoC/day then you're likely generating 2 post-release defect bugs per day. That's in addition to bugs your QA team catches.

Your "up to 20 tickets per two week sprint" simply isn't enough to catch up with the number of bugs you likely generate.

Your numbers are so far outside of industry standards and academic findings that, with 20+ years of experience, you must know they are exceptional and difficult for anyone else to accept on your simple say-so.


Your numbers are right. I just checked my stats on my current project: 101 LoC per day. On this project I have: a) an incompetent PM, so ticket descriptions are vague and I need to manage the ticket queue myself; b) a hard-to-reach product owner, so I must figure out a lot of things on my own; c) no dedicated QA team, so I need to write test scenarios in addition to automatic test cases and then perform them manually; etc.

> Are your fixed-price estimates based on LoC?!

I don't know. I was not a part of sales.

> Else, why aren't you embedding that knowledge in a corporate library ("contractors tools") which you license to all your future customers?

Because each customer has different requirements, different goals, different language and framework. Nobody wanted to pay for a library.

> If you write 1KLoC/day then you're likely generating 2 post-release defect bugs per day. That's in addition to bugs your QA team catches.

In practice, I had about 1 serious bug slip through per 1-2 weeks. A professional QA team already knows a lot of corner cases to test. Professional developers know them too, even when writing in a different language using a different framework. Moreover, we know how to prepare good CI, efficient automated tests, how to write efficient documentation, who is responsible for what, etc.

> Your numbers are so far outside of industry standards and academic findings that, with 20+ years of experience, you must know they are exceptional and difficult for anyone else to accept on your simple say-so.

How much will I be paid if you accept my story? :-)


101 LoC per day * 10 days = 1,000 LoC, so "1 serious bug slipped per 1-2 weeks" isn't that far off the median rate.

> Because each customer has different requirements, different goals, different language and framework

Sure, but that doesn't match your earlier statement that the senior developers 'wrote almost same code dozen times already'.

That is, I don't see how "almost the same code" is able to handle different requirements, etc.

How much would you pay me to accept your story? ;)


Python, like all dynamic languages, works best to bash out a new project into production ASAP, get a bonus and move on, while poor folks maintaining after you are trying to untangle it.


I think there's a lot of "Python's the greatest thing in the world" coming from ML/AI and it always reminds me of this. I wish I knew who said it.

"People who know one language think it's the greatest in the world. People who know more than one think they all suck."


> "People who know one language think it's the greatest in the world. People who know more than one think they all suck."

The Blub paradox: http://www.paulgraham.com/avg.html

(BTW, the TL;DR is people that know more than one language admit they all fall short one way or another, but there are some languages that are more productive than others).


You're saying you have fewer problems untangling C/C++/Fortran/etc. code left from previous folks who wanted to ship something and move on?


Python’s “one right way to do it” is specifically designed to avoid this. This isn’t Perl.


Python has multiple different ways to do things even in the standard library lol


Current Python doesn't reflect this philosophy though. You have lots of options for strings and string formatting. There are both pattern matching and if statements. The standard library often isn't the best option for stuff (like HTTP requests) so you use another library. Package management and deployment is far from solved, with lots of different tools.
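
To make the "lots of options" point concrete, a small sketch of the string formatting styles that all coexist in current Python:

    import string

    name = "world"
    print("Hello, %s" % name)                                      # printf-style
    print("Hello, {}".format(name))                                # str.format
    print(f"Hello, {name}")                                        # f-string (3.6+)
    print(string.Template("Hello, $name").substitute(name=name))   # stdlib Template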


Pet peeve: there is no right way to do type annotations. Two major projects aren't even compatible: https://www.cs.rpi.edu/~milanova/docs/dls2020.pdf


> You have lots of options for strings and string formatting.

“Strings and string formatting” is a very broad domain.

For most specific tasks, there is one low-impedance approach and it takes very little (but not zero) reflection and/or experience to find it.

Most new Python features directly address specific tasks for which there are currently multiple relatively high-impedance approaches taken because there is not one obviously correct way.

Python's “one obviously correct way” is not “one possible way” (the latter approach is closer to Go.)


There's a great deal of evidence that supports that, it's just not scientifically organized. If you just look at the number of high-quality, highly polished projects out there for dynamic programming languages, it absolutely dwarfs that of static languages. And then add to that the fact that most of those are by solo devs in their free time.

It's not just that they are more popular; Java reigned supreme for years when Ruby and PHP and later Node.js outpaced its community by miles. And you can't say Java developers are not open source oriented, because they 100% are.

And the argument that programmers are simply most productive in the language they know best is trivially untrue. When I first learned of Ruby, I could dream in C#, I was absolutely fluent, knew the standard library by heart. I switched to Ruby and only looked back whenever I needed tight performance. And this is not just an anecdote, the whole Ruby on Rails movement was basically Java developers fleeing to greener pastures.

I don't know a single highly experienced multi-lingual developer who does not reach for a dynamic language when they need to deliver something quick and easy.

The only exception I know of is using Go for small networked services, which is quite comfortable and intuitive for a static language. Also, outside that niche Go quickly loses its productivity edge.


> I don't know a single highly experienced multi-lingual developer who does not reach for a dynamic language when they need to deliver something quick and easy.

Hi, highly experienced, multi-lingual developer here (Python, Ruby, PHP, JavaScript, Java, Go, C#, F#, OCaml, Elixir/Erlang, ReasonML, SQL, Smalltalk, Objective-C, Swift, leaving off quite a few prior Web 2.0 ones for brevity) nice to meet you. As you can see, I've done just about all of it: every paradigm, every syntax. It all honestly blurs together after a certain point, and it just becomes easier and easier to pick up a new language the more you learn.

Anyways, these days, I never reach for a dynamic language when I need to deliver something quick and easy. The difference is just too marginal. Not worth the downsides (and the downsides are immense!).

Because, as my experience has taught me, invariably one of those quick and easy ones will turn into something that becomes business critical, lives on for years after you are gone, and will be much more difficult for other, typically more junior, developers to update or enhance. Static typing is only marginally less productive these days with all the great tools and IDEs out there (not to mention Go, which has one of the least obtrusive static type checkers I've seen, or, even better if the org allows it, a language with an HM type system), and for that marginal improvement in productivity you pay a heavy long-term cost. Not worth it.

Dynamic languages are great for rapid prototypes. After that, convert it to a static language. Your junior devs that join after you, who aren't familiar with the entire ecosystem your work has will thank you.


If I'm prototyping something like a log file analyzer or web scraper I use a dynamic language because it gives me a debugger REPL. I can interactively develop code with immediate results and copy & paste straight from the terminal into the editor.

When I want to develop further, I put a debugger statement right where I want to pick up, where all the data is available, and develop from execution.

Some static languages have REPLs but few are as good as dynamic languages for mutating existing code in the middle of a debugging session.

If you don't write much code which involves exploration of data or unknown APIs, then this mode of development may not be as useful to you, but it's a significant productivity advantage for me and the reason I reach for a dynamic language. Ruby is my go-to, with binding.pry as my debugger REPL.
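
The commenter's go-to is Ruby's binding.pry; a rough Python analogue of the same workflow (the scraping function is invented for illustration) would be:

    import urllib.request

    def scrape(url):
        html = urllib.request.urlopen(url).read().decode()
        breakpoint()  # drops into pdb with `html` in scope: explore the data
                      # interactively, then paste the working lines back here
        ...

    scrape("https://example.com")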


> After that, convert it to a static language

I don't think you can have a hard and fast rule like this. It will depend on the situation. In many instances, a "prototype" built in Python or similar is perfectly fine.

In addition, one can make a horrendous mess in statically typed languages as well. One of the absolute worst projects I ever worked on was written in a popular statically typed language. This is of course an anecdote and not to say statically typed languages are worse, but they're not automatically better either.

Just a note too that I'm a big fan of, e.g., Rust & TypeScript, and I often use type hints in Python, so I'm not anti-static typing.


Way easier to refactor that statically typed mess.

There is a floor to how badly you can write statically typed code.

There is no floor to how bad the JavaScript or Python or Ruby you write can be. The madness can descend to the innermost circles of coding hell. Only Perl exceeds it.


Well which one do you tend to reach for then? Purely out of curiosity.


Go in the general case, Java if complex domain modeling is required (or requires "enterprise grade" B.S - SOAP, WSDLs, or other crazy XML specified madness like Adobe extension stuff, etc), and after that: anything functional if I am allowed.

Explained:

Go's type system is generally less rigid. It strikes a good balance of strict enough. A lot of Go's converts aren't from "systems" languages like it targeted originally, but rather former Python/Ruby/PHP/JavaScript backend devs. I love the performance and low level levers I can pull with Go (although to be fair Java is quite fast enough). But finally, Go is easy to learn (25 language keywords), the standard library is mostly great, and the worst developers I've seen still write mostly maintainable code that builds fast, which is what I optimize the most for these days.

Java, for all its warts and legacy cruft, these days you can write fairly good java, utilizing modern libraries. I love most things from Codahale, who in turn I think pushed orgs like Spring to write better libraries, so now everything's pretty good. Plus all the legacy stuff comes in handy when you have to deal with arcane government or financial systems, something I have to interface with frequently.

But if I had my choice, I'd use something where you can express functional programming concepts intuitively, without fighting the language or having to do it at a heavy performance cost, like Rust or, better yet, a full-fledged FP language like OCaml (which I understand heavily inspired Rust).


> I don't know a single highly experienced multi-lingual developer who does not reach for a dynamic language when they need to deliver something quick and easy.

Maybe this shines light on your social circles more than actual language impact on proficiency?

The "our language is so much more productive" myth exists in many languages, including the one I currently use. But when you dig deeper, there are almost always other social or technical factors that explain the productivity gap. That kind of gap exists even between people using the same tool.

The reality of it is that assessing what makes a developer productive is incredibly hard, and people doing so to claim their language of choice is better rely on anecdata and couldn't explain what "methodology" means to save their life.


> I don't know a single highly experienced multi-lingual developer who does not reach for a dynamic language when they need to deliver something quick and easy.

This argument is not addressing the claim for general purpose software, only for "quick and easy" software.

To people in this thread: please stop to think before responding to the "wrong point". I think we can all agree that dynamic, scripting languages are more adequate for "quick and easy" software, but that is not the point of the discussion. The point of the discussion is, or should be in my opinion, whether for general purpose software, written by teams, that is not trivial for one person to write in 5 minutes, scripting languages are more productive than statically typed, compiled languages.


Where do you draw the line? Is GitHub quick and easy software? Is Shopify? Were LinkedIn, Twitter? Are things built on Tensorflow?

I think you misunderstand the scripting language revolution of the mid 2000's. It's not that we suddenly realized scripting languages were the best for quick and easy projects. We realized scripting languages were suitable for a whole lot more than some data processing. We could much more effectively build huge scalable software platforms.

It got so bad that the sentiment flipped, and people like Joel Spolsky had to go out of their way writing blog posts that you could in fact build successful modern web platforms in C#. And then he went and built the world's most popular project management tool in Node.js anyway.


> I don't know a single highly experienced multi-lingual developer who does not reach for a dynamic language when they need to deliver something quick and easy.

If you're picking from a popular language, the language itself rarely matters for this at all. Every popular language does roughly the same things. Writing speed (length of keywords, for example) is a non-issue. The ecosystem is by far the most important thing.

It doesn't matter how efficient I am in Go if I'm missing a crucial library that I'll now have to write myself.

JavaScript is a classic example where, even if you are pretty fast at writing your code, you will be dramatically slowed down by: 1) needing to add a new package every 5 min to do something trivial, 2) looking through five half-dead libraries to find one that seems maintained and usable, and 3) finding out that some of the libraries you chose are buggy.

So for a quick prototype, I'd weight a language's qualities as follows:

- ecosystem: 90%

- static analysis/tooling: 5%

- stdlib: 3%

- syntax: 2%

And for a long-term, complex project, I'd weight them closer to:

- ecosystem: 50%

- static analysis/tooling: 46%

- stdlib: 3%

- syntax: 1%


Python was Google's language of choice for a while, so people jumped at it to work at Google.

Popularity and expansive use may have nothing to do with quality, and a lot to do with financial influence on people's agency.

Business wants people templating out directories of performant code, not generating syntactic art for the ages.


The classic, now quite dated, citation is "An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl" by Lutz Prechelt.

Recent-ish commentary (July 2021) about it at https://renato.athaydes.com/posts/revisiting-prechelt-paper-... , HN commentary at https://news.ycombinator.com/item?id=28108806 .

Google Scholar gives about 476 papers which cite Prechelt's work. I have not followed other work in that field.


Norvig had a response to Prechelt as well: http://www.norvig.com/java-lisp.html


Mentioned and used as the Lisp baseline in the link I gave. ;)


> Mentioned and used as the Lisp baseline in the link I gave. ;)

Just saw that, nice. I like Norvig because he really cuts to the chase.


I doubt it, though if anyone has data I’d love to see it. At the moment I’m quite comfortable in both javascript and rust. But I find javascript (/ typescript) noticeably more productive for small to medium projects. I can get more done with less work. For a variety of reasons it seems to take more work to write good rust code than good javascript code.

This isn’t a knock on rust - I just think rust trades off programmer productivity for correctness and performance. And it shows, on both sides.


> But I find javascript (/ typescript) noticeably more productive for small to medium projects.

This seems like a pretty apples-to-oranges comparison. Rust is intended to do things you couldn't/wouldn't use JavaScript for.

If you're talking about a small-to-medium project where you don't care much about mishandling memory, of course JavaScript is going to be more productive. Rust is forcing you to tell the compiler a lot of things that JavaScript assumes you don't care about (and is, most of the time, correct).


> I doubt it, though if anyone has data I’d love to see it.

I recall a study being quoted in my university courses, which described the amount of code needed to get certain things done between different languages, which showed that Python, Ruby and others are on the less verbose side, the reasoning being that on average they'd also be more productive.

This seems to coincide with my personal experience, where JavaScript with React was much easier to work with in smaller projects, whereas using TypeScript with React led to much slower development because of all the typing that needed to be handled, especially in cases of union types. Now, one can say that it's worth the effort to be more confident in your code doing what you want it to both now and after X months, much like you could sometimes prefer the type systems of Java or .NET over Python, Ruby or PHP, but in my eyes those tradeoffs always come with slower development velocity.

The sad thing, however, is that DuckDuckGo (and possibly other search engines) failed to return the original study or anything like it, only resulting in low quality blog content:

  - https://duckduckgo.com/?t=ffab&q=programming+language+comparison+amount+of+code
  - https://duckduckgo.com/?q=programming+language+comparison+verbosity
  - https://duckduckgo.com/?q=programming+language+lines+of+code+java+ruby
  - http://libgen.is/scimag/?q=programming+language+verbosity
  - http://libgen.is/scimag/?q=programming+language+lines+of+code
Anyone have any better search queries for this? Any idea why the search result quality is generally so low? Any ideas which study it might have been referencing?


The research on this topic has been discussed on HN multiple times. It is indeed hard to look up previous discussions, but the conclusion is always inconclusive... one thing everyone seems to agree on nowadays is that LOC is a bad proxy for programmer productivity, especially when you take code maintenance into consideration (which very few research papers do), and that types do have a small, but measurable effect on improving quality (though whether that's at the cost of productivity is still unclear as far as I know - yes, programmers need to spend a bit more time to declare their types, but without them they have to spend more time every time they need to call anything).


Google Scholar is a better method to find published papers.

The first result for "program language comparison", at least when I view https://scholar.google.com/scholar?q=programming+language+co... is: Prechelt, Lutz. "An empirical comparison of seven programming languages." Computer 33.10 (2000): 23-29.

There's all sorts of work which cite that paper, like "An empirical investigation of the effects of type systems and code completion on api usability using typescript and javascript in ms visual studio".

For fun, here are some of those citation which themselves use the phrase "An empirical study" or similar in the title:

"An empirical study on the impact of static typing on software maintainability"

"An empirical study of the influence of static type systems on the usability of undocumented software"

"Do developers benefit from generic types? An empirical comparison of generic and raw types in Java"

"An empirical study on C++ concurrency constructs"

"An empirical study on the factors affecting software development productivity"

"An empirical study to revisit productivity across different programming languages"


I think different languages do have sweet spots, and it often has to do with library availability (for me at least).

I consider myself as someone who knows "far too much about C++", but recently when I had to spider some webpages and read some values out of them to populate a database, I did that in Python because it's a handful of lines due to high quality libraries that all work nicely together. I honestly wouldn't know how to do that in a short amount of time in C++.


I admit that Python is more productive than C++, but the reason is mostly to do with memory management... a language with a GC, like Java or JavaScript, would use a very similar level of abstraction as Python and therefore, should "cost" about the same to write.

Python and other dynamic languages don't use static types and that may give them a (very) small advantage in productivity as well, but only for very small programs... as program size increases, in my experience, statically typed languages take the lead in productivity... as OP is about writing general software, not just small scripts, I really have a hard time agreeing that Python is either the best tool for the job, or the most productive tool at all.


It's more than that. For example, converting a string to uppercase in C++:

    #include <string>
    #include <cctype>
    #include <algorithm>
    #include <iostream>
     
    int main()
    {
        std::string s("hello");
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return std::toupper(c); });
        std::cout << s;
    }
Same thing in Python:

    print('hello'.upper())


That takes advantage of the fact that Python puts a bunch of relatively obscure functionality in scope by default... thus increasing the chance that programs will use the wrong method or rely on stuff they don't really intend to. Case manipulation isn't something most modern programs should even be doing at all, since it's almost impossible to internationalize it.

In fact it's really mostly name space management. Python doesn't require you to pull in "upper" because it starts with a relatively big name space. It doesn't require you declare the main program because you're relying on the fact that Python executes a file on loading it (and that reliance is arguably bad Python style). All of the "std::" stuff is a name space management choice.

... and the rest has nothing to do with static typing either. The way the code is indented on multiple lines is a stylistic convention. The transform primitive requiring you to specify bounds is a library choice, related to the language's apparent lack of a universal way to map over a container. I'm not sure why you need the lambda. The rewrite in place is memory model stuff inherited from C.

In Haskell, which is fully compiled and is about the most rigid static typed language you could ever imagine:

   import Data.Char(toUpper)
   main = putStrLn (toUpper <$> "hello")
If you'd asked for something that didn't require oddball functionality, then that could probably also have been a one-liner.


C++ is forcing a style on you that encourages you to avoid a copy. You could, of course, have a library like:

    std::string ToUpper(const std::string& str) {
      std::string result(str);
      absl::c_transform(str, result.begin(),
                        [](unsigned char c) { return std::toupper(c); });
      return result;
    }


to enable:

    int main() {
      std::cout << ToUpper("hello");
      return 0;
    }

but this would be no good! You have chosen an API that forces your users to be gratuitously inefficient. Maybe that is OK for your personal project, but an API like that has no business in the standard library.


You just explained why C++ will never be as productive as Python.


No one is stopping you from writing your C++ inefficiently if you choose to. I am skeptical that spending a bit of time thinking about ownership/lifetimes is really such a big hit to productivity.


> Python and other dynamic languages don't use static types and that may give them a (very) small advantage in productivity as well, but only for very small programs... as program size increases, in my experience, statically typed languages take the lead in productivity... as OP is about writing general software, not just small scripts, I really have a hard time agreeing that Python is either the best tool for the job, or the most productive tool at all.

As a counter point to this - Python has a large number of C-based libraries with excellent bindings. To take Numpy as an example, you can use the static types and speed of Numpy to do all your maths, whilst Python acts as the manager. Using Numpy "feels" like normal Python, you don't really feel like you're working at native level, but you get all the advantages of native speeds and memory usage.

And to your other point, I've worked on large C++-based projects which were a complete mess, especially when you had to make any changes to existing code. I'm not sure the language has as much of an effect here as just using the correct design principles.
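
A tiny illustration of the Numpy point above, assuming NumPy is installed: the arithmetic below runs as vectorized C loops, but the call site still reads like ordinary Python.

    import numpy as np

    prices = np.array([9.99, 14.50, 3.25])
    quantities = np.array([3, 1, 12])
    print(float((prices * quantities).sum()))  # elementwise multiply + sum, all in C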


I've done a big desktop program in Python. For some parts, we just used modules written in C, because the code was easily 20 times faster than Python; I was even surprised at how fast simple blits could be on modern hardware without any graphics card acceleration. But the Python version was several times faster to write.


> but there's very little evidence to support that

Not if you don't go looking for it: http://www.norvig.com/java-lisp.html

ETA: And don't get me started on Design Patterns: https://norvig.com/design-patterns/design-patterns.pdf


> programmers tend to be most productive in whatever language they know best

This is true, but most software is built by multiple people (for corporations) and will eventually be maintained by multiple other people. Whether an individual is "most productive" in a language is rarely important.


I'm a C++ programmer and a python programmer. I assure you your conclusion is incorrect. Python with type checking is by far more productive. Many people who know python and another language can testify to this general truth.


C++ is probably the least productive non-joke language ever created: it has deterministic destruction which requires (manual) ownership tracking, it has an extremely slow compiler with horrible error messages, it has massive performance differences between debug and optimized builds, and its focus on performance brings lots and lots of extraneous concepts into its libraries when you're just trying to get something working (std::allocator, std::char_traits and many other similar examples).

Comparing Python to Java, C#, Haskell, Erlang, Go, maybe even C or Rust, would show a much smaller productivity difference.


> requires (manual) ownership tracking

Have you tried RAII?

Would you like to know more? https://stackoverflow.com/questions/2321511/what-is-meant-by...


RAII is exactly what I mean by "manual ownership tracking and deterministic destruction". RAII is better than completely manual memory management (C style malloc/free), but it still requires you to design your program such that every piece of memory is owned by some pointer with the correct properties. You have to choose between copying, bare pointers/references, unique_ptr and shared_ptr whenever two pieces of data are related to each other, or whenever you pass a piece of data to another function.

Rust is a little better since at least the ownership concept is known to the compiler and automatically enforced.

(Tracing/Copying/Compacting) GC is much easier since this entire concept goes away. You always pass references to data, and the physical storage is "owned" by the GC itself. It's also much faster for certain workflow patterns, though it always consumes more memory than deterministic destruction schemes.


I do a lot of Rust programming, I personally find Rust more productive right now.


Granted, people come from different backgrounds, and "the right tool for the right job" always applies. But it's undeniable that generally speaking, ie. not considering any specific cases or requirements, some languages are definitely more productive than others.


Sometimes you just want to throw entirely differently shaped data into the same array or dictionary and be done with it instead of creating a "proper" type-safe design which most likely means writing lots of pointless boilerplate code. IMHO that's the whole point of a scripting language like Python, to write quick'n'dirty scripts, not "real programs".
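
For example, the quick'n'dirty version of that is just something like (field names invented for illustration):

    # differently shaped records thrown into one list, no up-front schema
    events = [
        {"type": "click", "x": 10, "y": 20},
        {"type": "error", "message": "timeout", "retries": 3},
        ("raw", b"\x00\x01"),
    ]
    for e in events:
        print(e)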


[flagged]


Yes, you can write your 2 line program faster, no one is disputing that.

But that's not the real world; in the real world you have to change and debug large amounts of existing code, and collaborate with colleagues. There's little conclusive research on it, but so far everything hints at typed languages being superior for that. Heck, untyped "script-like" languages tend to evolve in the direction of added types and build systems (e.g. TypeScript, Python type annotations, Babel, Webpack etc.)


Actually, the person I responded to did dispute exactly that by claiming the limiting effect of language choice is gated by experience. Read what they said again. That’s also reducing the argument to absurdity, which I pointed out, and which apparently did nothing to dissuade you from taking it as my actual position.

You’re also explaining a single sector of software usage to me (i.e., the revenue SaaS of the company) as “the real world,” which itself dismisses every other aspect of software development in a shop. Again, including SRE and ops scripts, which are almost always inappropriate in a mainline, industrial, compiled language. Please try to avoid explaining your point with the beginning assertion that those who disagree with you do not live in the real world.


I am OP you're talking about... you misinterpreted my point completely.

I was saying that Python and other dynamic languages are not demonstrably more productive for writing general purpose software... as I did not mention scripting at all, I thought it would be obvious that this is what I meant.

> Read what they said again. That’s also reducing the argument to absurdity, which I pointed out, and which apparently did nothing to dissuade you from taking it as my actual position.

This is absurd. What "they" said was _programmers tend to be most productive in whatever language they know best_. How the hell is this disputing "you can write your 2 line program faster (in Python), no one is disputing that."??

With all due respect, you need to work on your text interpretation skills. Do you understand that "programmers tend to be more productive" is not the same as "programmers are always, without exception, more productive"?


> Say I want to add two numbers and ship it to prod. In Python, that’s one line and an scp.

This example ignores what is actually involved in production code. Yes there is a compile step involved in things like Java. In python you have to manage your dependencies on the deployment box.

In Java (minus the runtime), Go, Rust, etc they all support managing dependencies on the client side. In my experience the messiness of that in python and ruby far outstrips the complexities of a compile step in the makefile or CI pipeline.


A case study from my own personal experience:

We generally do everything in Java. Python is avoided because of all the usual complaints - dynamically typed, not fast enough, GIL, etc.

So, as a PoC, I decided to try rewriting one of our services in Python. And, compared to the Java one, it is:

Cheaper to write and maintain. About 1/10 as many SLOC. About 1/20 as many person-hours.

Has better static type checks. For example, Java's type checker cannot statically verify for null safety. The Python one I chose does do that. Note that, since I did choose to put type hints on everything, the productivity boost in question cannot be attributed to dynamic typing.

(That said, not all 3rd-party libraries have type hints, so, if you want to type check everything, you may have some yak shaving to do.)

Uses less RAM. About 1/2 as much.

Is faster. Admittedly I'm leaning on numpy, Cython, and friends for this. It may well be much slower for projects where that is not possible. But still, I think that the point about premature optimization stands in this case.

(Disclaimer: I also have a colleague at $FAMOUS_COMPANY who tells me they are moving off of Python because they found the opposite of the above in many cases. Though I personally suspect that a mitigating factor is that their profit margins and scale are large enough that all the coefficients in their cost/benefit formula are wildly different from the norm.)
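
For readers wondering what the null-safety point above looks like in practice, a minimal sketch assuming a checker such as mypy (the commenter doesn't name the tool they used):

    from typing import Optional

    def find_user(user_id: int) -> Optional[str]:
        return None  # stand-in for a lookup that can come up empty

    name = find_user(42)
    # print(name.upper())    # rejected statically: `name` may be None
    if name is not None:
        print(name.upper())  # fine: the None check narrows the type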


Not that I'm a Java proponent, but have you tried doing the same rewrite effort on modern Java and with modern tools?

With newer tools and most important of all: your rewrite being a second iteration with the benefit of hindsight, you may notice that the language choice makes less of an impact than expected. IME language differences tend to shine the most when writing something the first time around and its expression capabilities shape the design.


I also rewrote a Python service in Java in an attempt to control for that. The results came out similarly in terms of development & maintenance effort. Unfortunately it wasn't a thing where a performance comparison made sense, though, so I didn't really do one.

My sense was that the big benefit was actually down to the libraries and syntactic sugar. Dataclasses, comprehensions, and generator functions all have a big impact, but so do things like Python just generally having more ergonomic libraries for talking to databases, producing/consuming REST APIs, and reading/writing data files.
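
A small sketch of those three features working together (the record shape is invented for illustration):

    from dataclasses import dataclass

    @dataclass
    class Order:
        customer: str
        total: float

    def read_orders(rows):
        # generator function: parses and yields one record at a time
        for name, amount in rows:
            yield Order(customer=name, total=float(amount))

    rows = [("alice", "19.99"), ("bob", "5.00")]
    big_spenders = [o.customer for o in read_orders(rows) if o.total > 10]  # comprehension
    print(big_spenders)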


You mentioned NumPy. NumPy is great, and there is basically no Java equivalent. So if you are doing the kind of work which NumPy can help with, it's going to be very hard work to beat it with Java, either in development speed or execution speed.

To put it another way, perhaps what you tested here was having a great array library vs not having a great array library! It would be interesting to test Python vs Java on a more generic task, where there isn't a huge advantage from a particular library or set of libraries.

It would also be great if someone would write a NumPy-style library for Java. My team would use that a lot!


Numpy was not a factor in the Python->Java PoC; it was a service that did no number crunching whatsoever. There, the specific library advantages were the psycopg, flask, requests, json, and csv packages. All of the (mainstream) Java equivalents are just kind of verbose and fiddly to work with compared to those.

I don't think this is entirely down to the language itself. A lot of it is cultural. The Java community tends to favor a more "enterprisey" style of programming and API design that, at least to my tastes, tends to be rather overwrought. That could change, in principle. I'm just not optimistic about it happening any time soon.


There is a lot of enterprisey crap, but there are also many decent libraries. In particular, i am pretty happy with the Java options for the things you mention - the PostgreSQL JDBC driver, JDBC itself, and HikariCP for pooling, the JDK HttpServer (no templating though, don't know if you need that here), the JDK HttpClient, Jakarta JSON with Joy as an implementation, and SimpleFlatMapper for CSV. None of them are perfect, but i am pretty happy churning out generic business service type things with them.

I wish there was some way i could take a sabbatical from my job and come and try to rewrite your service in what i consider effective modern Java. I am pretty optimistic it would be competitive with Python in effort and maintainability, but i have no way to prove it!


There is also no Django for Java, no FastAPI for Java, no Typer for Java, etc.

Why? Because it's easier to write great frameworks in Python.

It compounds.


Equally, there's no Hibernate for Python, no Spring for Python. And FastAPI looks basically like JAX-RS to me.

I don't think the lack of any of these things is because they're impossible, or even particularly hard. It's because the community did or did not have the need and energy to build them.

Remember that Java had Java EE forced on it quite early. That defined web development for a long time. Eventually Spring overthrew it, and became the new tyrant. Neither of those are quite like Django, or Rails, but they occupied the space that would have had to have been empty for a Django or Rails to emerge. I think this is rather unfortunate; it would be really useful to have a vibrant Django- or Rails-like framework in Java.

There's no Typer because there isn't much of a culture of building command-line tools in Java. There are some good command-line parsing libraries, but nothing with all the bells and whistles and extensions that Typer has, as far as i know.


> Equally, there's no Hibernate for Python, no Spring for Python. And FastAPI looks basically like JAX-RS to me.

Of course there is. In fact, SQLA is easier to use than Hibernate, and the ecosystem around it is as rich. As for Spring, crossbar fits the bill.

> I don't think the lack of any of these things is because they're impossible, or even particularly hard. It's because the community did or did not have the need and energy to build them.

I never said they are impossible. I said they require less energy to build in Python :)


No one who writes Python has ever said to themselves, "I wish I had Spring." It's not necessary, and it wouldn't even be beneficial. The kind of dynamism that Spring brings to Java, Python already has.
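To sketch what I mean (made-up class names, not any particular framework): the dependency injection and wiring that Spring exists to provide is mostly just passing objects around in Python, because everything is duck-typed and late-bound.

    # A hand-rolled sketch of "dependency injection" in Python: no container,
    # no XML, no annotations -- just pass the collaborator in.
    class SmtpMailer:
        def send(self, to: str, body: str) -> None:
            ...  # a real SMTP call would go here

    class FakeMailer:
        def __init__(self) -> None:
            self.sent = []
        def send(self, to: str, body: str) -> None:
            self.sent.append((to, body))

    def notify(mailer, user: str) -> None:
        mailer.send(user, "hello")

    notify(SmtpMailer(), "a@example.com")  # production wiring
    notify(FakeMailer(), "a@example.com")  # test wiring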

Also Hibernate is terrible and should be erased from the Earth.


> You mentioned NumPy. NumPy is great, and there is basically no Java equivalent.

Things like this are the greatest disappointment to me after fifty years of writing software. It should not be necessary that Java equivalent exists. It really ought to be possible to connect to libraries from any language by now.


After fifty years of writing software, I would have thought you would understand why that wasn't possible.


Perhaps you could enlighten me as to why it isn't possible? I do understand that it would be difficult.


WinRT is an interesting thing on that front, though I am not aware of a non-Windows equivalent, and so it's never really going to see much open source adoption, let alone support on other platforms.


Back in the '90s, when we still remembered how to dream, we had CORBA.

CORBA got a lot of bad press, but towards the end, it was actually pretty decent. Certainly the best way to do the thing it was trying to do!


What do you mean? JBlas exists. Colt exists. There's probably others but I'm not super up to date on Java things.

Numpy is just wrappers for BLAS and LAPACK.


> Numpy is just wrappers for BLAS and LAPACK.

Certainly not. I encourage you to have a look at the API of BLAS/LAPACK to perform basic operations. And that just scratches the surface.


https://numpy.org/doc/stable/user/building.html

It is. Maybe it's a very fancy wrapper.


I would argue that numpy is a fancy wrapper for BLAS in the same way that a Spitfire is a fancy wrapper for a Rolls-Royce Merlin 61.


I suspect there's some algorithmic choice hidden in a library in this comparison, where the Python library has the right choice for the job and the Java library has the wrong one.


Are you as familiar with Java as you are with Python?


I'm much more familiar with Java. It's mostly what I get paid to write. Python's relegated to times when I can get away with it, which are relatively uncommon.


Kotlin and Scala do all these and interface with Java to boot.


> Java's type checker cannot statically verify for null safety.

Check out NullAway.

> Uses less RAM. About 1/2 as much.

Were you on a modern JVM?


It's not just null. For example, Java much more frequently leaves you in a situation where you need to cast to/from `Object`. In Python, `Any` type hints can usually be avoided, because of union types.
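For illustration, a made-up example of the union-type version; this type-checks under mypy without Any or casts:

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class Circle:
        radius: float

    @dataclass
    class Square:
        side: float

    Shape = Union[Circle, Square]  # or Circle | Square on newer Pythons

    def area(shape: Shape) -> float:
        # isinstance narrows the union, so no casting is needed.
        if isinstance(shape, Circle):
            return 3.14159 * shape.radius ** 2
        return shape.side ** 2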


First, I want to make it clear that if you have a programming style that works for you, please don't take this as an attack against it. Keep using what works.

If you find yourself commonly having to cast to/from Object in Java, that is almost certainly something of your own doing.


What's the "Java way" of accomplishing something like you'd represent with Union or Intersection types?


Other comments have linked newer language features that make it easy. But for years, the Java Way of handling discriminated unions was to use the visitor pattern [1]. It's very verbose, and is an insane amount of typing unless your IDE is doing the typing for you, but it has the compile time guarantees that force each caller to handle every type without instanceof/Object.

[1] https://dzone.com/articles/design-patterns-visitor


Apologies for forgetting I had posted this... :(

My gut would be a visitor for the fully general case. Though, I would also accept that you may be using the wrong language for the abstractions you are writing. My guess is it will depend on why you are needing/wanting one of those types.

For a lot of plumbing code, these aren't as necessary as stipulated. Yes, there could be some duplication of code. But, especially in a microservice landscape, most of that duplication will be each service doing their version of whatever you are doing.


There is none, without up or down casting. This is just some "guideline" that gets paraded from time to time (in OOP circles). On second thought, I am sure there's a "design pattern" for sum types out there. Why take the easy way out, am I right?


A common interface with all of the required functionality exposed as interface methods. A Java person would argue that if you have a function that takes an object that can be two totally disparate things with no shared functionality then that’s a code smell.


Probably with a composite class I guess. So a union type of types A and B has members of types A and B, and a non-nullable flag to indicate which one this instance is. Intersection type is just the same thing without the flag.

I don't write Java, though, just guessing.


It's one option. It has the advantage of structurally limiting the types that can be used in that spot. It has the disadvantage (compared to true union types) of offering no static help beyond that.

If you add a case, for example, you're 100% on your own to make sure that all code that interacts with the type is updated to handle the new case. Slip-ups will produce run-time errors rather than static ones.

It's also rather tricky (and awkward) to set things up such that consumers are forced to check the case value before attempting to coerce it.

All of this can be worked around with custom linter rules, but then that becomes its own maintenance burden.

My preference is to try and invert things such that you can rely on dynamic dispatch and "tell, don't ask." That eliminates the need to coerce things at run-time in the first place.
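Roughly the difference, sketched in Python for brevity since the idea is the same in Java (all names here are made up):

    # "Ask" style: inspect the case flag, coerce, then act.
    def monthly_fee_ask(card) -> float:
        if card.kind == "credit":
            return card.credit.fee
        return 0.0

    # "Tell" style: each variant carries its own behaviour, so callers
    # never need to check the case or coerce anything.
    class DebitCard:
        def monthly_fee(self) -> float:
            return 0.0

    class CreditCard:
        def __init__(self, fee: float) -> None:
            self.fee = fee
        def monthly_fee(self) -> float:
            return self.fee

    def monthly_fee_tell(card) -> float:
        return card.monthly_fee()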


You can also implement a union type as a sealed interface/abstract class with two classes that implement/extend it (https://openjdk.java.net/jeps/409, https://www.baeldung.com/java-sealed-classes-interfaces)


> So, as a PoC, I decided to try rewriting one of our services in Python.

Not to your main point - but this line threw me off. I really thought you were discussing this as a "person of color", and I was so confused as the relationship of persons of color and Python. ^_^


So what does PoC mean here then?


Proof of concept


You also have to consider backward compatibility. If there is ever another version change in Python like the one that happened from 2.0 to 3.0, the maintenance effort would be greater.


Java, Perl and Ruby all had the same.

In fact, in 2018, Java 8 was still massively used:

https://jaxenter.com/java-8-still-strong-java-10-142642.html

Then you have libs and frameworks, and they break compat too. The JS ecosystem basically broke everyone's code every 6 months for 10 years.

It's not a Python thing.


According to the latest JRebel report, about 70% of Java shops are still on Java 8 as of earlier this year. The changes with Java 9 certainly created plenty of work for me. Java's famous claim to backward compatibility mostly concerns the JVM itself. They have done a killer job there. But the virtual machine is not the same thing as the platform that rests atop it.

This is not by way of complaining about Java 9. The changes are good and needed to happen. And the reasons why most people are still on Java 8 are also legitimate. Everyone's handling the situation in a fairly mature way, from what I've seen. The level of histrionics about Python 3, though... I've got a lot less sympathy for that.


> Prototype in Python

One of Julia's goals is solving the "two-language problem":

"Julia seemed to have solved the “two-language problem”—a conundrum often facing Python programmers, as well as users of other expressive, interpreted languages. You write a program to solve a problem in Python, enjoying its pleasant syntax and interactivity. The program works on a test version of your problem, but when you try to scale it up to something more realistic, it’s too slow. This is not your fault. Python is inherently slow—something that doesn’t matter for some types of applications but does matter for your big simulation. After applying various techniques to speed it up but only realizing modest gains, you finally resort to rewriting the most time-consuming parts of the calculation in C (most commonly). Now it’s fast enough, but now you also need to maintain code in both languages, hence the two-language problem."

https://arstechnica.com/science/2020/10/the-unreasonable-eff...


What's the tl;dr of how Julia is solving this? Looking around it seems the answer is "multiple dispatch". Which seems suspect considering many languages have already tried this (Common Lisp, for example).

> Clearly, multiple dispatch, or some other way around the expression problem, is necessary for the kind of fluent composability that I’ve described above—but it is not sufficient. Julia has enjoyed an explosive degree of uptake in the scientific community because it combines this feature with several others that make it very attractive to numericists.

That's incredibly handwavy. So what's the special sauce?

There is no such thing as a free lunch when it comes to dynamic vs. static. It also seems like Julia is trading off expressiveness and ease of use in favor of efficiency, based on comments from people that have used Julia. It's one thing to be faster than any inherently slow language (Ruby, Python, Smalltalk, etc.), but keeping that flexibility and being as fast as C/C++ is a rather bold claim. Most languages hit some middle ground between the two, such as Java. But no one is under the delusion that trade-offs weren't made to get there.


Julia combines the following:

- multiple dispatch

- parametricity

- lightweight subtyping

- staged programming

In interesting ways.

Felleisen's class talks (partially) about it here: https://felleisen.org/matthias/4400-s20/lecture15.html

I guess the best way to put it is that Julia encourages a style where 90%+ of code can go through paths that are static.

Personally, I think Julia starts off as easy as Python, but to get C++ or Fortran speed, you can't just code naively. Things go into a steep learning curve at that point, and perhaps there isn't as much know-how yet about how to code "professional Julia". There needs to be a book like Fluent Python or Effective C++ for Julia, or perhaps a condensed version of the Julia manual (see the one-page Zig manual for inspiration).

The other problem I have with Julia right now is the lack of static type checkers. "Modern Python" (e.g., Python in production in the last 5 years) tends to leverage the large ecosystem of things that hook into mypy (I'm talking about tools like pydantic) to reduce the inherent brittleness of the language. Ruby, PHP, and every other dynamic language have also seen that trend.
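For instance, something in the spirit of this (hypothetical model; pydantic validates the data at runtime while mypy checks the annotations statically):

    from pydantic import BaseModel

    class Job(BaseModel):
        job_id: int
        retries: int = 3

    def schedule(job: Job) -> None:
        ...

    schedule(Job(job_id=7))   # fine
    schedule({"job_id": 7})   # flagged statically: a dict is not a Job
    Job(job_id="oops")        # rejected at runtime by pydantic's validation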

Right now, I've barely seen that with Julia, and it needs this badly for higher uptake in industry. That's why, for example, you see a lot of Julia packages written for people's PhDs right now.


Julia does a few things differently than Common Lisp, though they both offer multiple dispatch.

One of the key things in CL is that it has its metaobject protocol which forces a lot of decisions on what gets executed to runtime. There are ways to speed it up, but if you have something like:

  (defgeneric foo (x))
  (defmethod foo ((x number)) (print 'number))
  (defmethod foo ((x integer)) (print 'integer))
  (defmethod foo :before ((x number)) (print 'also-a-number))
  (foo 3)
  (foo 3.0)
Then CL won't call foo specialized on number when given an integer, but will call foo :before specialized on number. It determines this at runtime by searching for all applicable methods based on the type (at least as a first pass, you can cache this to speed it up but then you also have to have cache invalidation if a definition is changed).

Julia doesn't have that aspect of CL's MOP. So this helps to simplify the search for applicable methods and dispatch. Even if it did all its dispatch at runtime, it would still be simpler. The other thing Julia does is aggressive JIT compilation. So if you wrote something like (with the Julia equivalent of foo from above):

  function bar(x,y)
    foo(x)
    foo(y)
  end
And, only considering floats and integers, later called it with each pair of float and integer then Julia would compile specialized versions for those 4 combinations. Now when you call bar it still has to properly dispatch it, but once inside bar the search for the correct foo can be bypassed because the types will be known. CL, again thanks to the MOP, doesn't make that as easy to achieve.


Your instinct that Julia is making tradeoffs is indeed correct, however I don't think it actually limits the expressiveness of the language. I happen to think that Julia is a more expressive language than Python. However, it does require learning new patterns and paradigms and someone who tries to write Python code in Julia is probably bound to eventually get frustrated.

A huge part of the design considerations for Julia essentially boiled down to "what sorts of dynamism and language semantics can we disallow while keeping the good parts of dynamism"

The two biggest things that had to go in order to make Julia fast were

1) the ability to change the memory layout of a struct in a running session

2) the ability to eval in the local scope (our eval always occurs in the global scope)

These two things are huge performance problems. We might one day solve 1) with Revise.jl (though it'll mean recompiling all your code if you do change the layout), but 2) is basically just a very bad idea and likely to never happen. Instead of a locally scoped eval, we have macros, multiple dispatch, parametric types, and generated functions. These give an incredibly powerful suite of metaprogramming tools that are beyond anything available in Python.


There are trade-offs made. This discussion of "why Julia" describes how multiple dispatch + type stability is what gives the speed, along with the trade-offs and edge cases associated with that.

http://ucidatascienceinitiative.github.io/IntroToJulia/Html/...

As for expressiveness, this then leads to different programming styles which I explained in a blog post:

https://www.stochasticlifestyle.com/type-dispatch-design-pos...


Julia makes a number of (in my opinion) really good tradeoffs here.

1. You can't add fields to a type (struct) after definition. This means that Julia's structs have no overhead and are essentially equivalent to structs in C (although they are parametric)

2. No local eval. Eval in Julia only happens in the global scope and results of eval are only visible the next time you visit the global scope. This may sound kind of unintuitive, but in practice people don't generally use this for good reasons. This allows Julia to never need to de-optimize code. Once a method is compiled that code remains valid.

3. Macros. Julia has really good macros and other code manipulation (since it is basically a Lisp). This makes it possible to generate very complicated but fast code that you would never write yourself. The tradeoff here is that it makes the language more complex, but that's a pretty good tradeoff. (especially compared to the C/Fortran land of using a preprocessor that works on text).

4. Just-In-Time (just ahead of time). Julia at its core runs as if it were highly templated C++ code. If everything got compiled ahead of time, Julia would be generating terabytes of compiled code and never finish compiling. Instead, Julia makes the tradeoff of only compiling for the argument types that are actually used in the program, which means that it only compiles a reasonable amount of code. The tradeoff here is that compiling small binaries with Julia is very difficult (not possible to do automatically yet).

The TLDR is that most expressive languages started by giving away as much expressiveness as possible, and then looked at how they could be sped up. Julia started by being a modern fast language and looked to see how much expressiveness could be added without slowing the language down.


Nim seems like another candidate.


Python is not just slow, it also tends to break down for large projects, like all dynamic languages. It is not an absolute, you can do it, but the larger the code base, the more you need strong guarantees over flexibility, and the less relevant languages like Python tend to become.

Prototyping in Python is a viable strategy. Personally I tend to use Perl for that, because I am comfortable with it and I find it better adapted to quick prototyping than Python. I then rewrite in another language, usually C++, with more care about data structures and optimization. The author's strategy of doing everything in Python, then optimizing the slow parts, can make sense, though I tend to prefer the opposite: write your framework/engine in a static language (like C++) and embed an interpreter (Python, Lua, etc.), as is common in game dev.

But all of those are language decisions! Whatever you do, there are going to be consequences. Choosing Python with C/C++/Rust optimizations is not bad, but it is a rather strong opinion, not the obvious default choice the author makes it out to be.


> like all dynamic languages

I think you would be surprised by the amount of type-checked bugs that can be eliminated in dynamic languages with standard software engineering practices. Have a look at how Common Lisp, for example, handles type annotations and type warnings: https://lispcookbook.github.io/cl-cookbook/type.html

Instead of blaming dynamic languages, I think project managers need to adopt better practices.

> but the larger the code base, the more you need strong guarantees over flexibility

I think I disagree with this. The larger the code base, the more flexible the program needs to be towards its input, otherwise your program is too sensitive and will crash. The output, however, I think should be strict.


> I think you would be surprised by the amount of type-checked bugs that can be eliminated in dynamic languages with standard software engineering practices. Have a look at how Common Lisp

To second this, I just read someone's anecdote about how unit tests/TDD is eliminating the need for type checking: https://www.artima.com/weblogs/viewpost.jsp?thread=4639

> I think I disagree with this. The larger the code base, the more flexible the program needs to be towards its input

And again, I'll second this with another anecdotal posting, but with the emphasis that input is not just run-time, but also specifications for programs: http://ivy.io/common-lisp/2015/03/03/guerilla-lisp-opus.html


Modern Python (as used in production) should be used in conjunction with static type checkers. Your argument is no longer relevant for modern Python.

  from typing import Optional

  def div(x: int, y: int) -> Optional[int]:
      return x // y if y != 0 else None
The above is roughly what production-level Python should look like nowadays.

https://www.infoworld.com/article/3575079/4-python-type-chec...

That is not to say the type checking situation for Python is perfect, far from it, but the situation is good enough that arguments against Python that rely on the dynamic nature of the language are mostly no longer relevant.
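To make that concrete: under a checker like mypy, code that forgets to handle the None case fails the static check before it ever runs (error wording approximate; div repeated from above so the snippet stands alone):

    from typing import Optional

    def div(x: int, y: int) -> Optional[int]:
        return x // y if y != 0 else None

    def double(n: int) -> int:
        return n * 2

    result = div(10, 3)
    double(result)         # mypy: incompatible type "Optional[int]"; expected "int"
    if result is not None:
        double(result)     # fine: the checker narrows Optional[int] to int here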


Division by zero should not return None. Is there an aversion to exceptions in "modern" Python?


Why not? If there is no quotient, None seems like a good answer to the question ‘what's the quotient?’


I can see where he's coming from. It's the typical reaction of a beginner programmer.

None/Null in many languages isn't type checked even if the language itself has static type checking. It can be passed to other functions mysteriously and remain a hidden error. JavaScript and C++ are guilty of this. He's complaining of a safety issue most likely.

However, note that the type signature I specified, Optional[Int], encodes the None case into the static type checker. In modern programming this implies Null/None safety at the type level.

I'm not up to date with the current state of python type checkers so those type checkers may let some things pass depending on some options you set. See my reply to the other guy for a more detailed expose.


> I can see where he's coming from. It's the typical reaction of a beginner programmer.

This seems like the typical response of someone who posts bait ready to flex their "advanced programmer" knowledge. There's really no need to call out people's level of experience, ever. Personally I'm past the point where I think I've learnt everything so I just ask questions when I don't know.


I am not calling anyone out. Let me be absolutely clear with you, there is NOTHING and I mean absolutely nothing wrong with asking questions or being a beginner programmer.

The problem with your earlier post was that you weren't asking a question. You were stating something as if it were a fact.


I only stated that division by zero shouldn't return None, which you seem to agree with, no? The difference is how you ensure that doesn't happen. You really only needed to say "this lets you do a static check to make sure you're not dividing by zero".


No. I don't agree, I'm saying the None is better so long as you encode the concept of a None into the type. This is done by converting an Int into Optional[Int].


It's not that there's no quotient, it's that you shouldn't even have asked what the quotient is.


The thing about exceptions is that they are runtime. Pushing the issue to the type system forces the user to handle the case.


This is definitively wrong. Not to comment on "modern" Python, but in general, if this code raised an exception on division by zero, it would be definitively worse.

The reason is that an exception is a runtime check. It catches an error that is hidden until runtime. The better way is to encode this logic into a static type checking system so this error isn't even permitted to run.

With the type Optional[Int], any other function that utilizes the output of this function MUST also accept an Optional[Int] and not just an Int, meaning that the function signature must handle the None case, and this is caught statically before run time. Insofar as the function type signature goes... Python type checkers should handle this error.

However tbh I'm not sure how extensive type checkers for python are and whether or not they can track this type of error for the logic itself. For example while the below will catch a type error if you feed it a bad parameter, the signature itself could be wrong and not caught by a type checker:

    x1: Int = 2 # results in type error (Good)
    x2: Optional[Int] = None # results in runtime error. This is bad in terms of safety

    def addOne(x: Optional[Int]) -> Optional[Int]:
        return x + 1
As far as I know the above is permitted by python type checkers when truly correct code is below:

    def addOne(x: Optional[Int]) -> Optional[Int]:
        return x + 1 if x is not None else None
However, with Python 3.10 pattern matching, the level of static type checking can increase to the point where this type of safety is 100% possible: https://www.python.org/dev/peps/pep-0636/.

    x1: Int = 2 # results in type error (Good)
    x2: Optional[Int] = None # No error (also good)

    def addOne(x: Optional[Int]) -> Optional[Int]:
        match x:
            case None: # if this case were deleted should result in static type error (also good) 
                 return None
            case _:
                 return x + 1
Not sure if Python type checkers can handle the above code yet, but exhaustive type checking is 100% viable in the future if the syntax utilizes the pattern matching shown above. Haskell and Rust already have this level of type safety. See: https://rustc-dev-guide.rust-lang.org/pat-exhaustive-checkin...
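One idiom that already gets you exhaustiveness checking from mypy today is a hand-rolled assert_never helper (a sketch using isinstance narrowing, since checker support for match was still settling):

    from typing import NoReturn, Optional

    def assert_never(value: NoReturn) -> NoReturn:
        # Only reachable if a case above was not handled; mypy then reports
        # the leftover type at this call site as a static error.
        raise AssertionError(f"unhandled case: {value!r}")

    def add_one(x: Optional[int]) -> Optional[int]:
        if x is None:
            return None
        if isinstance(x, int):
            return x + 1
        assert_never(x)  # delete a branch above and this call fails the check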

What this means, in short, is that you are wrong. Whatever your notion of "modern" Python, having division by zero raise a runtime error instead of returning a None or an Error type is definitively worse. We are at a point in our technology where this type of error should be caught automatically BEFORE a program even runs. All type errors should be encoded into types and caught via static type checking and NOT via exceptions.


I'm not very familiar with Python type checkers, so what follows may be completely off base.

> The reason is because an exception is a runtime check. It caught an error that is hidden until runtime. The better way is to encode this logic into a static type checking system so this error isn't even permitted to run

But in

  def div(x: int, y: int) -> Optional[int]:
      return x // y if y != 0 else None
don't you have a runtime check? It has to do a runtime check of the value of y in order to decide whether or not to return None?

I don't see how this is encoding the logic of dealing with divide by zero into the type system, or preventing the error from running. I'd expect that dealing with it in the type system, rather than preventing it at runtime, would require y to be a type that does not allow 0 (do Python's type annotations allow constrained integers?).

Handling it by returning None means the caller has to deal with None. If it deals with it by just returning None too, then its caller needs to deal with it too. If whatever code finally really deals with it, instead of just passing it upwards, is farther up than div's caller (or maybe div's caller's caller), it seems like what you've ended up with is something that is less clear than exceptions and requires more runtime work.

For those cases where the return None approach is better than using an exception, can you get rid of the runtime check for y being non-zero with something like

  def div(x: int, y: int) -> Optional[int]:
      try:
          return x // y
      except ZeroDivisionError:
          return None
or do exceptions in Python incur sufficient overhead when not taken to make explicitly checking y win?


Python, historically, has not had zero-overhead exceptions. There is a runtime cost associated with a try block. 3.11 will be adding zero-overhead exceptions, though, per https://bugs.python.org/issue40222.


It encodes it into the type system at the function signature level. Optional[Int] is an Int with a None, so if you pass this type into f(x: Int) you get a type error because f(x: Int) is a function that deals with Ints, not Optional[Int]. So you get a degree of safety here.

As I mentioned, this safety does not usually extend to the function definition. I stated that it can though (and it does in Haskell and Rust), and this is usually done through a feature called pattern matching. Pattern matching is available in Python 3.10, but I'm not sure whether exhaustive type checking on pattern matching by external type checkers is available yet.


For division you really should put the requirement for the non-zero check in the type of the divisor instead of propagating failure to the caller e.g.

    div(x: Int, y: NonZero[Int]) -> Int
Exceptions at least have the benefit of retaining the location and context where the error occurred, which gets lost without extra bookkeeping when using optionals.


This ends up being a super unergonomic API in practice because you would need to modify the type-checker to infer positive integers correctly. Nobody wants to go through the effort of

    NonZero = NewType('NonZero', int)

    if x != 0:
        x = NonZero(x)
just to call your function, especially since you still have to handle the case where someone does NonZero(x) blindly without checking. Should the constructor throw?


The check has to go somewhere and the caller to div has the most context in the event the divisor is 0. If you return an optional from div then you either impose the check on all the callers, or just propagate None everywhere and some top-level function has to deal with mysteriously missing values.

The NonZero type should be responsible for checking that the wrapped value is non-zero; you will probably want safe and unsafe constructor functions

    Int -> Optional[NonZero[Int]]
    Int -> NonZero[Int]
where the unsafe version throws.

If this is overkill for your application then I'd prefer throwing an exception in div rather than encoding the failure in the return type.
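In Python terms the pair of constructors might look something like this (NonZeroInt is a made-up NewType, not anything from a library):

    from typing import NewType, Optional

    NonZeroInt = NewType("NonZeroInt", int)

    def non_zero(x: int) -> Optional[NonZeroInt]:
        # Safe constructor: the caller is forced to handle the None case.
        return NonZeroInt(x) if x != 0 else None

    def non_zero_unsafe(x: int) -> NonZeroInt:
        # Unsafe constructor: throws if the precondition is violated.
        if x == 0:
            raise ValueError("expected a non-zero int")
        return NonZeroInt(x)

    def div(x: int, y: NonZeroInt) -> int:
        return x // y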


That's your preference. However from a safety standpoint it is better to return an Optional[Int] or pass a NonZero[Int]. These two checks eliminate an entire class of runtime errors from ever occurring and that is a quantitative metric that is definitive.


My first preference is to change the type of the divisor to NonZero[Int], and throwing an exception within div is only my second preference. It doesn't make sense to change the return type of div to Optional[Int] because the success or failure of the operation is determined entirely by a property of one of the arguments, which the caller can always check. Making div partial and encoding it in the return type just moves the problem away from the place where it occurred and where it can actually be handled.


It does make sense, because the next thing that handles the return type must be equipped to handle the optional type or the program won't run/compile at all.

This still moves the check out of runtime and into a static check. An exception means the error is handled runtime and as I said before is a definitively worse metric.

Let me put it more simply. An exception means that you can compile an error or mistake and it might accidentally make it to production. This happens because exceptions are only thrown on compiled programs during runtime.

An optional means that runtime errors of this nature cannot happen. A program that throws an exception on division by zero cannot even EXIST. This occurs because, if the optional is used, the static type checker will prevent such a program from even compiling. That is why it is definitively better.


Div always returns an int as long as the precondition - that the divisor is non-zero - is satisfied. The caller is always responsible for ensuring the precondition is satisfied which means the caller must ensure the divisor is non-zero before making the call. Checking it dynamically and throwing an exception does have the drawbacks you state, but if you're going to represent the contract in the type system then you do that by constraining the type of the divisor, not widening the return type.

Your encoding has really changed the meaning of the div function - it now always accepts any two arbitrary ints and always allows the possibility of returning None. So you've weakened both the precondition and the postcondition, which is now easier for the caller but imposes a cost on every location where the return value is used. As the optionality encoded in the value propagates further from the call to div, the context for the source of the missing value is lost. This is 'safer' in the sense of avoiding crashes at runtime but doesn't actually help resolve the actual issue of avoiding a zero divisor when calling div.


>Your encoding has really changed the meaning of the div function - it now always accepts any two arbitrary ints and always allows the possibility of returning None.

No it did not. That is simply your interpretation of it. I can rename "None" to "undefined" or to "Exception" and the etymological meaning is now more inline with your thinking.

The concept of an exception or undefined can be encoded into a type or into the set of all ints. What you are saying is roughly equivalent to, "No, I want to use the mathematical definition of Z, and I don't want to encode the reality of what's actually going on into the type."

You are getting hung up on a wording issue because you are stuck on the mathematical concept of undefined and division of Ints. Think of the problem from a higher mathematical perspective of sets and define a set (aka type) that fits the use case. That's just one perspective on the flaw in your thinking. There is another isomorphic perspective as well:

There is nothing weak going on here. There is no difference between saying that division by zero maps to "undefined" and not defining it, period. The concepts are isomorphic. You construct the same mapping in the English language when you talk to people: you literally say "division by zero is undefined" or "division by one is the same number". All these sentences are equivalent to saying "division by x/y/zero maps to undefined/z/some number."

Don't get too lost in mathematical conventions. Understand the higher meaning behind the concepts and you'll be much more flexible.

>This is 'safer' in the sense of avoiding crashes at runtime but doesn't actually help resolve the actual issue of avoiding a zero divisor when calling div.

You're not seeing the bigger picture. The only way to avoid a zero divisor is to use your made up type NonZero[Int]. There's no other way. Not even your exception handling can prevent a zero divisor.

The best way to do this is to FORCE the user to handle a zero divisor pre-compile time. This does not happen with an exception. Your program just crashes.

The pattern matching I illustrated above is a mechanism for forcing the user to handle any zero divisor related logic or the program won't Compile period. It is definitively better.

Think of it this way. Exceptions are cheating. It's some sort of escape hatch built into the language. It's better to have no escape hatches. Build all possible outcomes into the type.


> There is nothing weak going on here

The type for your static version of div is:

    def div(x: Int, y: Int) -> Optional[Int]:
The precondition for the dynamic behaviour of div is that the divisor is non-zero. If you want to think of it in set terms this means y must be a member of Z - #{0}. But your encoding allows any member of Z. This is a larger set and therefore a weaker precondition.

Likewise the postcondition of the dynamic version is that an Int is returned, but yours only guarantees an Optional[Int]. Again in set terms this is something like Z + #{None}, which is a larger set than Z and therefore a weakening of the postcondition.

Your version encodes the dynamic behaviour of throwing an exception in the return type, but I'm saying that

    def div(x: Int, y: NonZero[Int]) -> Int
is a more precise static definition of the behaviour of div, and if you're going to add types to the dynamic version you should prefer this approach. Our two definitions are not equivalent since your function can easily be implemented with mine:

    def yourDiv(x, y) = map (fun nz: myDiv x nz) (fromInt y)
but you can't (safely) implement my version using yours since it returns an Optional[Int] and an Int is required. This shouldn't be surprising since algebraically (Int, Int) -> Option Int is a larger type than (Int, NonZero[Int]) -> Int.

> You're not seeing the bigger picture. The only way to avoid a zero divisor is to use your made up type NonZero[Int]

Obviously I know this, because I suggested using NonZero[Int] in the first place. Your version accepts an Int divisor and just encodes the partiality in the return type. But this doesn't force the user (i.e. caller) of div to handle a non-zero divisor at all; it forces the user of the return value to handle a potentially missing value without any context for why it was missing. Propagating this missing value isn't 'handling' the error at all, since that can only be resolved by ensuring the divisor is non-zero before calling div.


>But this doesn't force the user (i.e. caller) of div to handle a non-zero divisor at all, it forces the user of the return value to handle a potential missing value without any context for why it was missing.

Like I said think in terms of isomorphisms. Exceptions also force the caller to handle the code. There is no difference here.

None is isomorphic to a Null, and an exception is isomorphic to a result type. If you want context, do this:

   type Result[Int] = Int | Exception[String]

   def div(x: int, y: int) -> Result[Int]:
       return x // y if y != 0 else Exception("Division by zero on line ${line}")
The structure of the code handling this type of div is identical to code handling an actual exception. The main difference is type safety. Exceptions compile with a possible runtime error, Result types do not.

Additionally your NonZero[Int] is not fully type safe. You have to think about long reaching consequences and precursors. Here you aren't thinking about the precursor. What generates this NonZero[Int]? Usually at some point you have some type casting function of the form:

   Int -> NonZero[Int]
What then happens when I pass a zero? All you did is propagate the issue to somewhere else.

Math doesn't have the syntax to fully compose two functions with different codomains and domains. So strict math formalism is irrelevant here. You can switch some attributes to isomorphic concepts (like how mapping division by zero to an element in a set called "undefined" is equivalent to the operation actually being undefined), but in the end you have to invent something, because math simply doesn't have any elegant formal syntax to cover this codomain and domain mismatch. Therefore strict adherence to the concept that an Int can never be inserted into a div function is unnecessarily pedantic. Literally every other mathematical operation covers all of Z in both domain and codomain except for division.


> The structure of the code handling this type of div is identical to code handling an actual exception

You would never write an exception handler to handle such a failure from div. The divisor being non-zero is a precondition of calling div in the first place, which is something the caller is responsible for upholding. You shouldn't ever need to write an exception handler to catch precondition violations. Do you also write handlers to 'handle' null dereferences? Representing the partiality in the return type is just pushing the responsibility to some code that can't reasonably do anything.

> What then happens when I pass a zero?

I've already explained this, you obtain a NonZero[Int] from a function

    fromInt : Int -> Optional[NonZero[Int]]
and you can optionally add an unsafe version with type

    Int -> NonZero[Int]
> All you did is propagate the issue to somewhere else

Yes, the check has to be done somewhere since that is the point of encoding the property in the types. But encoding it in the argument type ensures the check is done before div is called which is where it needs to be done.


>You would never write an exception handler to handle such a failure from div. The divisor being non-zero is a precondition of calling div in the first place, which is something the caller is responsible for upholding. You shouldn't ever need to write an exception handler to catch precondition violations. Do you also write handlers to 'handle' null dereferences? Representing the partiality in the return type is just pushing the responsibility to some code that can't reasonably do anything.

This is just your arbitrary preference. There is nothing wrong with going from either perspective. But your exception is completely worse from every quantifiable metric except for your opinionated qualitative metric.

    fromInt : Int -> Optional[NonZero[Int]]
This function suffers from the same problem you describe. You're just trying to justify a convention of doing this check before rather than later. Also, your unsafe version is again worse because it will trigger an exception on zero, so I don't see how it helps your argument.

>But encoding it in the argument type ensures the check is done before div is called which is where it needs to be done.

This is the core of your argument and it is highly flawed. There is no "need" for it to be done this way. It is simply your preferred convention.

Your argument loses on both fronts. Exceptions are definitively worse, and encoding non-zero type safety into the parameter is not necessarily proven to be better.


> This is just your arbitrary preference

It's not arbitrary, since it's possible to write your function using mine but not vice versa. If you disagree then please implement the following function without casting:

    def convertDiv(f: (Int, Int) -> Optional[Int]): (Int, NonZero[Int]) -> Int
> You're just trying to justify a convention of doing this check before rather than later

The convention that callers are responsible for upholding the preconditions of the functions they call is well established: https://en.wikipedia.org/wiki/Design_by_contract. You obviously can't fix precondition violations by checking the result after the fact.

> Also Your unsafe version is again worse because it will trigger an exception on zero

That is the point of the unsafe version, yes. Sometimes you will statically know the argument is non-zero e.g. NonZero(3). If you want to avoid an exception then use the safe version.


>It's not arbitrary since it's possible to write your function using mine but not vice versa. If you disagree then please implementing the following function without casting:

First, why does this even matter? It doesn't. Being able to write one thing in terms of the other doesn't mean anything.

Second, you can't implement the converse without casting EITHER. The Optional[Int] doesn't exist, so how do you create it?? You CAST. It's a zero-cost implicit type in Python and in C++.

>The convention that callers are responsible for upholding the preconditions of the functions they call is well established: https://en.wikipedia.org/wiki/Design_by_contract.

Should I use the fact that Optional is more well established than NonZero to win this argument? Yeah, if you want to talk about "well established", then Optional is more well established than NonZero or this design-by-contract convention, which is so unestablished I'd barely even heard of it.

Additionally even reading about this convention I see no requirement that division by zero must never return an undefined or that zero should never be the divisor. The description reads that these pre/post conditions just need to exist, but they're your choice what you need them to be. These conditions are encoded in the type.

>If you want to avoid an exception then use the safe version.

The safe version suffers from your same problem just moved. Nothing is magically solved by this move other than it fulfilling your arbitrary opinion and convention.


> First, Why does this even matter?

The reason you can't write my version using yours is that the types are less precise, and you can't recover the lost precision in the output type after the fact. The only safe way to obtain an Int from an Optional[Int] is by providing a default value, which doesn't exist in this case.

> The Optional[Int] doesn't exist so how do you create it?? You CAST

By casting I mean an unchecked narrowing conversion e.g. of the type Optional[Int] -> Int. There's no casting in my version.

> if you want to talk about "Well Established" then Optional is more well established

This is a false dichotomy; contracts are still used in static languages where you can't or don't want to try to represent properties at the type level. You could for example define a function

    lookup: Map -> Key -> Optional[Value]
and still add preconditions that the map and key were non-null. Failing to uphold these represents a different kind of 'failure' than the key not being found, so it wouldn't make sense to lift them into the return type.

> The safe version suffers from your same problem just moved

It didn't 'just' move, it moved to the point in the program where you actually need to deal with the possibility of a zero divisor, i.e. before calling div. Where does the divisor come from in the first place? You seem to be assuming there is necessarily some call to NonZero.fromInt at each call site to div, but this is wrong. The non-zeroness of the divisor could be established at some prior point in the program and used in multiple places. In contrast, your version has to deal with the possibility of returning None everywhere, even if you've already established the property of the divisor beforehand.


>The reason you can't write my version using yours is that the types are less precise and you can't recover the imprecision in the output type after the fact.

Irrelevant to my statement. I said why does it even matter not why can't you do it. The answers are it doesn't matter at all AND you can't do it for EITHER case.

>The only safe way to obtain an Int from an Optional[Int] is by providing a default value which doesn't exist in this case.

No the safe way is through exhaustive type checking via pattern matching. If you're not sure what this is, look it up. Suffice to say it's static safety on all sum types including Optionals prior to execution.

>By casting I mean an unchecked narrowing conversion e.g. of the type Optional[Int] -> Int. There's no casting in my version.

There is 100% casting in your version. One hundred percent. There is no narrowing conversion here; you're just making that shit up. The inverse of what you wrote is THIS:

       def convertDiv(f: (Int, NonZero[Int]) -> Int ): (Int, Int) -> Optional[Int]:
There is no way to create an Optional[Int] WITHOUT a typecast. I'm sorry, but your statement is definitively wrong; no need to build some scaffold of strange logic around it and "narrow" the definition of a cast. I get your point though (even though I disagree). However, this does not change the fact that your example is completely wrong from a logical standpoint and completely off base.

>and still add preconditions that the map and key were non-null. The failure to uphold these represent a different kind of 'failure' than the key not being found so it wouldn't make sense to lift them into the return type.

Uh no. You can do Exactly what you did with NonZero[Int] with Key in your example. Imagine a map with RGB colors as keys.

   type KEY = Red | Green | Blue
   type VALUE = ...
   lookup: Map[KEY, VALUE] -> KEY -> VALUE
Like I said it's just your preference here. There is a false dichotomy when it comes to things being more correct when "Well Established" and that false dichotomy isn't coming from me. It's coming from you.

>It didn't 'just' move, it moved to the point in the program you actually need to deal with the possibility of a zero divisor i.e. before calling div. Where does the divisor come from in the first place? You seem to be assuming there is necessarily some call to NonZero.fromInt at each call site to div but this is wrong.

Ok, let me reframe this. I am completely NOT assuming NonZero.fromInt at the call point AT ALL. Once you realize that your assumption is wrong, maybe you should consider the fact that you're NOT understanding me.

>The non-zeroness of the divisor could be established at some prior point in the program and used in multiple places.

The above is 100% what I am assuming. This prior point involves the creation of the type NonZero[Int] which involves: NonZero.fromInt. Every other mathematical operation (+,-,x^y,/,) returns an Int not a NonZero[Int] so this cast must occur. And that is my point. Think about it.

> In contrast your version has to deal with the possibility of returning None everwhere even if you've already established the property of the divisor beforehand.

This is where you're getting hung up. Let's clarify something: your NonZero.fromInt is of the type:

   Int -> Optional[NonZero[Int]]
With that out of the way let us continue:

Yeah so my division returns an Optional which could be a None. I can either handle the None immediately or let it propagate all the way to IO and handle it just before it hits this wall. This is a bad thing I get it.

But your NonZero.fromInt Also returns an Optional which could be None. I can either handle the None immediately or let it propagate all the way to IO and handle it just before it hits this wall. This is a bad thing I get it.

Notice how the above two sentences are the same? That is what I mean when I say you're just moving the problem to another place but the problem essentially remains the SAME THING.

As I stated before and I'll repeat it again. The only reason why you prefer NonZero[Int] over Optional[Int] is the same reason why someone would prefer blue over red. There is no logic, rhyme or reason behind it. It is just your style and your personal taste.


> I said why does it even matter not why can't you do it

I've explained why it matters - the types are more precise in my version, and if you start from that you can always throw away the extra precision if desired to get to your version. You can't go in the other direction, so starting from your version makes it impossible to safely recover an Int from the returned type of Optional[Int], even if you've already established the precondition beforehand.

> There is 100% casting in your version

Creating an Optional[Int] from an Int is a conversion, not a cast. I thought it was obvious from the context but for the avoidance of any doubt, by 'casting' I mean an unsafe narrowing conversion. Optional[Int] is a larger type than Int, so it's trivial to create one from an Int:

    def pure(x: Int): Optional[Int] = Just(x)
you clearly can't safely go in the other direction, whether using pattern matching or otherwise. If you disagree, just complete the following definition:

    def fromOptional(o: Optional[Int]): Int =
        match o with
        | Some(i) => i
        | Nothing => ...
eventually you need to provide a default value for the case of no value.

> Imagine a map with RGB colors as keys.

Your example doesn't make sense, what would you expect (lookup Map.empty Red) to return? The optional return value is used to represent the key being missing in the map. Nonetheless the point I was making is that you wouldn't return Nothing from such a function in the event of a precondition failure e.g.

    def lookup(m: Map[K, V], v: K): Optional[V] =
        if m is None return Nothing
        ...
you would instead throw an exception if the input map is null and force the caller to handle it. The majority of static type systems are not powerful enough to encode arbitrary properties about values, so you have to decide which ones to check dynamically and which statically. Checking preconditions dynamically is reasonable if encoding them in the type system is too cumbersome.

> This prior point involves the creation of the type NonZero[Int] which involves: NonZero.fromInt

No, this is not necessarily the only way to create instances of NonZero. You could have a PosNat subtype with members one: PosNat and succ: PosNat -> PosNat. You could have a non-empty list type with a length member.

> Every other mathematical operation (+,-,x^y,/,) returns an Int not a NonZero[Int]

They don't return Optional[Int] either so I don't see how this is relevant. There's no reason the input has to come from some application of a different operator, it could come from configuration, user input, a property from some other type etc. The question is whether and how to model the constraints in the type. The constraint exists in the argument so it makes sense to constrain the input type, not widen the output.

> Notice how the above two sentences are the same?

Yes, if all you want to do is avoid establishing the property you care about and silently propagate some information-free 'failure' value to the top level, then you can do it either way. But the entire point of encoding properties in the types is to force you to establish them. These statements highlight the difference:

1. I've established the divisor is non-zero, called myDiv, received an Int and continue

2. I've established the divisor is non-zero, discarded that information to call yourDiv, received an Optional[Int] which cannot be empty, but which must be propagated. You could immediately unwrap the value, but now you're just re-creating the dynamic behaviour of a function (Int, Int) -> Int which you've already rejected.


Haskell throws an exception and Rust panics on division by zero for ints.

This is just defensive programming.

The None state here is already an invalid state. If you want to use the type system to ensure correctness you would change the second argument to be a type representing multiplicatively invertible integers.


You need to employ pattern matching for exhaustive type checking to be deployed.

For Haskell it will give you a warning. For Rust it will actually not compile. The key here is that you MUST use pattern matching.


If the compiler can figure out that you are dividing by zero at compile time then you will get those results. That is not a realistic expectation in general.

Having to wrap every division call in a maybe monad or rust result enum is bad from a usability standpoint and potentially unacceptable from a performance standpoint.

Exhaustive pattern matching is great. But not a replacement for other types of error handling.


Performance is a trade off and usability is arguable. There are ML style languages where a Maybe monadic value is the default return value of division due to the division by zero issue.

There are languages that employ this philosophy everywhere. Such languages are actually incapable of having a runtime error outside of edge cases like ffi.

Just so you know, you don't have to wrap every type in an enum. Types and enums share a bijective relationship. A type IS an enum and vice versa. You CAN in fact define an Int in terms of an enum (and an enum in terms of an int, in C). The mechanisms behind an Int and a bool are roughly the same, it's just the cardinality and mappings that are different. Pattern matching implementations usually check types based off of cardinality and do not differentiate between enums and other built in types like floats or ints.


This is probably all correct, but man... this just isn't fun at all. This isn't why I got into programming.


I'm sorry you don't find programming fun. I find it fun. And if you don't like this stuff, maybe it's the wrong career choice for you.


> it also tends to break down for large projects

You also end up spending a lot of time and resources in the later stages of the project trying to work around the problems: hire more people to work on the code base, spend a lot of time investigating optimization strategies, hire a devops team to turn the program into a distributed system with maybe hundreds of instances running on CloudProvider (TM)


> turn the program into a distributed system with maybe hundreds of instances running on CloudProvider (TM)

... and give tens to hundreds of thousands a month to cloud providers, when you could spend 1/100th of that if it were written in Go.


You could even save on those costs by buying your own server. I doubt you couldn't get very decent server hardware for $10,000 or less. Some places pay more than that per month to Amazon.


But that's not cloud, because cloud!


Not just dynamic types, but all the breaking changes they make in minor versions. Since the program is "compiled" on the user's machine, not yours, you don't even control which version it is going to use. I mean, migrating from Scala 2.12 to 2.13 is a bit annoying, but I can do it whenever I have time. Python 3.9 to 3.10 has to happen ASAP - it's suddenly a bug the moment they decide to release.


I don't like how he makes it a choice between Python and C++/Rust. There are very many languages that are more similar to Python in convenience and yet run reasonably fast (and you can actually optimize some procedures when you need to, because there is a compiler). Go, Julia, C#, F#, Scala, Kotlin, even recent Java... all much faster than Python and much less pain to work with than C++.

And the interoperability is not as awesome as it's painted in the article; it's always more painful to have more languages in a project that need to talk to each other.


You have to look at resources too. You'll probably find 10 Java developers before you find 1 Scala developer, even if the languages are related technically. Especially on long-term projects, you have at least some turnover of people.


The fallacy of premature optimization rears its ugly head again.

When Knuth wrote that quote, he meant something very specific with the word optimization: low level micro optimizations. Spending a lot of time trying to get every little ounce of performance from every little CPU instruction.

High level reasonable design decisions are not "optimizations" in this sense at all. You should think of them as non-pessimization.

Choosing Python or another slow language is premature pessimization. You just make your program slower for no reason.

So choosing a language based on how programs written in it perform is simply non-pessimization.

If you are interested in an expansion on this idea, check out Casey Muratori's lecture about philosophies of optimization: https://www.youtube.com/watch?v=pgoetgxecw8

The article makes claims about software being "fast enough" and not needing further optimization. But what you call "fast enough" is probably 10000 times slower than what it could be.

I'm not joking.

People who program in slow languages have a really skewed perception about performance.

If a page takes 5 seconds to load, they don't see that as a problem.

If a server takes 500ms to respond to a request, they don't see that as a problem.

This is simply unacceptable.


I have a streaming website, coded in Python, with half a million users a day. It does machine learning, uploading, encoding, dynamic load balancing, live search, comments, votes, and most of the code I had to write was in Django or RQ endpoints.

On first load, not only is the page ready, but the video has started to autoplay (including buffering), in under 3 seconds. Less than 500ms on a reload.

The whole project has been running for 10 years on 7 servers. Hosting costs under $1K/mo.

And even better, the code base is a mere 11K LoC, so a single dev can manage it. And make a living out of it.

That's because:

- Python is almost never the bottleneck in a website

- Python wraps compiled languages for slow stuff. That's the whole point.

- your architecture matters much more than the language you choose. Because good DB providers, proper caching and optimized data fetching will be the Pareto moves, unless you are Google size.

- you will not code most things anyway. You are not going to code the video encoding, you will use ffmpeg. You will not recode a database, you will use an existing one. You will not code a server, you will use nginx or something else. What you will spend time on is creating logic for your clients.
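
To make the ffmpeg point concrete, the glue usually looks something like this (a sketch; the filenames and codec choices are made up):

    import subprocess

    def encode_to_h264(src: str, dst: str) -> None:
        # Delegate the heavy lifting to ffmpeg; Python only orchestrates.
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst],
            check=True,
        )

    encode_to_h264("upload.mov", "stream_ready.mp4")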

So I think you are extrapolating from this rare use case where your POV makes sense, and ignoring the 99% of cases where, indeed, a slow language doesn't matter.

In fact, if Electron's success tells us anything, it's that while I do value speed a lot, the market has voted.


> I have a streaming website, coded in Python, with half a million users a day. It does machine learning, uploading, encoding, dynamic load balancing, live search, comments, votes, and most of the code I had to write was in Django or RQ endpoints.

Sounds much more likely that you wrote in Python the code that bundles together the libraries that perform the actual network streaming, machine learning, encoding, search, etc., and that those libraries were not written in Python.


Yes, that's my whole point.

Where in the stack do you see that Python performance matters?

Only one piece of software out of a thousand will actually need it. The others will mostly do plumbing or automation.

And scripting languages excel at both.

If you don't fall into that category, then yes, do use a fast language. If you are coding a chat server, use Erlang. If you want to build a rendering engine, use Rust. But if you most likely want to code your company's CRUD software and glue it to some external services and providers, use Python.


> Where in the stack do you see that Python performance matters?

As we just saw, most of the code forming your stack is not written in python because its performance would not allow it.

> Only one piece of software out of a thousand will actually need it.

Not what I see from your example.

It seems to me that you are overestimating the volume of the Python code that you wrote and are blind to the huge volume of performance-critical code you just take for granted.

You probably mean: most of the code that most programmers write is disposable code assembling reusable libraries, and the performance of that code does not matter, therefore Python is fine.

I agree on this. The problem I see with this reasoning, though, is that Python is so slow and error-prone compared to compiled alternatives that it is unfit when scaled up from small disposable programs to largish permanent software, in a world where all programs tend to grow and stick around in production longer than expected.


3 seconds to load and $1000/mo hosting costs is way over the unacceptability threshold for me.

I don't know what website you're running, how many people use it, or how much money you make. I know you can make a lot of money doing stuff in Python or Electron. Does that mean the market has voted that performance doesn't matter? I don't think so. Jira makes a lot of money but almost everyone hates it. I know a lot of other software that is very slow but still makes tons of money.

> you will not code most things anyway

You say that like it's a good thing! It's a bad thing.

You may say it's because you don't need to. But the flip side is that you can't, even if you wanted to.

You can't write a database or a video encoder or a web server in this language you are using to write your program.

So what is the point of this language?

You can't even use it to cache things in memory. That's why everyone uses redis or something similar.


> When Knuth wrote that quote, he meant something very specific with the word optimization: low level micro optimizations.

Sure, but that isn't to say his advice can't be applied in other contexts. Applying optimizations before knowing where the bottleneck is is just a bad idea, regardless of the situation. To constrain the advice to the original context is sort of odd, but maybe there's more to it than that?

I always thought the risk of premature optimization was similar to the lean principle of deferring commitment: you should wait until the last possible moment to make a decision that has high switching costs. In both cases you want to have as much information as possible before taking action.

As an aside, this bugged me for a long time: how _much_ information do you need? Recently I co-presented with the CEO of a successful manufacturing company. At his firm they use the military's rule of thumb of taking action when you have 70% of the information. Decide sooner than that and you risk making a bad decision; wait for more information past 70% and you are likely missing an opportunity.


The advice doesn't generalize to high-level "optimizations". The problem with (low-level) optimizations is that they make the code more complicated and more difficult to understand. This is a problem. You'd better have a really good payoff. (It also takes a lot of time to apply micro-optimizations to everything, so if you agonize over every CPU instruction you will never get anything done.)

But general high-level design decisions that don't produce slow code by default do not suffer from this. They don't make the code any more complicated or harder to understand. Quite the opposite: they usually simplify things. Because most of the time, the way to get good performance is to do the minimal thing that needs to be done to perform the task at hand. The code you write will be very straightforward. There's no switching cost.

On the other hand, choosing a slow language has a high switching cost. You can't just take a bad design and try to optimize it. It usually doesn't work. You have to rework the whole design.

EDIT:

I have one other important point to add:

> Applying optimizations before knowing where the bottleneck is is just a bad idea, regardless of the situation.

Ignoring any and all performance concerns using this kind of reasoning will result in a codebase that is permeated with slowness. The program is slow but you can't identify any single bottleneck. You end up with a monster that simply can't be fixed.

Worse, you may not even realize how slow it actually is. If you have profiled it and found no particular bottleneck, you may think this is simply the limit of computer performance and that the task at hand cannot be performed any faster than this.


I love that you can replace "optimization" in your first paragraph with "abstraction" and have the same results. :D

This actually leads me to the concept of "micro abstractions", and I'm going to see if I can spot those in hard-to-deal-with programs now.


I'm not sure what you mean, but with abstraction it's the opposite of what I said.

Premature abstraction really is very very bad (I won't go so far as to say it's the root of all evil).

The higher level the abstraction, the worse it is.

Unless it's born from experience writing the same kind of program many times. In which case it's not "premature".

"micro abstractions" are actually fine. You notice a recurring pattern in the code so you "abstract" it away into a function.


My guess is we are talking of different things when envisioning micro abstractions.

I have in mind the projects that went from a handful of data structures and functions to dozens of each. Typically in the vein of abstracting away whatever immediate problem was faced in the code.

Have a workflow? Make a workflow abstraction that can run any work.

Have code that needs to retry? Make a retry facility that can retry for any reason. Then make a grammar to specify what backoff should be used.

While doing those, realize you want a rules engine to determine what work and what backoff to use.

Then, put in a SAT solver to determine if a plan can be done.

...

None of these are really bad things, in and of themselves. Odds are high they would all be poorly done, or themselves become a giant undertaking.


> the lean principle of deferring commitment: you should wait until the last possible moment to make a decision that has high switching costs.

Last responsible moment, not last possible moment. That's a big difference. You shouldn't start on the work at the last minute; that's too late. But you also don't (necessarily) need to start on something 2 years before it's due (unless it's big and/or there are anticipated problems that will need to be addressed that the time would permit; that's what makes it the last responsible moment).


Agreed, I should clarify that in my original comment.


> Sure, but that isn't to say his advice can't be applied in other contexts. Applying optimizations before knowing where the bottleneck is is just a bad idea, regardless of the situation. To constrain the advice to the original context is sort of odd, but maybe there's more to it than that?

Donald Knuth writes entire volumes of books in assembly language, because the details matter. If you want to make that point, then you probably shouldn't be quoting Donald Knuth.

Donald Knuth is the guy who points out that "for(int i=size-1; i>=0; i--)" is faster than "for(int i=0; i<size; i++)". When he says "don't sweat the small stuff", it's because his books are filled with _incredibly_ detailed stuff. Far more detail than any other programming book / algorithms book.

IIRC, the "premature optimization is the root of all evil" statement was about GOTO statements and how efficient they are. The __context__ is that GOTO is indeed a very fast construct (especially in the context of replacing recursion).


He made the comment a few times, apparently; in the context of GOTO it was this:

> Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

From: "Structured Programming with Goto Statements", emphasis mine.

Archive link: https://web.archive.org/web/20131023061601/http://cs.sjsu.ed... (I haven't read it in a while, and it's 41 pages, so I probably won't get to it today.)


If one cared that much about speed, they'd use `--i`, not `i--`


Per godbolt with GCC 11 and clang 13, there is no difference between those two. Both end up compiled to the same machine code (GCC and clang compile them differently from each other, but each is consistent with itself with respect to pre- or post-decrement).

https://godbolt.org/z/qbEo9GzEx


Knuth wrote that discussion point in assembly language.

I translated it to C so that people got the gist of it.


How do you know when you've had 7/10ths of "the information"?

There's a great deal of scientific literature about this topic, BTW. Especially in machine learning.


> The article makes claims about software being "fast enough" and not needing further optimization. But what you call "fast enough" is probably 10000 times slower than what it could be.

As a fun project, I started writing some code to do calculations on matrices. Wrote it in Python and used Numpy. Then I wrote it in Julia. Almost immediately it was a 100x improvement. I spent time doing "premature optimization" and netted a 10x improvement. So overall it went from minutes to milliseconds.

I guess if you're writing a website or something then I wouldn't worry, but for other things, doing optimization from the get-go can save you a ton of time and effort later on. And really, such optimizations can affect the architecture in ways you wouldn't think of without it. Reworking your architecture down the road is a huge time sink. I like to make sure mine is close to where I want it to be in the beginning.


> I guess if you're writing a website or something then I wouldn't worry

Despite popular sentiment on HN and Reddit, a website is not a piece of software that can tolerate slow code. Unless you are ok with your website dying when 100 people visit it at the same time, or otherwise investing in a complicated "cloud" infrastructure to make it possible to have more than 100 concurrent visitors.


From my experience most of your assertions have a hidden assumption that the software requirements are complete and they are good. Computing something 10000 times faster is useless if it turns out to be the wrong computation. Furthermore, things that run 10000 times faster than other things tend to be harder to adapt and change.

Requirements that are good enough to warrant a 5 second page load time being "unacceptable" must have, at some point, been prototyped with real use cases, real data, and in real environments.

Getting to that point - where the business case is connected with such a strong through-line to the user experience - can literally take years. Early in a project lifecycle it's more important to design software that is easy to change than software that is fast.

Language choice does not matter, performance does not matter; the design philosophy of the developer(s) when building something new is everything. That philosophy should be "I'll likely have to change this" not "I need to make sure this runs quickly". Any high level design decisions should be made with the former in mind, not the latter.

Once you've earned that product wisdom, sure - carve out the slow thing and write it in CUDA or something. If things are easy to change, that's not a scary proposition; it's just part of the natural project lifecycle and Knuth lives to die another day.


This is a false dichotomy.

"We either have perfect information or we make software that is very slow".


It is not an either/or situation - it's a maturation process. Make flexible slow software first, then optimize the parts that need to be fast.

Optimize the iteration frequency of features, not for loops.


What I'm advocating here is not for optimized code. Only for non-pessimized by default code.

This kind of code does not tend to be as rigid as you might imagine. Quite the opposite. Since it doesn't introduce a ton of premature abstractions and stays straight to the point, it tends to be flexible and easy to change/adapt.

You are still thinking in terms of a false dichotomy:

- Flexible but slow

- Fast but rigid

But it's a false dichotomy.

You don't need optimizations to get high performance software. Computers are very fast. You just need non-pessimized code.

It's funny. People use "computers are very fast" to dismiss all performance concerns. But this is not how we should think about it.

"Computer are very fast" means we don't need to optimize (as in micro-optimize on the instruction level) code to get high performance. We just need to get the clowns out of the car.


I think we're largely in agreement here e.g. I agree 100% that premature abstractions hold a lot of inertia. I'm just advocating for making decisions that optimize for "the right thing" over speed alone. In my experience the right thing tends to be adaptation, but that's because I do a lot of R&D work.

Sometimes yes, you can eat your cake and have it too w/r/t performance and flexibility; but the choice of programming language cannot be categorically relegated to "always pick the fastest one". There are no silver bullets, there will always be tradeoffs.

Flexibility matters, ecosystem matters, developer experience matters, third party libraries matter, good requirements (or lack thereof) matter, budget matters, timeline matters, security matters, and yes, performance also matters - it's just not the only consideration.

High level design decisions, language included, are not "pessimized by default" because they deem some of those other considerations more important than speed.


OTOH, not choosing Python for tasks where the language performance really doesn't matter at all (for instance shell scripting stuff) would also be futile. It's simply a matter of using the right tool for a specific job (and in Python's case, that 'tool' isn't even the language, but the batteries-included stdlib).


I don't buy into this argument either.

Using python for shell scripting requires extensive knowledge about the language and the standard library and all the gotchas that you can fall into.

This sort of time investment does not pay off if you just want to write small scripts that perform simple tasks.

If the task is simple, then a bash script would be better.

If the task is not so simple, then a language you are really familiar with is better.

If you accept my premise that programming in slow languages is generally a bad idea, you won't spend that much time learning any such language with all its ins and outs and gotchas.

For background, I personally have spent a lot of time learning Python, and only later in life have I realized what a mistake that was. Now I only use Go for server-side programming. I'm also using it for command-line programs. Such programs usually start small but grow over time. Choosing Python for them would have been a costly mistake.


> If the task is simple, then a bash script would be better.

Bash scripts don't work on the Windows command line, and bash relies on UNIX tools as its "standard library", which also are not available on the Windows command line. That's the whole point really: Python has very good cross-platform support (and in this case cross-platform doesn't just mean "Linux and macOS", but also Windows).
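
A small script like the following runs unchanged on all three platforms (just a sketch; the paths are invented for illustration):

    import shutil
    from pathlib import Path

    # Collect log files into a backup folder -- the kind of task that would
    # otherwise be a bash one-liner relying on cp, find and UNIX paths.
    backup = Path.home() / "log_backup"
    backup.mkdir(exist_ok=True)
    for log in Path("logs").glob("*.log"):
        shutil.copy2(log, backup / log.name)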

> If the task is not so simple, then a language you are really familiar with is better.

IMHO every programmer should be familiar with at least a handful of programming languages.


Just write in the header of the script: "This script requires Fedora Linux 33 or higher" and forget about outdated OSes, unless it's your paid job to support them.

If your OS vendor refuses to supply you with appropriate tools to do your job, then just choose another vendor. Vote with your feet.


Bash can and does work on windows if you install it. Just like Python.

Also doesn't windows now ship with a linux compatibility layer anyway (WSL)?

EDIT:

I just realized you are the guy who wrote sokol!

If your main work station is windows and you don't work on linux, then replace bash from my previous comment with bat(ch).


Bash is not the standard shell on Windows, and it also relies very much on UNIX-style filesystem conventions that don't map very well to the Windows environment. I wouldn't ask my users to install PowerShell on Linux or macOS either.

Asking Windows users to run Bash scripts is essentially the same as asking them to build a C/C++ project that uses Autotools and Makefiles. It may work with lots of tinkering, but it's a massive world of pain.

> If your main work station is windows and you don't work on linux, then replace bash from my previous comment with bat(ch).

I usually switch between macOS, Linux and Windows at least several times a week, sometimes several times a day; maintaining different scripts to do the same thing wouldn't make sense.


> Bash is not the standard shell on Windows, and it also relies very much on UNIX-style filesystem conventions that don't map very well to the Windows environment.

Windows was built by people who started by copying from Unix. This is part of the reason Bash works fine on Windows (i.e., what is your problem with it "mapping" to things?). You can pipe commands, you can stream text, you can edit binaries. There are things that slow down (or can't be done easily, like modifying permissions) because of Windows trade-offs, not because it's fundamentally different. An OS does OS things and a shell talks to the OS.


Talking about gotchas and then recommending bash is not a sign of credibility in my book.

Bash has one strong point: 4-to-5-line admin scripts for which concision and expressiveness matter more than anything else. That's it.

There is nothing else it is better at than Ruby, Python, Perl, PHP, or even Go or Kotlin.

The argument that you need to learn X applies to bash as well, btw.


I only said to use bash for small scripts. Basically just a series of commands. Maybe setting up one or two environment variables. Anything more complex or involving "if" I would almost never write in bash.


OK but shell is cuter than python.


100% this.

It's now right up there with "X considered harmful", which 80% of the time translates to "I don't like X".

Specific to Python from the article, I'll add that I personally am done with dynamically typed languages for anything where more than one person needs to work on it or it needs to be long-lived. Static typing is simply too useful. Dynamic typing is (IMHO) a false economy. I've often said "Python means writing unit tests for spelling mistakes". Refactoring a dynamically typed code base of any reasonable size is a comparative nightmare.
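
As a contrived sketch of the "unit tests for spelling mistakes" point: with type hints, a checker like mypy catches the typo below as an unknown attribute before anything runs, whereas without them it only surfaces when that line executes (or in a test written for exactly that).

    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        email: str

    def send_welcome(user: User) -> None:
        # Deliberate typo: 'emial' instead of 'email'. mypy flags this as an
        # unknown attribute on User; at runtime it's only an AttributeError.
        print(f"Welcome {user.name} <{user.emial}>")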


It's the same story again and again.

Shell is good for simple scripts. — Shell scripts are a pain to support!

Perl is astonishingly good for text manipulation. — Perl is a write-only language!

Python is easy to learn and start coding in. — The Python 2 to Python 3 migration is a decades-long hell!

JavaScript is the language of the World Wide Web. — JavaScript is a nightmare to refactor!

And so on.

If a language easily accepts anything that pops out of the developer's head, then the program will be hard to support and refactor.

If a language limits the developer to certain good, safe, performant solutions, then it is hard to write programs in it, because you will need to learn good patterns first, which may take years.


> hard to write programs in it, because you will need to learn good patterns first, which may take years.

People really underestimate the time it takes for devs to figure this out. My company has been on TypeScript for over three years now and it's a giant mess. Everything is arbitrarily typed with no design philosophy or organization behind anything. Which leads to these massive interfaces with every single field being optional. Which means you're still doing run-time checking all over the place.

I've always maintained that if your team sucks at organization then no amount of tooling will save them. You're not so much working as a team but as bitter enemies. Possibly the turnover is too high in the company. You have mercenaries rather than ideologues. Not that I'd want to work with a team of pure zealots. But there should be enough people holding the architecture in place. It helps morale as well. You don't want to work on a project that no one gives a shit about.


In the current age of software, we spend way less time designing with any future thought whatsoever. In that respect, I concur - the article's position is damaging.

You can design slow crap in any language, but I'd rather use something that has the potential to be robust and fast for my problem domain so I don't have to rewrite the entire app in the future to language "X", when I could have easily started with "X".

Make no mistake - if you do this, you made an upfront error in selecting your language for the business problem. It may not be an unrecoverable error, but it is a failure in judgement regardless.


Yeah, it seems like he was referring more to bike-shedding than anything. Focus on the efficiency meta-goal, and make efficient choices where they have the highest value.

In this case, since the entirety of the project is to be written in a specific language, that choice is usually worth placing in the macro category - where it affects all other decisions.

Don't spend 4 hours of your time on a `for` loop, but do spend half an hour of your time deciding on the language and architecture.


My favorite aspect of Knuth's take is that he definitely concerns himself with the low-level aspects of programming when writing things.

I don't mean that as a criticism, either. I keep meaning to strive for the level of cognizance of memory usage that he just drips into his software.


> High level reasonable design decisions

Viewing language choice as a high level design decision for a project of nontrivial scale is a symptom of bad high-level design decisions.


The thesis is that dogmatic selection of a language is bad; the article is about using Python under all circumstances.


Yes, I’d say replace python with [whatever language your team is most comfortable with]


I took Python more as a motivating example than an absolute. As far as I can tell, the article suggests prioritizing developer speed over application performance, then using FFI to a "fast" language to overcome your performance problems. That way you only pay the overhead of a high-performance, slow-to-develop language where it actually matters. Which is usually much less than the whole program, so not paying for performance where it isn't needed is a big win.


The article is not suggesting to use Python under all circumstances.

Here's the conclusion, which summarizes the essay:

> All of this is to say while some companies do extract massive value from squeezing every CPU cycle out of their code, those companies also typically build data centres. So if you don't need a dedicated building to host your machines, please consider doing the math to see if it's truly worth making your developers less productive in the name of computational efficiency when you don't even know if that perceived efficiency is even necessary.

As an example, the kernel of my software is only a few hundred lines long. It's in C with AVX2 intrinsics to eke out the last bit of performance. The Python overhead is <1%, which means that even swapping out Python with a 1,000x faster language won't be noticeable. This is the '"glue" code' mentioned in the section on 'Use language bindings'.
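
For anyone who hasn't done this, the binding side can be as small as a few lines. Here's a generic sketch with ctypes (the library name, function and signature are made up for illustration, not my actual kernel):

    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./libkernel.so")
    lib.sum_squares.restype = ctypes.c_double
    lib.sum_squares.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]

    def sum_squares(values: np.ndarray) -> float:
        # Thin Python wrapper; all the heavy work happens in the C library.
        buf = np.ascontiguousarray(values, dtype=np.float64)
        ptr = buf.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
        return lib.sum_squares(ptr, buf.size)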

Had I started with "It needs to be fast therefore I can't use Python", then it probably would have been a lot more work to do. (See, "numerous anecdotes of where someone implemented something in Python in 1/3 the time a competing team did creating the same thing in e.g. C++ or Java".)


> The article is not suggesting to use Python under all circumstances.

I would say the article is very much doing that; it's just stopping short of spelling it out clearly.

It starts from the flawed premise that Python is much more effective than other languages for coming up with actual production code [1].

The author then moves on to telling you that you should try at least a couple of times to code something in python before obvious performance shortcomings are glaring, then keep python as a binding language.

At no point does the author acknowledge that there may be other reasons to pass over python than performance.

From the way he approaches the subject, the author seems to live in the paradise where you only have to sling half-baked prototypes to prod, and then move on to another project while poor souls handle maintenance.

The take on why performance matters [2] also completely overlooks the fact that optimizing for infra costs is likely the least common reason to write for performance. Sometimes, if you don't reach a given threshold, your code simply doesn't work. Sometimes, your performance is your users' time (and sometimes, these users are even developers). Such companies don't necessarily build datacenters.

[1]: "This is what leads to numerous anecdotes of where someone implemented something in Python in 1/3 the time a competing team did creating the same thing in e.g. C++ or Java." -- of course, it's just anecdotes and not a claim, so the author won't defend it, but you're still expected to believe it.

[2]: "All of this is to say while some companies do extract massive value from squeezing every CPU cycle out of their code, those companies also typically build data centres."


I agree with those who say "software engineering" is not "engineering." We as a field are more comfortable with our beliefs, supported by anecdotal evidence, than with empirical results.

That said, the few and limited empirical results which do exist, like Prechelt's empirical work from 20 years ago, support the author's premise.

As another example, Boehm's work across a range of projects (<100KLoc) shows that code size has the biggest impact on development effort, and slightly non-linear scaling in KLoC (KLoC^~1.1 in COCOMO); and various publications show Python generally requires fewer lines of code.

(As one example picked semi-arbitrarily from a Google Scholar search, http://jucs.org/jucs_19_3/a_comparison_of_five/jucs_19_03_04... on a small project has Python at the smallest effective LOC, smallest effective byte size, and nearly the smallest (after Java) eByte/eLOC size.)

Modern C++ is a much more compact (and complicated!) language than 20 years ago, so of course all of these previous results must be reviewed and reconsidered.

As a field, we have not provided that funding for solid empirical research!

> At no point does the author acknowledge that there may be other reasons to pass over python than performance.

Shrug. Okay. And from that you conclude the author is saying that Python should be used in all cases? "I'm going to talk about why you should eat more vegetables" doesn't mean "you should only ever eat vegetables." Similarly, "I'm going to talk about why you shouldn't reject Python from the get-go for a project you think is performance intensive" doesn't mean "you should only ever use Python."

The author even talks about Rust in an earlier blog entry, from August of this year, at https://snarky.ca/introducing-the-python-launcher-for-unix/ saying "And so over 3 years ago I set out to re-implement the Python Launcher for Unix in Rust. On July 24, 2021, I launched 1.0.0 of the Python Launcher for Unix".

So the evidence is that Cannon does not believe that Python is the best language for all projects.

> Sometimes, your performance is your users' time

Cannon's essay is about the case where people select a programming language based strongly on expected computational cost, and forget other factors in the cost model. Here's the relevant quote:

] I think the jump to selecting a programming language based on potential performance needs often comes from a place where people think their computation costs are more important to optimize for than their developer time. I don't think that always holds, though, as software developers are expensive.

In your model, people are choosing a language based in part on the cost of users' time, which means they are not primarily focused on computation costs, and so are outside the scope of this essay. That doesn't contradict the essay, only shows that it's incomplete. Which we know already from the text itself.

> Sometimes, if you don't reach a given threshold, your code simply doesn't work.

And sometimes you can reach a given threshold with an off-the-shelf use of Pandas, rather than build your own analysis routine. If you start off thinking "Python must be slow" and don't even know what your thresholds or performance characteristics are, then selecting (say) assembly over Python is probably premature optimization.

Cannon didn't even argue that if you aren't building data centers then you must use Python. Your [2] trims the full paragraph I quoted earlier. You left out: a) the "please consider", which makes this a request, and not a rejection of any alternatives; b) the request to also consider developer costs, because the essay argues people should consider more than computational costs; and c) the "you don't even know if that perceived efficiency is even necessary", which ties the essay back to Knuth's "premature optimization is the root of all evil" quote.


I've met several people who learned Python and refuse to learn any other language. I wonder if this mentality exists for other languages


Oh yes, it's sooo common. I've met many C# web developers who just refuse to even learn a bit of JS, "because it's such a bad language".


I think JS and PHP have a special status here and don’t count as an answer to what GP was asking. Many people hate JS and/or PHP, often for legacy reasons.


But it's not only JS, I used it as an example because these people are doing websites, yet don't even want to learn JS. They also hate to write SQL and are stuck with ORMs...


That is different, but SQL is also not a general programming language. I see what you say as an issue with those people, but a different issue ;)


Not even at a language level. There are Java developers who only want to work with particular frameworks (Spring) and don't know anything else or want to learn anything else.


I know right around seven languages I would be comfortable claiming I knew on a job application. I am probably more opinionated about language features, but less picky about the specific language for it.


I'm seeing a lot of resistance from C programmers to switching to anything else, for which I can't find a rational explanation.


> I wonder if this mentality exists for other languages

Visual Basic, PHP and Java come to mind. Probably C# as well.


The author: Selecting a programming language can be a form of premature optimization

Also the author: Pick python no matter the problem, the team, the libraries, the deployment targets, etc.


Where does the author imply something like "Pick python no matter the .. etc.?"

I read it as the much milder "don't reject Python because of vague concerns about run-time performance".

I can't find anything which suggests the author thinks people should use Python to, for example, code up their web app front-ends or to implement a 'hard' real-time operating system.


My bad English betrayed me here. I meant to write "Picks python... blabla"

Looked to me like a "Python isn't that bad for perf" article.


It'd be great if we could start measuring how much $ it costs to run our services vs how much developer salary $ it costs to maintain and build them.

If Rust or something really saves money because it means we don't need 50 extra web workers for the same TPS then I don't think it's "premature optimization" - unless the dev salary is too much!

As it stands we're all just saying stuff with no data or evidence to back it up.


One of my anecdata:

My team does a ton of large-scale processing (most of which is ETL), and we have TONS of servers at our disposal. We effectively throw hardware at our compute, and even then, jobs can take 5-10 hours to run. No big deal - just run the job before you call it a day and it's done when you start tomorrow morning. Or when you have daily jobs, you schedule it and it's just done whenever it's done.

I can't tell you the number of times I've looked at a job someone wrote, made a 5 LOC change in 2 minutes, and cut processing from 8 hours to 5 hours. I literally had one that went from 5 hours to 5 minutes.

These extremely simple investments save a lot of money, but it's a problem that's spread across hundreds of different jobs, and it takes so much effort to convince people that we should invest in it. A daily job that costs $1500/mo to run (based on amortized hardware costs) need only be optimized once, and those savings are yours forever. But I've had the conversation that is effectively: my time is more costly than a measly $1500 job, so it's okay if the job is expensive.


This is a pretty interesting example. It sounds like even with the numbers, it's a pretty hard sell to focus on performance..


With Silicon Valley salaries, maybe the trade-off is more between developer time and DevOps time. Those 50 extra web workers might not matter much on their own, but someone has to set them up, orchestrate them and deal with all the weird interactions that arise.

And of course the further you get from silicon valley the more the hardware/service costs matter. A developer in Sofia easily earns an order of magnitude less than one in San Francisco, which shifts the whole optimization tradeoff a lot.


Silicon Valley companies tend to have more users and therefore more servers, so saying they mostly don't need to care about server costs seems wrong. I worked in a small team at Google that had many thousands of servers; those servers cost way more to run than the engineers working on them. Compare that to a small company I worked at where one server with an extra as fallback was more than enough; there, performance didn't matter at all even though salaries were much smaller.


It's not going to change any time soon, IMHO. Programmers have too big a culture of individuality, blogging, and intellectual daydreaming to actually try to come to consensus on terms of art. On top of which, businesses are loath to share data because it's tied very closely to their competitive advantage. Programming culture is more about memetic and shared ideas and uses tradition to select winning strategies. Until this changes, it'll continue to be discordant voices talking over each other. Ideology [1] is a great talk about this.

[1]: https://www.destroyallsoftware.com/talks/ideology


> Programming culture is more about memetic and shared ideas and uses tradition to select winning strategies.

Sounds exactly like business management culture. Makes sense though: developer productivity is first and foremost a business management issue, so discussions about it ought to follow the same path as business management. And business management loves its anecdotes, great-person citations and jargon.


"Guys, our Python prototype is finished. It works really well and has proved our concept. I know we've put a ton of work into it but now I suggest we scrap it and rewrite it in a language that isn't dog slow."

- Nobody


Largely because nobody has ever found Python to be dog slow. Put your unoptimized Python into production, you'll be fine.


We run pylint as part of our build, including as a prerequisite of merging a branch into master.

Our codebase is well over 1M lines of C++. We have about 100k lines of python. Running pylint takes the same order of magnitude of time (half as much IIRC) as running a full optimizing build + linking of the C++ code base. We run pylint over all cores, while we run the C++ build only on a subset of cores on the build machine.

I would call that dog slow, unless you think Python is not the appropriate language to write pylint in. And no, it is not fine.


Dropbox is the biggest user of Python that I know of, and they went as far as writing their own JIT compiler before giving up and switching to faster languages.


Writing code at all can be a form of premature optimization.


Yeah, I remember working on one of the Project Euler problems, getting the right answer, then seeing someone else had solved it with a pocket calculator in less time than I took to write the program.


It was a rather embarrassing realization for me that the Fibonacci sequence has a closed-form expression and can be calculated without an algorithm. I had always seen it done in pedagogical fashion as an exercise in explaining recursion and memoization and just assumed it had to be done that way. The downside of using toy/straw problems...
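
For reference, the closed form (Binet's formula) is short enough to fit in a comment; rounding keeps it exact only up to moderately large n because of floating-point error:

    import math

    def fib(n: int) -> int:
        # Binet's formula: fib(n) = (phi**n - psi**n) / sqrt(5)
        sqrt5 = math.sqrt(5)
        phi = (1 + sqrt5) / 2
        psi = (1 - sqrt5) / 2
        return round((phi**n - psi**n) / sqrt5)

    print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]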


I had that with the geohash algorithm, which I implemented naively as a binary search until I came across https://mmcloughlin.com/posts/geohash-assembly, which ultimately doesn't even compute it but reads it from the in-memory floating-point representation.


Ain't it funny how these programming exercise websites are essentially the adult versions of busywork?


or play. Maybe that's it. Work without purpose.

I like the busywork. Less spooky than 'cultic ritual'.


…or even doing it with computers.

Or even doing it at all.


When it comes to starting a business, this is very good and often overlooked advice.


Actually, writing a POC or an MVP is quite different from writing a "long run" production product. In the first case, coding speed is surely a must - and any untyped (or loosely typed) language might be good enough - but in the latter case, when dev teams have to maintain some code in the long run, across multiple versions, with turnover, a typed (or pedantic) language might be easier to work with because the language already includes some kind of documentation (types) and automatic checks (type verification).

Moreover, the availability of "average" (and "cheap") programmers matters a lot in the long run: if only geniuses can maintain your system, then you'll have problems in the long run because you'll either need to keep them at any price or need a lot of time to replace them. So, in the long run, you'd better use a wide-audience language with a lot of available programmers (even if they are "average") than a specific language requiring good programmers. However, for an MVP, you can recruit a genius programmer using the fastest tool for the job.

Obviously, some domains are more oriented toward certain languages... and for ML, for example, Python is quite a good choice because of the libs (as Java could be for, let's say, web servers).

So it matters a lot what your system will be used for and how long you will need it to run before rewriting it from scratch.


While I see a point in this article, I personally run into performance problems with Python at a very early stage nearly every time. The overhead of starting to use some FFI is huge. Suddenly you have two projects, with two different build systems, etc. It's way faster to just use a performant language. Besides this, dynamic typing gets frustrating quickly as the LoC rise, and distribution is not very fun. I love the language, but won't use it anymore for any "serious" project.


I am less productive in Python vs., say, Go. So "use Python by default" would be a bad heuristic for people like me. It has nothing to do with runtime performance for the most part.


I'm a big fan of "Make It Work, Make It Right, Make It Fast".

If one of these steps requires a change of programming language, I'm never too lazy to recode :)


The problem is that if you follow that strategy then it will likely never be as performant as if you did "Make It Fast, Make It Work". So if you know that performance is a key factor for success, then you should follow this path instead.

And by "make it fast, make it work", I mean benchmark first. Kind of like test first, you write the benchmark before the implementation and constantly look at how fast things runs, and ensues every single bit you add is fast.


I don't think the "Make it Right" part should or can reasonably be left out of the equation, otherwise I could just compute something if all you care about is speed. That said, I have used a performance-first approach several times using own old and new solutions and libraries and have found early benchmarks to be a great tool to weed out untenable (i.e. order-of-magnitude slower) stuff. This then can mean I have to write fewer tests to make plausible my own or a 3rd party solution does in fact what I expect it to do. At a certain point performance becomes correctness.


I quite believe that every programming language has its strengths & weaknesses, and a few languages are a must in every developer's accoutrements.

Want to build an I/O utility writing to a DB? Sure, C can do it, but Python is better suited. Want to write a toy compiler? You don't want to waste your time trying to wrangle CPython extensions; C works out of the box.

> _'if you select a programming language based on your preconceived notions of how a language performs, you will never know if the language that might be a better, more productive fit'_

Part of a CS education is not about recognizing home runs but about understanding trade-offs. The experience gained is about learning how tools work & which tools to choose to work in tandem. Modern systems use a variety of languages - JS in the webpage, a SQL DBMS for queries, C++ to run the performance bits, Python for ML and interop - maybe even Rust in the security bits of late. In that sense, the title was unfortunately misleading to me, since the author tried to demonstrate a lot of use cases with Python.

Python is great - but there has to be a reason why other languages co-exist. Not just for bankers, military or some enthusiastic hobbyist.


Very true. People discriminate against dynamic languages with usually two measurements: runtime errors and performance.

But we often overlook the fact that it's possible to scale those two, incrementally. With today's convenience in interop, we can always start with a prototype friendly technology and later switch tech for real bottlenecks.


Over time I've come to conclude we are not optimizing for language features, we are optimizing for the community around a programming language or stack.

Sure, features are important, and if critical ones are missing it might be a showstopper. So there is an initial threshold that all candidates must pass.

But problem solving is not a one-off exercise. It tends to be both dynamic (=facing unpredictable challenges) and recurring over long time horizons. Which means having a healthy, engaged, resourced community that will invest in adapting / solving future requirements is essential.

So the "optimization" problem includes quite a bit more than the presently known developer team, its software stack its hardware and current problem definition / user requirement.

I think you see this dynamic in several cases (including python) where you might not think that it makes rational sense.


My (perhaps controversial) take is that static typing in Python is also a form of premature optimisation. Code should be written without static types first, and static types should only be added when absolutely necessary. 99% of the time, they never will be.


Why though? Static types make it easier to write code overall, so you're slowing yourself down for no real benefit.

You could just as easily say "Descriptive variable names are a form of premature optimisation. Code should be written with single-letter names, and descriptive names only added when absolutely necessary."


Just been thinking through this... I'm working on a personal project (podcast indexing/search) that involves parsing a lot of RSS feeds. Some years ago I had built the feed checker in Elixir, but when I tried to get it going again recently I was having too much trouble with version and compatibility changes in Phoenix. I eventually just did that part in PHP because that's my day-to-day language. It wouldn't have made sense to block all development until I re-learned how to query a database in Phoenix. Plus, later I can re-implement that feed-checking part as a service in Elixir or another language. First make it work...


Since this post has morphed into a "this is why I think language $(FOO) is good," here's another list which takes liberties with the concept of "programming language."

https://meadhbh.hamrick.rocks/v2/technical/six_hot_languages...


This sounds like the "Java can be fast" arguments from two decades ago, or the "JavaScript can be fast" arguments from one decade ago, now with the next language.

I mean, sure, I can prototype in Python, then put a lot of additional effort into bringing the performance of the bits I absolutely need to be fast close to native-code performance.

Or I could code everything in C or Rust and get everything in native code from the get-go.


Selecting a language is like selecting a tool. It would be worse to select a language that on first sight fits the problem if you don't have any experience with it.

To be honest, today there are many viable solutions with different languages. I would not recommend starting with a plan where you have to reimplement the system in another language at some point. Experience tells me that almost no such project survives.


I agree with the broad concept (with the caveat that there are exceptions to all generalizations), but note that “Python” can generally be replaced with “whatever general purpose language you are comfortable and familiar with that doesn't have any clear insurmountable barriers like ‘is not available for the desired target platform’.”


I really don't find writing in Python to be that much more productive than using many other modern languages.


It's the usual trade-off of individual programmer freedom/expressiveness vs. rigid & predictable standards that enable productivity at scale.

Just because you can program a new feature quickly doesn't mean it is engineered well for the long-term & larger scales.


People are too shy about spending on hardware. Many of my professional colleagues develop and optimize software full time that runs on a server that is as fast as the phone in their pocket.


This is always the case with me: once with Scala, now with Rust.


I don't know of any business that went bankrupt because it selected the wrong programming language.


Agree with the title. In 99% of business use cases, TypeScript/Node.js/React is enough.


Just use what you know. Something you've used before.


Learning coding can be a form of premature optimization. Exercising can be a form of premature optimization. Your body ain't gonna need it. jk


I thought this was going to be satire, but no, it seems that he is completely serious.


Is the premise unreasonable? Of course, it depends on what you're doing--there are certainly use cases where you know Python or a similar language wouldn't be appropriate (for performance or other reasons)--but it seems to me that there are plenty of scenarios where using a "faster" language up front isn't warranted, and you may never need to switch.


well, if you combine the title with the contents, you reach the conclusion that python is not a programming language. fair enough.


>Selecting a programming language can be a form of premature optimization

>Prototype in Python.

Stopped reading here.

Wait, what? The only thing I know about Python is that it's indentation-sensitive; I have no idea about its syntax or libraries. Suggesting that I use Python is premature optimisation.


If you continued reading past the point where you didn't know things, you would know more things. :)



