Hi, I recognize you as a PyPy developer. Do you think it's fair to write that "no one uses PyPy"? I feel there are quite a few companies using PyPy, mostly for web backends. You don't see much usage elsewhere, such as in Linux distros or ad-hoc scripts, even when C extensions are not used, because 1) in 95% of cases, the performance is good enough, and 2) the warmup slowdown dwarfs the overall gains.
Anyway, I think PyPy doesn't get the attention it deserves - the performance gains are fantastic and the vision of the Python ecosystem as Python-only packages with occasional lightweight C libraries integrated with cffi looks very nice to me.
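For anyone who wants to check the warmup point on their own workload, here is a rough sketch: time successive batches of the same call and see how long the numbers take to flatten out. The `time_batches` helper and the `workload` function are placeholders I made up for illustration, not part of PyPy or any library. On a JIT the early batches are typically the slow ones, and a script that exits before the timings settle gets little out of it.

```python
import time


def time_batches(func, batches=5, calls_per_batch=1000):
    """Return the wall-clock seconds taken by each successive batch of calls."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(calls_per_batch):
            func()
        timings.append(time.perf_counter() - start)
    return timings


def workload():
    # Placeholder workload (hypothetical); substitute the code you care about.
    return sum(i * i for i in range(200))


# One timing per batch, in the order the batches ran.
timings = time_batches(workload)
```

Running the same file under CPython and PyPy and comparing the two lists gives a first impression; it is not a substitute for a proper benchmark.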
I mean, it's obviously unfair. Even in the scientific community someone uses PyPy. As far as download statistics go, I think between 0.5% and 1% of downloads from PyPI are for PyPy. This is far from a "majority" but far from "no one" as well. For what it's worth, enough big players use PyPy that the consulting company I run can stay afloat, so I somehow doubt that's the case :-)
From an outside perspective with no skin in the game (other than using Python profusely and just generally wanting it to be the "best" possible Python it can be) I wouldn't be so quick to call it obviously unfair.
The exact phrasing was:
> PyPy is ten years old at this point, but to a first approximation, no one is using it.
Using your 0.5% - 1.0% PyPI metric as an approximation for how many people are actually using PyPy, I think it's reasonable to say that. I personally would definitely have phrased it differently, but if >=99% of the market isn't using PyPy, then it is far, far, far removed from the "mainstream" market. In fairness, the Python community is, on the whole, large enough to give you a reasonably sustainable niche, but it pales in comparison to the market itself.
It's a bit like comparing Facebook to CouchSurfing, to be honest. You know about it, maybe even have some friends who have done it, and hey, it's still 3 million people, but in comparison to Facebook... no one uses it. And that's not a value judgment or anything, it's just a scale of comparison kind of thing.
That being said, I share your frustration about the deep schism between web vs scientific in the Python community, and I think it's really a shame for the community as a whole. Unfortunately I'm not sure that's going to change very soon: the web side of things has a tremendous amount of momentum (and money) invested in "their" architecture, and at the same time, the scientific side of things has far, far less patience for pain points in their language tooling.
Put differently, if your background and job is programming, you're more likely to view "dealing with this programming problem" as actual work, but if your background and job is "I have this data, and I need to analyze it", then "fiddling with this programming problem" is, at best, a frustrating distraction from your actual task. And I think both arenas need to have a better appreciation for the other: as programmers, our tools generally really do suck for everyone, we're just used to it; as data scientists, we are woefully under-aware of how difficult these programming problems can be. Some more unity would be tremendously beneficial for all.
And then on top of it all, there's this whole group of weirdos using Python for stuff like desktop applications (I happen to be in this camp). Good luck finding a packaging and deployment solution there!
PyPy finally got $200K of funding, from Mozilla.[1] They're going to support Python 3.5. PyPy ought to be able to use a version of NumPy written in Python. If someone can pound Python's little tin god into making Python's new type annotations have sane semantics (as in enforcing the typing), then NumPy in Python could come up to NumPy speed.
The amount of C code used with Python declined somewhat with Python 3, because the C interface changed and things had to be reimplemented. There's pymysql, for example, which is a database connector in pure Python. No more need for the C version.
While I still think it a bit unfair, the exact quote is "...but to a first approximation, no one is using it". So I think (assuming fijal's numbers of 0.5-1% are roughly correct) it is not completely unreasonable to say that PyPy adoption is very limited to date, which was the point of what he was saying (although I think it could have been said more diplomatically).
Yes, some people wondered if PyPy is "the future of Python" - but it didn't take over the Python ecosystem. If the initiatives mentioned in the article yield a 100%-CPython-compatible implementation that is always faster and doesn't have a big warmup slowdown, it could be "the future of Python".
I've been testing it with Gtk stuff each release for a while now, and it seems really close to working with the things I need (CPython libs). About 3 years ago, I couldn't have guessed it would come that far, and this is as a fan/lurker of their mailing list :)
I'm also surprised that it mostly works with extensions written for CPython. Unfortunately there's a big difference between working in 99.9% of cases vs 100%...
The actual target is not 100% compatibility, but the level of agreement that CPython versions have with each other, because CPython versions (say 3.4 and 3.5) are not 100% compatible with each other either. I think PyPy can reach that target some day.
> the vision of the Python ecosystem as Python-only packages with occasional lightweight C libraries integrated with cffi looks very nice to me.
I have the opposite view. I'd rather have all of the important libraries written in a language with a safer type system than Python's. C's type system has some safety issues, but it's still safer than Python's.
If Python 3's optional type hints could actually be enforced at runtime, I'd be sold on using Python for pretty much everything.
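To make that concrete, here is a minimal sketch of what opt-in runtime enforcement could look like using only the standard library. The `enforce_types` decorator is a name I made up; it is not something Python or PyPy ships, and it only handles plain classes such as `int` or `float`, not generic annotations.

```python
import functools
import inspect
import typing


def enforce_types(func):
    """Check plain-class annotations on every call (hypothetical helper)."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; anything else is skipped in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        result = func(*args, **kwargs)
        expected = hints.get("return")
        if isinstance(expected, type) and not isinstance(result, expected):
            raise TypeError(
                f"return value must be {expected.__name__}, got {type(result).__name__}"
            )
        return result

    return wrapper


@enforce_types
def scale(value: float, factor: int) -> float:
    return value * factor
```

Calling `scale(2.0, 3)` goes through, while `scale(2.0, "3")` raises `TypeError` at the call site instead of failing somewhere later. The checks run on every call, so this costs time; it is a safety net, not the kind of enforcement a compiler could use for speed.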
> C's type system has some safety issues, but it's still safer than Python's.
Uhm. An error in Python code can lead to a runtime crash with an exception or a bad result ...most of the time. An error in C can lead to anything from an impossibly hard to diagnose memory leak to an exploitable buffer overflow... most of the time.
C may be "statically typed", but I think the number of people capable of writing (and maintaining!) secure C code is incredibly low. Heck, even the OpenSSL devs failed at this at least once.
And the fact that C is "static" and Python is "dynamic" does not make C's type system safe. Heck, even js's type "system" is safer than C's if when you use the word safe you refer to, you know, security.
Python has undefined behavior because its implementations are written in languages that do. Every implementation of it is written in a language with undefined behavior such as C, or in languages which are themselves implemented in languages with undefined behavior (e.g. Jython, which runs on the JVM, which is written in C and C++). There is no guarantee that an implementation of Python is bug-free, so there is no guarantee that any Python program is free of undefined behavior.
So, if you write a Python library in C or in Python, you'll have undefined behavior either way. Undefined behavior is not a good criterion for choosing which language to use.
I think you have mixed multiple definitions of "undefined behavior".
A language can say that certain operations are undefined, like dereferencing NULL in C. What part of the Python language is undefined? (Some things, like garbage collection, are implementation defined. That is different.)
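A small sketch of what "implementation defined" means for garbage collection, using only the standard library. The `Resource` class is a made-up stand-in for something that holds an OS handle. Cleanup through `with` happens at a point the language defines; when an unreachable object gets reclaimed is left to the implementation, so portable code should not depend on it.

```python
import gc
import weakref


class Resource:
    """Stand-in for something that holds an OS handle (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


# Portable: the context manager closes the resource at a defined point.
with Resource() as res:
    pass
closed_after_with = res.closed

# Not portable: *when* an unreachable object is reclaimed is up to the
# implementation. CPython's reference counting frees it right away;
# a tracing collector such as PyPy's may wait until a later collection.
orphan = Resource()
watcher = weakref.ref(orphan)
del orphan
gc.collect()  # request a collection so the check does not rely on refcounting
reclaimed = watcher() is None
```

Both behaviours are conforming Python; neither is "undefined" in the C sense.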
Even if a language contains undefined behavior, a program written in that language can avoid the undefined operations. Which of the Python implementations use or depend on undefined operations in the lower-level language?
An implementation can also be flawed. But that's not "undefined behavior" but non-conformant behavior. "A bug", in the vernacular.
According to your definition, is there any language that doesn't have undefined behavior? After all, even the hardware can have flaws and undefined behavior, so to mix metaphors, it's all a house of cards built upon sand.
> According to your definition, is there any language that doesn't have undefined behavior? After all, even the hardware can have flaws and undefined behavior, so to mix metaphors, it's all a house of cards built upon sand.
That's correct, and it's not pedantic to point that out. It proves that undefined behavior is not a useful criterion for choosing which language to use.
But even if we accept your more restricted definition of what constitutes undefined behavior, we still must accept that any language that is implemented in C could possibly exhibit any of C's undefined behavior. Therefore, writing a Python library in Python instead of C cannot help you avoid C's undefined behavior. It might prevent you from introducing more undefined behavior, but that's not the same as avoiding it entirely -- if the implementation is written in C, then that ship has sailed.
> That's correct, and it's not pedantic to point that out. It proves that undefined behavior is not a useful criterion for choosing which language to use.
No, it means that you've warped the definition of "undefined" to the point where it's no longer useful. It's entirely possible to have fully-defined languages where every possible string of symbols is either a valid program with single well-defined behaviour or not a valid program.
And no, writing the implementation in C does not introduce undefined behaviour, although it may require a number of compile and run time checks to ensure that you're never invoking it.
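As an illustration of the kind of run time check meant here, the usual pattern is to test the operands before adding, using only arithmetic that cannot itself leave the `int` range. The sketch below models that logic in Python, assuming a 32-bit `int`; `checked_add`, `INT_MAX` and `INT_MIN` are names chosen for this example. Python's own integers are arbitrary precision, so this only mirrors what a C implementation would have to do and does not itself trigger anything undefined.

```python
INT_MAX = 2**31 - 1   # assumed 32-bit int, as on most current platforms
INT_MIN = -2**31


def checked_add(a, b):
    """Add two values in the assumed C int range, refusing to overflow.

    The range test happens *before* the addition and uses only
    subtractions that stay inside the int range themselves.
    """
    if b > 0 and a > INT_MAX - b:
        raise OverflowError("a + b would exceed INT_MAX")
    if b < 0 and a < INT_MIN - b:
        raise OverflowError("a + b would fall below INT_MIN")
    return a + b
```

With a guard like this on every signed addition that takes outside input, the overflowing case becomes an ordinary, defined error path. The cost is the extra comparisons and the discipline of never forgetting one.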
> And no, writing the implementation in C does not introduce undefined behaviour, although it may require a number of compile and run time checks to ensure that you're never invoking it.
It is a perfectly valid program. The outcome is undefined (integer overflow is considered undefined behavior in C by every authoritative source I've ever seen). GCC does not emit any warnings when this program is compiled, even with all warnings turned on. There certainly aren't any runtime checks for it. Any program that adds numbers which are passed into it can run into undefined operations.
Here is another example:
i = i++ + 1;
The result of that operation is undefined.
These examples demonstrate that undefined behavior is not something you "invoke" in a special way -- it's often the result of a mistake. It is not restricted to certain operations -- these examples use simple addition! The compiler and runtime checks can't help you avoid it in many cases -- you sometimes won't even get a compiler warning.
Here is an example of a bug in the C implementation of Python that caused undefined behavior because of integer handling: https://bugs.python.org/issue23999
It's basically impossible to avoid. Even if your code is perfect, the compiler might optimize it into something that might contain undefined behavior. I suppose that if you never did any math, or anything with strings, or any pointer dereferences of any kind, or any casting, or any recursion, and you made sure the compiler didn't try to optimize anything, you could end up with a C program that could not have undefined behavior. But it's clearly impossible to implement Python without doing all of those things.
> Here is an example of a bug in the C implementation of Python that caused undefined behavior because of integer handling: https://bugs.python.org/issue23999
The comments in that bug report suggest it was a false positive in Coverity.
In any case, the Python language defines how left and right shift are supposed to work. https://docs.python.org/3/reference/expressions.html?highlig... . What you pointed to, if it were a true positive, would be an example of where the implementation didn't comply with the specification.
It wouldn't be an example of undefined behavior as given at https://en.wikipedia.org/wiki/Undefined_behavior : "undefined behavior (UB) is the result of executing computer code that does not have a prescribed behavior by the language specification" because Python prescribes that behavior.
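For reference, this is the behavior the Python language prescribes for shifts, as a short sketch that any conforming implementation should agree with. The variable names are only for illustration.

```python
# Left shift is multiplication by 2**n, with no fixed width to overflow.
big = 1 << 64

# Right shift is floor division by 2**n, including for negative numbers.
neg = -9 >> 1

# A negative shift count is a defined error, not "anything can happen".
try:
    1 << -1
except ValueError as exc:
    shift_error = str(exc)
else:
    shift_error = None
```

An interpreter that returned something else for these expressions would be out of compliance with the Python reference, whatever the C underneath it did.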
> It wouldn't be an example of undefined behavior as given at https://en.wikipedia.org/wiki/Undefined_behavior : "undefined behavior (UB) is the result of executing computer code that does not have a prescribed behavior by the language specification" because Python prescribes that behavior.
But the C spec doesn't prescribe any behavior in this case, and the code that is running in this case is C, not Python! It was written in C, it was compiled by a C compiler -- it's C! Whether or not the behavior is undefined for anything a C program does is determined by the C spec only. The Python spec is not relevant. Any C program that overflows an int has done something undefined, whether or not that C program happens to be a Python implementation.
The Python specification defines what the implementation is supposed to do. It happens that the implementation doesn't comply with the specification. That doesn't mean the Python specification is undefined, it means the implementation is in error.
Remember too that the actual implementation is a binary. The C compiler converted the C code into that binary, but in theory it could have been generated manually, as a byte-for-byte equivalent, with C completely out of the picture.
Would you still say it's "undefined behavior" if there were no C compiler? How do you tell the difference in the binaries?
By that logic, you couldn't say anything is ever undefined when it happens in binary machine code, or what spec should apply -- you couldn't say it is C or that it is Python. That level of reductionism doesn't get you anywhere.
Do you not realize that that reductionism is exactly why I don't like your definition of "undefined behavior"?
You are using a non-standard definition of "undefined behavior". In common use, "undefined behavior" is only meaningful relative to a language specification, not an implementation.
Why do you think you are using the common definition when you include implementation bugs as part of UB?
Case 1: A C program overflows an int (which is listed as an undefined behavior in the C spec). Under your definition, and pretty much any other, that program has executed code that results in undefined behavior.
Case 2: A C program overflows an int. That C program happens to be the Python interpreter. Under your definition, based on your responses to previous comments, that did not result in undefined behavior, but was an implementation bug instead.
There is no difference between case 1 and case 2. They are both C programs, so the spec for C determines what is undefined behavior for them. They both did the same thing. Either both are executing code that results in undefined behavior, or neither is.
Case 2 is also an example of an implementation bug if it affects the result of the Python code running in the interpreter.
There is no contradiction. There are two specifications, the C language specification and the Python language specification.
Programs written in Python follow the Python language specification.
When run in CPython, the implementation follows the C language specification.
If the implementation uses C UB, which causes it to be out of compliance with the Python specification, then it is both undefined behavior for C and a failure to follow defined behavior for Python.
It is not undefined behavior for Python.
I have said multiple times that I don't like how your non-standard definition mixed the two together. It is no longer interesting to come up with new ways to restate my statement.
> If the implementation uses C UB, which causes it to be out of compliance with the Python specification, then it is both undefined behavior for C and a failure to follow defined behavior for Python.
That's true, but it's not what you said. You said "the C specification is irrelevant" when I brought up bugs in the C code in CPython, and you said they were implementation bugs but not UB, because the Python spec doesn't define them as UB -- even though the code we were talking about was C! In other words, case 2 in my previous comment.
Now you say "When run in CPython, the implementation follows the C language specification," and "If the implementation uses C UB, which causes it to be out of compliance with the Python specification, then it is both undefined behavior for C and a failure to follow defined behavior for Python."
Those arguments contradict each other. Which one do you believe?
> Even if your code is perfect, the compiler might optimize it into something that might contain undefined behavior.
This is false. If your code is perfect, the compiler shouldn't perform such an optimization. Of course compilers have bugs, but those are compiler bugs, not a problem with the language standard. (Not to mention formally verified compilers like CompCert.)
I now realize I did not understand your original comment. Thank you for the clarification. I don't agree with how you use "undefined behavior", but it's appropriate for the OP's context.
I don't think my definition is different from the common one. I agree that undefined behavior is behavior that the spec for the language says is undefined.
Pretty much every language has some undefined behavior. That undefined behavior can be invoked intentionally, or because of bugs.
C has a lot of undefined behaviors. Even simple addition can result in an undefined behavior because int overflow is undefined. Any C program that takes numbers as input (via the shell, or FFI, or whatever) and adds them together can exhibit undefined behavior. Null pointer dereferences are also undefined, and pretty much any C program of reasonable complexity can encounter those, in the form of bugs.
A bug in hardware can also cause these undefined behaviors to be invoked -- for example, a friend of mine who does embedded programming had some null pointer dereferences or something like that because the hardware did not set a value when he told it to set the value. Null pointer dereferences are undefined, and if they happen because of hardware bugs, that is a case of hardware causing undefined behaviors to happen. It would not be reasonable to say they were not undefined behavior simply because they were caused by hardware instead of programmer error. The fact that it happened at all is what matters, not what caused it.
Even if a language does not have any undefined behaviors whatsoever, that's not the end of it. If a program in that language interacts with other programs that do have undefined behavior, such as C programs, it could potentially trigger or be affected by undefined behaviors in those other programs. If that happens, it's not reasonable to say that it was not undefined behavior simply because it happened in a piece of code in a different language. (So, if the result of running a Python program is affected by undefined behavior in the C implementation of Python, that program has undefined behavior -- even if the Python code does not. The undefined behavior is C's, not Python's -- but it happens because of running the Python code, which caused the C code to run.)
In other words, any invocation of code that causes undefined behavior to happen -- whether it is intentional, or caused by a software bug, or even caused by a hardware bug -- counts.
Where I disagree with you is your inclusion of language implementation errors as a type of undefined behavior.
Those are compliance errors. That is, if the spec allows it (as "undefined behavior") then it's not a bug. If the spec doesn't allow it, then there is a defined behavior and it is a bug.
For example, a hardware bug is one that does not comply with the specification. Either the spec must be changed (perhaps allowing the existing behavior), or the bug fixed.
Are you saying that when a C program, which happens to be an interpreter for another language, does something that results in undefined behavior according to the C spec, it doesn't count as undefined behavior?
I suspect that is where we differ. It seems to me that if a C program does something that results in undefined behavior according to the C spec, it is undefined behavior period, regardless of any other factors, because the spec says it's undefined and that's all that matters.
I'm saying that when the Python language specification prescribes a behavior, and the Python implementation in C has a different behavior because it does something which the C language specification considers undefined, then the Python implementation in C does not comply with the Python language specification.
I'm saying that it's incorrect to say that non-compliant behavior with respect to the Python specification is the same thing as undefined behavior. It makes no sense to say something is undefined when the specification defines what is supposed to happen.
Reports that say that something isn't defined are always interesting to me, because as we define, there are defined defines; there are things we define we define. We also define there are defined undefines; that is to say we define that there are some things we do not define. But there are also undefined undefines – the ones we don't define we don't define. And if one looks throughout the history of our language and other free languages, it is the latter category that tends to be the non-conformant ones.
It's entirely possible that executing a Python program could cause C code in the Python interpreter, or C code called via FFI, to do something that the C spec defines as undefined behavior. That could affect the result of the Python program. If that happened, it would not be reasonable to say that the Python program did not have undefined behavior -- even if nothing in the Python code was undefined according to the Python spec.