I really like the idea of having one VM for many languages, but having just watched the Google IO video on V8, I think there might be some real performance advantages to specialized engines.
For example, they replaced the regular expression engine that WebKit was using with one that was specially designed to work with JavaScript's regex syntax. Maybe you can get similar performance with a generalized engine, but I'm no longer of the mind that one VM for all languages will be the end-all-be-all solution.
A specialized VM (both for the source language and the deployment architecture) would leave anything generic in the dust. Languages do not have the same usage; the truly unique ones (i.e. worth learning) encourage different programming styles. The G-machine is as different from the Warren machine as that is from an SECD machine, and all are different from a P-code machine or an SKI graph-reduction machine.
Similarly, a custom-made, application-specific DSP would crush any other binary for a stock hardware platform of the same speed (ignoring all costs, of course).
If you wrote specialized VMs for Python, Perl, Smalltalk, Ruby, and JavaScript on the x86, would they have any components that were largely identical?
I would postulate that there are features of a VM that are common, like garbage collection.
It would be worthwhile to work out the features common to many VM implementations and put them in hardware. Until that happens, sharing some of a VM implementation seems useful.
I'd be willing to bet that most general-purpose languages in mainstream use could easily share the majority of their compiler/interpreter code. The main differences are in which combination of features is included and the syntax that you use.
Why put them in hardware? That sounds like an awful lot of extra silicon for every single architecture. Wouldn't it be better to have a higher level software abstraction like Parrot that you just build everything else on top of?
The rule of thumb for instruction set design is that an instruction which is worth 1% in overall performance is worth considering. (That's how one game processor got a 2x2 matrix multiply instruction.)
It's unclear why VM design would be significantly different. In other words, if a specialized regex engine provides a significant performance boost, why not add it? Applications that can use it get faster, applications that can't stay the same.
Of course, one often discovers that redesigning the previous solution so it does the right thing for the special case is even better.
Couldn't there be a way to extend and/or override portions of Parrot specifically for one's language of choice? This way, lots of communities can work together on relevant and mutually beneficial places and drop in their custom code where it suits their language.
You can. You have two different ways of doing it: the first is to write your own low-level opcodes; the other, and I think more powerful, is through PMCs. A PMC (Parrot Magic Cookie) is a basic type of your language (e.g. an Integer, a Char, etc.) with some operations associated with it that you can override. Operations include assignment, type conversion, invocation, your own methods and many more. After writing a PMC you can then "map" it to a core type. As an example, if you write a "function" PMC and map it to Parrot's "function" PMC, then every time the VM needs to create/use a function, it will use the one you defined instead of the default one. This gives a LOT of flexibility.
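To make the "map it to a core type" idea concrete, here is a minimal sketch in Python. It is not Parrot's API: ToyVM, map_type, FunctionPMC and TracingFunctionPMC are all hypothetical names invented for illustration. It only shows the general pattern of a VM looking up which class to instantiate for a core type, so a language can swap in its own.

```python
# Hypothetical sketch of mapping a custom type onto a VM's core type.
# None of these names come from Parrot; they are made up for illustration.

class FunctionPMC:
    """Default 'function' type used by the toy VM."""

    def __init__(self, body):
        self.body = body

    def invoke(self, *args):
        return self.body(*args)


class TracingFunctionPMC(FunctionPMC):
    """A language-specific replacement that overrides invocation."""

    calls = 0  # counts invocations across all instances

    def invoke(self, *args):
        TracingFunctionPMC.calls += 1
        return super().invoke(*args)


class ToyVM:
    def __init__(self):
        # Core type name -> class the VM instantiates for that type.
        self.core_types = {"function": FunctionPMC}

    def map_type(self, core_name, pmc_class):
        """Replace the class used for a core type."""
        self.core_types[core_name] = pmc_class

    def new(self, core_name, *args):
        """Create a value of a core type using whatever class is mapped."""
        return self.core_types[core_name](*args)


vm = ToyVM()
vm.map_type("function", TracingFunctionPMC)

# The VM asks for a "function" and gets the overriding class instead.
double = vm.new("function", lambda x: 2 * x)
result = double.invoke(21)
```

After the mapping, code inside the VM that asks for a "function" never needs to know a replacement is in use, which is roughly the flexibility being described.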
Being the altruistic guy that I am, I read the article for you. :-) (Not to mention that I was wondering myself.)
My take on LLVM is that it is a low-level interpreter, close to the actual hardware. Parrot probably is too, but the big bonus with Parrot is that it inherits a huge library from Perl and CPAN. That's the advantage for somebody developing a new language: an already existing massive library. Like the advantage the Java VM has with all the class libraries. This advantage is not so great if you are developing a language where the library violates the constraints of the language, e.g. immutable objects.
I surmise that Google went with LLVM for Python because Python already had that massive library and didn't need Perl's. Both Parrot and LLVM are register-based interpreters, which give a nice speed improvement over stack-based interpreters like CPython and the JVM. The registers are an abstraction and don't have anything to do with the available machine registers. Even compilers targeting real hardware use abstract registers internally.
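A toy illustration of the stack-versus-register distinction, in Python. Both interpreters and their instruction formats are invented for this sketch and resemble no real VM's bytecode. It measures nothing; it only shows that a register-style instruction names its operands explicitly, so the same assignment takes fewer instructions to dispatch, which is one commonly cited reason for the speed difference.

```python
# Toy interpreters evaluating a = b + c; the instruction formats are made up.

def run_stack(program, env):
    """Stack machine: operands are pushed and popped implicitly."""
    stack = []
    for op, *args in program:
        if op == "load":
            stack.append(env[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "store":
            env[args[0]] = stack.pop()
    return env


def run_register(program, regs):
    """Register machine: each instruction names destination and sources."""
    for op, dst, src1, src2 in program:
        if op == "add":
            regs[dst] = regs[src1] + regs[src2]
    return regs


# Four dispatched instructions on the stack machine...
stack_program = [("load", "b"), ("load", "c"), ("add",), ("store", "a")]
# ...versus one on the register machine.
register_program = [("add", "a", "b", "c")]

stack_result = run_stack(stack_program, {"b": 2, "c": 3})
register_result = run_register(register_program, {"b": 2, "c": 3})
```

Note that the "registers" here are just dictionary keys, which matches the point that they are an abstraction unrelated to machine registers.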
A minor misstatement in the article. Writing a parser is not hard. There are a huge number of tools out there. Parrot's is just one more. And if your language is simple, hand coding a recursive descent parser isn't all that difficult. It just isn't worth the effort.
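For a sense of scale, here is a minimal hand-coded recursive descent parser in Python for integer arithmetic with +, -, * and parentheses. The grammar and all names (tokenize, Parser, evaluate) are assumptions chosen for this sketch, not anything from Parrot's toolchain. It evaluates as it parses rather than building a tree.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor ('*' factor)*
        value = self.factor()
        while self.peek() == "*":
            self.take()
            value *= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)


def evaluate(text):
    """Parse and evaluate an arithmetic expression."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

Each grammar rule becomes one method, and operator precedence falls out of which method calls which. For example, evaluate("2 + 3 * (4 - 1)") multiplies before adding.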
The Perl folk are being very bold, IMO. With Parrot and Rakudo, they're saying that you can write code in Perl 6 and utilize libs in other languages (including the CPAN in Perl 5), or write code in other languages and still have access to all those libraries.
That is, they must be quite confident that Perl 6 is a winner, and that most users won't just write, say, Python, run it on Parrot, and make use of the CPAN from there.
(Please tell me if I'm totally missing the point, but my understanding is that any Parrot-hosted lang can access any other Parrot-hosted lang's libs.)
Perl (especially Perl 6, if/when it reaches completion) and Python fill very different temperamental niches. They have profoundly different takes on programming, despite their similarities in terms of both being high-level, interpreted languages. So it seems likely that both will be around for a good long while.
If many languages start using Parrot as a VM, it will moderate the current "rich get richer" dynamic of programming language competition. A language leading in the module race will no longer have such a huge advantage. This will help Perl 6 immensely, of course, since it will start out with very few Perl 6 modules. They'll need to get the Perl 5 CPAN modules working on Parrot to compensate, and it is a happy side-effect that this will make these same modules usable by any other languages running on Parrot.
Also, a point the article doesn't mention is that Perl 6 itself is designed to be a mutable language. Going one step beyond the article, why implement a whole new language at all, just to add/try out a new concept? It may be simpler to instead modify the Perl 6 parser (written in Perl 6 of course) to add your new concept to the syntax and semantics of Perl 6.
I love the idea. I just hope there's actually a ready-for-production implementation of Perl 6 running on top of Parrot at some point.
I just wanted to add some clarification regarding LLVM: it is not, at all, an interpreter. Rather, it is a compiler back-end in the sense that it translates from an IR (intermediate representation) to machine code. The way that it's written makes it useful in both static compilation and JIT (just-in-time) compilation contexts. Also noteworthy: the IR has both a human-readable and a compact binary persistent representation, so it's conceivable that one could use the latter (LLVM BC, as in "bitcode") to ship architecture-neutral binaries to end users that get compiled down to optimized machine code at installation time, or launch time, or just-in-time while the executable is running. LLVM is many things, but it is not an interpreter at any level.
For example, they replaced the regular expression engine that WebKit was using with one that was specially designed to work with JavaScript's regex syntax. Maybe you can get similar performance with a generalized engine, but I'm no longer of the mind that one VM for all languages will be the end-all-be-all solution.
http://code.google.com/events/io/sessions/V8BuildingHighPerf...