Somewhat off-topic. I recently got interested in Java and its ecosystem because of Clojure. However, what bugs me is the startup time for Java. Since this is mature technology, I cannot believe it takes so long for an executable to start. One guess is that this is because of the garbage collector inside the VM, but that doesn't make much sense because NodeJS also uses a garbage collector and it starts instantly (and it even has to parse & compile the code upon startup, which the Java VM doesn't even have to do). So... I'm puzzled. Any comments? This is not meant as a trolling question.
It doesn't make any sense to blame the garbage collector. During startup you're loading classes, allocating structures, and verifying bytecode. None of that involves garbage collection (allocation with a tracing GC is generally bump-pointer, so it's very efficient).
I work on a large Java project (JRuby), and for us the largest component of startup time is class loading. Also, startup code is likely to run in the interpreter rather than being JIT compiled, which is also slower.
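If you want to see the class-loading cost for yourself, here's a minimal stdlib-only sketch (nothing JRuby-specific; `javax.crypto.Cipher` is just an arbitrary class that's unlikely to have been loaded yet):

```java
public class ClassLoadTiming {
    public static void main(String[] args) throws Exception {
        // First load: the class file must be located, parsed, verified,
        // and initialized.
        long start = System.nanoTime();
        Class.forName("javax.crypto.Cipher");
        long firstLoad = System.nanoTime() - start;

        // Second "load": the class is already resolved, so this is nearly free.
        start = System.nanoTime();
        Class.forName("javax.crypto.Cipher");
        long secondLoad = System.nanoTime() - start;

        System.out.println("first:  " + firstLoad / 1_000 + " us");
        System.out.println("second: " + secondLoad / 1_000 + " us");
    }
}
```

A large application loads thousands of classes at startup, each paying that first-load cost, which is part of why AOT approaches help so much.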
The solution we're looking at is ahead-of-time (AOT) compilation. This avoids class loading and interpretation, and it does seem to solve the problem (we're back to millisecond startup times).
In V8 (Node.js) the majority of the core language and library features are effectively already AOT compiled as they're implemented in C++. In the OpenJDK (and most JVMs) a lot of this is implemented in Java, so that has to be loaded and interpreted rather than executed natively.
Actually, as I understand it, in V8 much of the core library is implemented in JavaScript, compiled at build time, and then dumped into a library which can be linked in directly.
That is, a big chunk of it is explicitly AOT compiled JS, a bit like a Smalltalk image.
I'm talking about a research project called SubstrateVM, which achieves 14 ms hello-world time (so including JVM startup, core library loading, everything, measured using the shell's time command) for Ruby. It's not open source at the moment.
It does a closed-world analysis on all your classes and compiles to native code. The JVM it uses is also written in Java, so is also compiled to native code. It uses the new Graal compiler for compilation at runtime (and actually also for the AOT part) so it also has high performance when running.
There are also things like Nailgun, which simply prestarts a JVM and then delegates to that loaded JVM when you invoke a command. It may be a bit of a workaround, but if your use case is frequently starting extremely short-lived programs, it might be worth it.
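The idea is simple enough to sketch in a few lines. This toy version (not Nailgun's actual protocol) keeps a JVM resident and makes each "invocation" just a socket round trip instead of a full JVM start; here the server and client run in one process only so the example is self-contained:

```java
import java.io.*;
import java.net.*;

public class WarmJvmDemo {
    public static void main(String[] args) throws Exception {
        // The resident "warm" JVM: accepts a command over a socket and
        // dispatches it, so per-invocation cost is just the round trip.
        ServerSocket server = new ServerSocket(0); // ephemeral port
        Thread daemon = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String cmd = in.readLine();
                out.println("ran: " + cmd); // pretend we dispatched the command
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        daemon.start();

        // The thin "client" invocation: connect, send a command, print reply.
        try (Socket c = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(c.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()))) {
            out.println("build");
            System.out.println(in.readLine()); // prints "ran: build"
        }
        daemon.join();
        server.close();
    }
}
```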
I'd be interested to hear a response about what OP is doing specifically, but yes... the term "AOT" generally means compiling down to a native executable.
Excelsior JET has been around forever and works quite well, although it's very expensive. There was some discussion at the most recent JavaOne conference suggesting that Oracle will bring AOT to the JDK soon (probably prompted by the fact that Microsoft is doing this on the .NET side). However, there's as yet no indication of a timeline... or whether it will be in the core open-source JDK or a paid enterprise feature.
Regardless, there is a lot that you can do, short of full-fledged AOT, to reduce the amount of work a JVM has to do at startup. Historically, some of the worst offenders have been libraries and frameworks that make heavy use of reflection. In recent years, alternatives have emerged that do more work at compile-time rather than relying on reflection at run-time.
For dependency injection, you might look at a compile-time solution like Dagger (http://square.github.io/dagger/), rather than traditional reflection-based packages like Spring or Guice. For logging, you might want to use SLF4J (http://www.slf4j.org/) rather than Apache Commons Logging. Use a database schema management solution like Flyway (https://flywaydb.org/) rather than letting an ORM screw around with your database schema every time on startup. Etc.
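This isn't Dagger code, just a stdlib-only sketch of why the compile-time approach pays off: reflective object construction does a lookup and access checks on every call that a plain `new` resolved at compile time doesn't.

```java
import java.lang.reflect.Constructor;

public class ReflectionCost {
    static class Service {
        Service() {}
    }

    public static void main(String[] args) throws Exception {
        int iterations = 100_000;

        // Direct construction: the constructor call is resolved at
        // compile time.
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            new Service();
        }
        long direct = System.nanoTime() - t0;

        // Reflective construction: lookup plus access checks each time,
        // which is roughly what reflection-heavy DI frameworks do at startup.
        t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Constructor<Service> c = Service.class.getDeclaredConstructor();
            c.newInstance();
        }
        long reflective = System.nanoTime() - t0;

        System.out.println("direct:     " + direct / 1_000_000 + " ms");
        System.out.println("reflective: " + reflective / 1_000_000 + " ms");
    }
}
```

The absolute numbers vary by JVM and warm-up, but the reflective path is consistently the slower one, and a framework doing this for every bean at startup multiplies the cost.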
However, I really just don't understand the "Java is slow" gripe in the original comment. At my company, our Spring Boot-based services start up in around 8-12 seconds depending on the service. These are fairly complex application components too, not trivial "hello world" stuff. Sure, in my career I've seen some legacy apps that take minutes to start up... but that's because a ton of cruft has been added over time. If you architect your application to initialize and communicate with a ton of external dependencies at startup every time, then startup is going to be slow no matter what language you're using.
It's just hard to take most of these gripes seriously. Java is well suited for server-side business application development and systems integration. If you're using it in THAT context, and adhering to smart modern design principles, then it blows every other option away. If a Spring Boot or Dropwizard app is too slow to start up, then I'm sorry, but you just don't know what you're doing.
If you're using Java for video game development, or some kind of desktop app, or embedded development on your Raspberry Pi Zero, then well... you're going to have a bad time, because Java is not very good for that even if you DO know what you're doing. Use something else in those contexts, for goodness sake.
I have high hopes for AOT compilation of popular languages. Many prior attempts have failed to reach feature parity. If you can remove JITing, you can enforce static code integrity at the OS level. JIT is definitely useful, but if you can accomplish your goals with static code, that is all the better for security.
AOT compilation for languages like Java is full of interesting challenges. I worked on gcj back in the day, and we had a super interesting Java/C++ interoperability scheme called CNI: https://gcc.gnu.org/java/papers/cni/t1.html
> In terms of languages features, Java is mostly a subset of C++.
Not sure I'd agree with that. It's obviously incorrect in a formal sense, and in an informal sense you could say that feature sets of so many languages overlapped somewhat so that the observation is useless.
Yes, unfortunately Clojure startup performance problems are home-grown and cannot be blamed on the JVM. On the other hand, once a Clojure repl is running, you can keep adding things without recompiling or even losing state, which can be a huge performance advantage, assuming that OP's troubles are related to dev xp.
Well talking about dev xp, I like to follow the Unix tradition, where one can make many tools, and run them from the commandline, combine them in scripts, etc. But obviously this only works if startup is on the order of milliseconds, not seconds. NodeJS lets us do this, so I was wondering why not Clojure/Java.
Anyway, I'm still curious how much faster running an AOT compiler on Clojure would be.
You could use a long-running JVM for these tools if there's a compelling use-case? Not ideal but if it lets you write tools in the language of your choice it may be worth it.
I wonder a bit if Clojure isn't atypical though? I mean my compiler starts up fast too, then takes a fairly long time to grind away. However this gets done once.
You may also want to check whether you're using the server JVM or the client JVM... one of the differences is the level of optimization the VM tries to do. The server JVM does deeper analysis during class loading, which increases startup time. If you have very short-running processes (which I'm assuming you do if you're concerned with startup time), the client JVM may be more appropriate.
Also in general, the vanilla JVM startup is pretty quick... the slowness you're experiencing is probably "userspace" code.
Maybe it's to do with garbage collection. The JVM uses generational collectors that depend on the weak generational hypothesis. This hypothesis doesn't hold during start-up, when many long-lived objects are created in a short period of time.
Sure, the garbage collector makes a difference, but the big difference IMHO is how the VM interprets the bytecode.
In the JVM, the long startup time is because the bytecode is fully loaded, verified, etc., and then interpreted.
If you compare with another VM, like the AVM2 (ActionScript Virtual Machine 2), there the bytecode is treated as a stream: even if your program is many hundreds of MB, the interpreter kicks in much earlier.
This is explained in a presentation at Stanford University by Rick Reitmaier from Adobe Systems.
Whether you run the AVM2 from a shell script, a standalone executable, or even as CGI under Apache, the avmshell (which runs, loads, verifies, and interprets the bytecode) starts up in a matter of milliseconds.
I haven't tried using it recently, but consider passing the -client flag rather than the -server flag at JVM boot and see if it's any better? The -server flag is great for production because it does a ton more pre-optimization at bootup.
The thing that put me off Clojure was the amount of memory it used. It's OK by default, but when I wanted to use Emacs + CIDER, it consumed about a gigabyte of RAM. Having to work on a 2 GB machine, that made it painful to use.
I took a look at the LLVM preset and it looked like you had to pass object handles to static functions. I'd prefer to have an actual, say, LLVM::Module equivalent object in Java that you can call member functions on. Are all the presets like this?
No, that's because in the specific case of LLVM only the C API is currently mapped. We'd have to work a little bit harder for the C++ API, but there is interest in doing so, if only to use it as a better parser for JavaCPP itself:
Improve Parser: Use the Clang API
https://github.com/bytedeco/javacpp/issues/51
Any recommendations for going the other way? I'm interested in writing Android applications entirely in C++, but obviously a lot of interaction with Java is required. It would be great if there was a way to use JNI well enough to never be tempted to write little bits of Java here and there.
For those of you looking at something like this, remember that JNI (no matter how you wrap it) is still incredibly slow when trying to pass larger data structures (>1KB).
On a modern JVM we can use direct NIO buffers and let Java access native memory directly at full speed. There is (almost) no overhead at all anymore, so if we do it that way, it's actually pretty fast.
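For example (a minimal stdlib-only sketch, nothing JavaCPP-specific): a direct buffer lives outside the Java heap, so native code can read and write the same region in place via JNI's `GetDirectBufferAddress`, with no copying across the boundary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Allocate memory outside the Java heap; native code can access
        // this region directly (GetDirectBufferAddress) without a copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024)
                                   .order(ByteOrder.nativeOrder());

        buf.putInt(0, 42);          // write as if it were a Java array
        int value = buf.getInt(0);  // read back; no JNI copy involved

        System.out.println("direct=" + buf.isDirect() + " value=" + value);
        // prints "direct=true value=42"
    }
}
```

Using `nativeOrder()` matters when the native side interprets the same bytes, since heap-buffer semantics default to big-endian regardless of the platform.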
Looks very useful. It would help bridge to many C++ packages. I wonder what the deployment options are. Can it bundle multiple platform libraries (e.g. Windows and Linux x86) in one package?