Somewhat off-topic. I recently got interested in Java and its ecosystem because of Clojure. However, what bugs me is the startup time for Java. Since this is mature technology, I cannot believe it takes so long for an executable to start. One guess is that this is because of the garbage collector inside the VM, but that doesn't make much sense because NodeJS also uses a garbage collector and it starts instantly (and it even has to parse & compile the code upon startup, which the Java VM doesn't even have to do). So... I'm puzzled. Any comments? This is not meant as a trolling question.
It doesn't make any sense to blame the garbage collector. During startup you're loading classes, allocating structures, and verifying bytecode. None of that involves garbage collection (allocation with a tracing GC is generally bump-pointer, so it's very efficient).
I work on a large Java project (JRuby), and for us the largest component of startup time is class loading. Also, startup code is likely to run in the interpreter rather than being JIT compiled, which is also slower.
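If you want to see the class-loading cost for yourself, here's a minimal stdlib-only sketch (nothing JRuby-specific; `javax.crypto.Cipher` is just an arbitrary class that's unlikely to have been loaded yet):

```java
public class ClassLoadTiming {
    public static void main(String[] args) throws Exception {
        // First load: the class file must be located, parsed, verified,
        // and initialized.
        long start = System.nanoTime();
        Class.forName("javax.crypto.Cipher");
        long firstLoad = System.nanoTime() - start;

        // Second "load": the class is already resolved, so this is nearly free.
        start = System.nanoTime();
        Class.forName("javax.crypto.Cipher");
        long secondLoad = System.nanoTime() - start;

        System.out.println("first:  " + firstLoad / 1_000 + " us");
        System.out.println("second: " + secondLoad / 1_000 + " us");
    }
}
```

A large application loads thousands of classes at startup, each paying that first-load cost, which is part of why AOT approaches help so much.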
The solution we're looking at is ahead-of-time (AOT) compilation. This avoids class loading and interpretation, and it does seem to solve the problem (we're back to millisecond startup times).
In V8 (Node.js) the majority of the core language and library features are effectively already AOT compiled as they're implemented in C++. In the OpenJDK (and most JVMs) a lot of this is implemented in Java, so that has to be loaded and interpreted rather than executed natively.
Actually, as I understand it, in V8 much of the core library is implemented in JavaScript, compiled at build time, and then dumped into a library which can be linked in directly.
That is, a big chunk of it is explicitly AOT compiled JS, a bit like a Smalltalk image.
I'm talking about a research project called SubstrateVM, which achieves 14 ms hello-world time (so including JVM startup, core library loading, everything, measured using the shell's time command) for Ruby. It's not open source at the moment.
It does a closed-world analysis on all your classes and compiles to native code. The JVM it uses is also written in Java, so is also compiled to native code. It uses the new Graal compiler for compilation at runtime (and actually also for the AOT part) so it also has high performance when running.
There are also things like Nailgun, which simply prestarts a JVM and then delegates to that loaded JVM when you invoke a command. It may be a bit of a workaround, but if your use case is frequently starting extremely short-lived programs, it might be worth it.
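The idea is simple enough to sketch in a few lines. This toy version (not Nailgun's actual protocol) keeps a JVM resident and makes each "invocation" just a socket round trip instead of a full JVM start; here the server and client run in one process only so the example is self-contained:

```java
import java.io.*;
import java.net.*;

public class WarmJvmDemo {
    public static void main(String[] args) throws Exception {
        // The resident "warm" JVM: accepts a command over a socket and
        // dispatches it, so per-invocation cost is just the round trip.
        ServerSocket server = new ServerSocket(0); // ephemeral port
        Thread daemon = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String cmd = in.readLine();
                out.println("ran: " + cmd); // pretend we dispatched the command
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        daemon.start();

        // The thin "client" invocation: connect, send a command, print reply.
        try (Socket c = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(c.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()))) {
            out.println("build");
            System.out.println(in.readLine()); // prints "ran: build"
        }
        daemon.join();
        server.close();
    }
}
```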
I'd be interested to hear a response about what OP is doing specifically, but yes... the term "AOT" generally means compiling down to a native executable.
Excelsior JET has been around forever and works quite well, although it's very expensive. There was some discussion at the most recent JavaOne conference suggesting that Oracle will bring AOT to the JDK soon (probably prompted by the fact that Microsoft is doing this on the .NET side). However, there's as yet no indication of a timeline... or whether it will be in the core open-source JDK or a paid enterprise feature.
Regardless, there is a lot that you can do, short of full-fledged AOT, to reduce the amount of work a JVM has to do at startup. Historically, some of the worst offenders have been libraries and frameworks that make heavy use of reflection. In recent years, alternatives have emerged that do more work at compile-time rather than relying on reflection at run-time.
For dependency injection, you might look at a compile-time solution like Dagger (http://square.github.io/dagger/), rather than traditional reflection-based packages like Spring or Guice. For logging, you might want to use SLF4J (http://www.slf4j.org/) rather than Apache Commons Logging. Use a database schema management solution like Flyway (https://flywaydb.org/) rather than letting an ORM screw around with your database schema every time on startup. Etc.
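This isn't Dagger code, just a stdlib-only sketch of why the compile-time approach pays off: reflective object construction does a lookup and access checks on every call that a plain `new` resolved at compile time doesn't.

```java
import java.lang.reflect.Constructor;

public class ReflectionCost {
    static class Service {
        Service() {}
    }

    public static void main(String[] args) throws Exception {
        int iterations = 100_000;

        // Direct construction: the constructor call is resolved at
        // compile time.
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            new Service();
        }
        long direct = System.nanoTime() - t0;

        // Reflective construction: lookup plus access checks each time,
        // which is roughly what reflection-heavy DI frameworks do at startup.
        t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Constructor<Service> c = Service.class.getDeclaredConstructor();
            c.newInstance();
        }
        long reflective = System.nanoTime() - t0;

        System.out.println("direct:     " + direct / 1_000_000 + " ms");
        System.out.println("reflective: " + reflective / 1_000_000 + " ms");
    }
}
```

The absolute numbers vary by JVM and warm-up, but the reflective path is consistently the slower one, and a framework doing this for every bean at startup multiplies the cost.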
However, I really just don't understand the "Java is slow" gripe in the original comment. At my company, our Spring Boot-based services start up in around 8-12 seconds depending on the service. These are fairly complex application components too, not trivial "hello world" stuff. Sure, in my career I've seen some legacy apps that take minutes to start up... but that's because a ton of cruft has been added over time. If you architect your application to initialize and communicate with a ton of external dependencies at startup every time, then startup is going to be slow no matter what language you're using.
It's just hard to take most of these gripes seriously. Java is well suited for server-side business application development and systems integration. If you're using it in THAT context, and adhering to smart modern design principles, then it blows every other option away. If a Spring Boot or Dropwizard app is too slow to start up, then I'm sorry, but you just don't know what you're doing.
If you're using Java for video game development, or some kind of desktop app, or embedded development on your Raspberry Pi Zero, then well... you're going to have a bad time, because Java is not very good for that even if you DO know what you're doing. Use something else in those contexts, for goodness sake.
I have high hopes for AOT compilation of popular languages. Many prior attempts have failed to reach feature parity. If you can remove JITing, you can enforce static code integrity at the OS level. JIT is definitely useful, but if you can accomplish your goals with static code, that is all the better for security.
AOT compilation for languages like Java is full of interesting challenges. I worked on gcj back in the day, and we had a super interesting Java/C++ interoperability scheme called CNI: https://gcc.gnu.org/java/papers/cni/t1.html
> In terms of languages features, Java is mostly a subset of C++.
Not sure I'd agree with that. It's obviously incorrect in a formal sense, and in an informal sense you could say that feature sets of so many languages overlapped somewhat so that the observation is useless.
Yes, unfortunately Clojure startup performance problems are home-grown and cannot be blamed on the JVM. On the other hand, once a Clojure repl is running, you can keep adding things without recompiling or even losing state, which can be a huge performance advantage, assuming that OP's troubles are related to dev xp.
Well talking about dev xp, I like to follow the Unix tradition, where one can make many tools, and run them from the commandline, combine them in scripts, etc. But obviously this only works if startup is on the order of milliseconds, not seconds. NodeJS lets us do this, so I was wondering why not Clojure/Java.
Anyway, I'm still curious how much faster running an AOT compiler on Clojure would be.
You could use a long-running JVM for these tools if there's a compelling use-case? Not ideal but if it lets you write tools in the language of your choice it may be worth it.
I wonder a bit if Clojure isn't atypical though? I mean my compiler starts up fast too, then takes a fairly long time to grind away. However this gets done once.
You may also want to check whether you're using the server JVM or the client JVM... one of the differences is the level of optimization the VM tries to do. The server JVM does deeper analysis during class loading, which increases startup time. If you have very short-running processes (which I'm assuming you do if you're concerned with startup time), the client JVM may be more appropriate.
Also in general, the vanilla JVM startup is pretty quick... the slowness you're experiencing is probably "userspace" code.
Maybe it's to do with garbage collection. The JVM uses generational collectors that depend on the weak generational hypothesis. This hypothesis doesn't hold during start-up, when many long-lived objects are created in a short period of time.
Sure, the garbage collector makes a difference, but the big difference IMHO is how the VM interprets the bytecode.
In the JVM, the long startup time is because the bytecode is fully loaded, verified, etc., and then interpreted.
If you compare with another VM, like the AVM2 (ActionScript Virtual Machine 2), there the bytecode is treated as a stream: even if your program is many hundreds of MB, the interpreter kicks in much earlier.
This is explained in a presentation at Stanford University by Rick Reitmaier from Adobe Systems.
Whether you run the AVM2 from a shell script, a standalone executable, or even as CGI under Apache, the avmshell (which runs, loads, verifies, and interprets the bytecode) starts up in a matter of milliseconds.
I haven't tried using it recently, but consider passing the -client flag rather than the -server flag at JVM boot and see if it's any better? The -server flag is great for production because it does a ton more pre-optimization at bootup.
The thing that put me off Clojure was the amount of memory it used. It's OK by default, but when I wanted to use Emacs + CIDER, it consumed about a gigabyte of RAM. Having to work on a 2 GB machine, that made it painful to use.
I took a look at the LLVM preset and it looked like you had to pass object handles to static functions. I'd prefer to have an actual, say, LLVM::Module equivalent object in Java that you can call member functions on. Are all the presets like this?
No, that's because in the specific case of LLVM only the C API is currently mapped. We'd have to work a little bit harder for the C++ API, but there is interest in doing so, if only to use it as a better parser for JavaCPP itself:
Improve Parser: Use the Clang API
https://github.com/bytedeco/javacpp/issues/51
Any recommendations for going the other way? I'm interested in writing Android applications entirely in C++, but obviously a lot of interaction with Java is required. It would be great if there was a way to use JNI well enough to never be tempted to write little bits of Java here and there.
For those of you looking at something like this, remember that JNI (no matter how you wrap it) is still incredibly slow when trying to pass larger data structures (>1KB).
On a modern JVM we can use direct NIO buffers and let Java access native memory directly at full speed. There is (almost) no overhead at all anymore, so if we do it that way, it's actually pretty fast.
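For example (a minimal stdlib-only sketch, nothing JavaCPP-specific): a direct buffer lives outside the Java heap, so native code can read and write the same region in place via JNI's `GetDirectBufferAddress`, with no copying across the boundary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // Allocate memory outside the Java heap; native code can access
        // this region directly (GetDirectBufferAddress) without a copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024)
                                   .order(ByteOrder.nativeOrder());

        buf.putInt(0, 42);          // write as if it were a Java array
        int value = buf.getInt(0);  // read back; no JNI copy involved

        System.out.println("direct=" + buf.isDirect() + " value=" + value);
        // prints "direct=true value=42"
    }
}
```

Using `nativeOrder()` matters when the native side interprets the same bytes, since heap-buffer semantics default to big-endian regardless of the platform.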
Looks very useful. It would help bridge to many C++ packages. I wonder what the deployment options are. Can it bundle multiple platform libraries (e.g. Windows and Linux x86) in one package?