
That's great news! We use Scala exclusively at work for the back end, and I wonder if it could be interesting to switch new projects to Scala Native.

Did you test Scala Native against well-known, massive open source Scala projects? Did performance improve or regress? Did you write a brand new Scala compiler for native code?



Never touch a running system ;) Scala on the JVM is much more tested than the new shiny thing. Also, don't expect improved performance... many people think that the JVM is bloated and makes programs slower (this is mostly not true). The downsides of the JVM are higher memory consumption/footprint (when you have e.g. small servers or micro instances) and the cold startup time of the JVM itself (which is not relevant on a server in comparison to desktop Java apps). Would be interested to hear whether any backend Scala projects like e.g. Play work on Scala Native.


> the cold startup time of the JVM itself (which is not relevant on a server in comparison to desktop Java apps).

I disagree somewhat with this.

We found that when we started writing microservices in languages other than Java, the short startup time changed how we did some error handling.

For errors where we, say, lose the connection to the database or RabbitMQ, we'd much rather have the Node.js process die and restart than try to construct reconnect logic.

The problem with reconnect logic is that it's code that may be tested very rarely. This in turn means it's easy to get strange long-term problems there, like a very slow memory leak due to some listener being added to a connection object each time the connection is initiated.
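A minimal sketch of the leak pattern described above, with hypothetical names (`Connection`, `onMessage`): reconnect logic that registers a fresh listener on every reconnect without removing the old one, so listeners (and anything they capture) slowly accumulate.

```scala
// Illustrative only -- Connection and onMessage are made-up names, not a real API.
class Connection {
  private var listeners: List[String => Unit] = Nil
  def onMessage(handler: String => Unit): Unit = listeners ::= handler
  def listenerCount: Int = listeners.size
}

object ReconnectExample {
  val conn = new Connection

  def reconnect(): Unit = {
    // Bug: adds a new listener on every reconnect, never removes the previous one.
    // Each closure keeps whatever it captures alive, hence the slow leak.
    conn.onMessage(msg => println(msg))
  }
}
```

Exactly because this path runs so rarely, the listener count (and the retained memory behind it) only grows in long-lived processes that reconnect now and then.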

We did a 180 on reconnect logic in our Node.js processes and let exceptions just bubble up unhandled and take the entire VM down. With an automatic restart script, the process is back in seconds anyway, and with Docker having built-in back-off timers for auto-restart, we don't necessarily overload the shared resources.


> For errors where we, say, lose the connection to the database or RabbitMQ, we'd much rather have the Node.js process die and restart than try to construct reconnect logic.

This sounds like a very Erlang-ish way to handle the problem. Another advantage is that if the server/process is in some weird state that's causing problems, killing and restarting it lets you clear out the broken state, and get back into the state that it's most likely been tested under.


Yes. We're now almost risking it going the other way: some bad programming goes unchecked for a long time because, overall, the process sort of does what it should, even if it restarts like 10 times a day.


Well, you do still want good reporting, so you know when failures are happening and can capture a stack trace.


I guess I don't see the difference here if your VM startup time is 3 seconds or 15-30 seconds. If that's the difference between the site remaining stable and the whole thing collapsing then it seems like you're setting yourself up for a big outage one day when the nodejs process isn't able to come back in three seconds for whatever reason.


I think it depends a bit on class of errors. Certainly not everything is suitable for this treatment.

Lost connectivity to RabbitMQ or Elasticsearch would mean our site is dead anyhow (you can't do anything). So either of those errors should arguably result in some static 500 pardon-our-appearance page.

But say someone messes up the network connection or we get a brief problem.

Why wouldn't the nodejs process start quickly?


The most effective way to handle these kinds of errors in Java unfortunately requires understanding class loading, thread contexts, wrapping connection primitives in the right kind of references, and then making sure that all resource deallocation/closing always uses the same codepath. Even though you really only need to implement it once, it is both somewhat tricky and technically challenging.

It's a pity that so few Java projects have tried to use these mechanisms without building them as part of massive frameworks, sometimes apparently even without understanding what they have built.


Yes, I did spend large parts of my Java developer career looking at class loaders and class loading delegation in servlet containers etc.

I think it's a bit too hard to get it right.

Like, suddenly some third party library starts pulling in log4j and your whole logging setup goes wrong in subtle yet very bad ways.

Or you screwed up with that one reference to a ResultSet and even though it is closed, that reference keeps an entire class tree of Connection, PreparedStatement etc alive.


Isn't this also solved by just load balancing so that the customer ends up reconnected to a healthy node while the downed node is replaced?

We run our Scala apps on Aurora/Mesos behind a load balancer (hundreds of instances for just one app). If there's an issue that can't be handled within the app and error rates breach a given threshold, Aurora just kills the instance and creates a new one on another host.


A JVM boots in about 100 milliseconds. A difference of 100 milliseconds made the difference in how you do error handling?


On what box, with what kind of disk, for how big an app? You aren't getting 100ms starts on a 199 MB fat jar on Amazon EBS.


At the moment, there isn't direct support for multithreading [1], so I'm guessing it would be very difficult to run any of the common web servers or computing frameworks natively. It may be possible for libraries that have pluggable concurrency, for example by creating an `ExecutionContext` that wraps OS threads, but that's waaay beyond my pay grade.

[1] http://www.scala-native.org/en/latest/user/lang.html#lang
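For what it's worth, the `ExecutionContext` idea mentioned above is only a few lines on the JVM side. A hedged sketch, purely illustrative (Scala Native's actual threading API may differ): an `ExecutionContext` that starts a plain OS thread per task instead of going through a managed pool.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Sketch only: one OS thread per submitted task, no pooling or backpressure.
// A real implementation would want to reuse threads.
object OsThreadContext extends ExecutionContext {
  def execute(runnable: Runnable): Unit = new Thread(runnable).start()
  def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
}
```

Usage is the same as any other context: `implicit val ec: ExecutionContext = OsThreadContext` and then `Future { ... }` as usual.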


Cheers - this is the #1 item on my excitement checklist.


Long-time Java lover here. I agree with all your points, but in the context of Java at least (does Scala support this?) there is no simple way to build and release a static binary that includes the JVM. I think 1.9 will have this option, but this is something I didn't realize I missed until I started working with Rust and Go. It makes deployment so much simpler.


In the age of containers it's really not that much harder to build and deploy a JVM app.

Edit: thanks for the downvotes but you could at least tell me what's so crazy about my statement.


One app, sure.

At work I'm running 17 different containers- many of which require their own JVM. (That's 3 different JRuby apps, zookeeper, kafka, and ElasticSearch.)

Those JVMs get heavy when you're shipping container images compared to small Go or Rust binaries.


To my mind the JVM is where containers make the least sense. If you build an executable jar you can run with "java -jar ..." then that seems just as simple as "docker run ..." and gets you the single-file deployment, and you can control memory allocation via flags if you need to. You don't get virtual networking but IME that doesn't add value in the first place.


There are still some licensing issues. For instance, Atlassian has an official docker container to evaluate Confluence, but they don't support it in production since it uses OpenJDK and Confluence is still somewhat broken on OpenJDK.

Rather than fix Confluence to work on OpenJDK (I don't want to imagine what type of reflection garbage they've got going on down there that breaks so bad on OpenJDK), their instructions tell you how to make your own Dockerfile using the official Oracle runtime.

Actually, in that situation, if it won't run on OpenJDK it's probably not going to work via a native compiler either.


No downvote from me. But here's an explanation of why e.g. compiled binaries are better: I had a hard time getting a normal, non-fancy Scala Play project running on a 512MB DigitalOcean instance, mostly because it needs a lot more RAM for building. I solved it by using a bigger swap partition. With precompiled single-binary programs, this problem is more a developer-machine problem than an infrastructure problem. So I think the deployment step itself (not looking at anything else) is easier with small single binaries.


You shouldn't build it on the deployment server. You build a jar and upload/download it to/from the place you want it to run, just as you'd do with an executable. A jar is "not a binary", but what practical difference does that make?


yes, you're right, thanks for pointing that out.


Does it really need to be a binary? Build executable jars (use the maven shade plugin), run them with java -jar foo.jar, that's about as simple as it gets.


I have never successfully deployed a Java application with defaults on the GC, etc.

I'd love it if I could compile those options into a binary.


Yeah you can't do that, and I don't necessarily agree with that design decision. But for the sake of the comparison it's worth saying that these "simple" compile-to-binary languages simply don't let you set those parameters at all - it's ridiculous to argue that Go (say) is better than Java because something that's impossible in Go requires fiddling with parameters in Java.


I didn't actually say Go is better than Java. I said that binaries were something I realized I missed because of those languages. That is, it's something I appreciate about Go and Rust.


But what's the advantage of a binary over a (shaded) jar? "java -jar myapp.jar" is a little more typing than "myapp", but only a little (and you can avoid that by prepending a launch script if you want); having the JVM installed on all your servers is a one-time cost.


See my above comment about GC and other runtime options. I always need a script to specify all the options to the JVM. It is never just as easy as a single jar with no options. That makes it better, no doubt, but it still sucks.


I don't understand how you need those other options in Java but avoid needing them with a binary. What's the difference that means you can get by without passing any options in Go or what have you?

(I've used "java -jar myapp.jar" in production and it's been fine; the Java mainstream may favour using lots of -Dblah but it's entirely possible to replace that with code)
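A hedged sketch of the "replace -D with code" approach mentioned in the parenthetical; the property names below are just illustrative examples, not a recommended set.

```scala
// Instead of launching with `java -Dfoo=bar ... -jar myapp.jar`, set the
// properties programmatically at the very top of main, before anything
// (logging frameworks, AWT, etc.) reads them.
object Main {
  def configureSystemProperties(): Unit = {
    System.setProperty("java.awt.headless", "true")
    System.setProperty("log4j.configuration", "file:config/log4j.properties")
  }

  def main(args: Array[String]): Unit = {
    configureSystemProperties()
    // ...start the app
  }
}
```

The caveat is ordering: anything that reads a property during class initialization must be loaded after these calls run, which is why they go first in main.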


Maybe some apps do not have a classpath. Here is what I see for a running Kafka instance on one of my servers. And it does not look like "as simple as it gets".

java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/kafka_2.11-0.10.0.0/bin/../logs/zookeeper-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/opt/kafka_2.11-0.10.0.0/bin/../logs -Dlog4j.configuration=file:bin/../config/log4j.properties -cp :/opt/kafka_2.11-0.10.0.0/bin/../libs/aopalliance-repackaged-2.4.0-b34.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/argparse4j-0.5.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/connect-api-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/connect-file-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/connect-json-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/connect-runtime-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/guava-18.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/hk2-api-2.4.0-b34.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/hk2-locator-2.4.0-b34.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/hk2-utils-2.4.0-b34.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jackson-annotations-2.6.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jackson-core-2.6.3.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jackson-databind-2.6.3.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jackson-jaxrs-base-2.6.3.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jackson-jaxrs-json-provider-2.6.3.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jackson-module-jaxb-annotations-2.6.3.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/javassist-3.18.2-GA.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/javax.inject-1.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/javax.inject-2.4.0-b34.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/javax.ws.rs-api-2.0.1.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-client-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-common-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-container-servlet-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-container-servlet-core-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-guava-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-media-jaxb-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jersey-server-2.22.2.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-continuation-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-http-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-io-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-security-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-server-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-servlet-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-servlets-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jetty-util-9.2.15.v20160210.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/jopt-simple-4.9.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka_2.11-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka_2.11-0.10.0.0-sources.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka_2.11-0.10.0.0-test-sources.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka-clients-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka-log4j-appender-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka-streams-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka-streams-examples-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/kafka-tools-0.10.0.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/lz4-1.3.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/reflections-0.9.10.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/rocksdbjni-4.4.1.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/scala-library-2.11.8.jar:/opt/kafka_2.11-0.10.0.0/bin/../libs/scala-parser-combinators_2.11-1.0.4.jar:/opt/kafka_2.11-0.10.0.0/bin/../lib


If you use the maven shade plugin (or similar) you can replace the whole "-cp ..." stanza with a "-jar myfile.jar". The "-D" arguments can be set in code instead (though I'd ask why you are allowing remote management without authentication and without SSL?).

The rest of the arguments are about GC tuning and logging. How would you do those things in a language that gives you a "simple" static binary? Either you can't at all, or they'd require an equally complex series of arguments.


It's a few lines of code to get sbt to combine everything into a single jar. Do other languages not require including libraries?
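For reference, a minimal build.sbt sketch using the sbt-assembly plugin (keys and version as of the 0.14.x line; check the plugin docs for your sbt version, and the class/jar names here are just placeholders):

```scala
// project/plugins.sbt:
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.4")

// build.sbt: `sbt assembly` then produces a single runnable "fat" jar.
mainClass in assembly := Some("com.example.Main")
assemblyJarName in assembly := "myapp.jar"
```

The resulting jar can then be run with `java -jar myapp.jar`, per the comments above.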


Statically compiled languages like Go do not.


Well, there are two separate questions here:

1. What should the default be? Java build systems default to building dynamically linked, though it's a few lines to change. IMO dynamic is a better default for large projects, as you usually have more library modules than executable modules. On the other hand a large project is likely to already involve a fair bit of build config, so maybe the defaults should be optimized for small projects.

2. Whether you allow dynamic at all. To my mind it's always worth having the option, and I think Go will come to regret not having it if and when it ever gets used for large projects.


There are plenty of large projects like kubernetes/docker/rkt/influxdb/tidb/cockroachdb and so on. Go provides quite large memory-efficiency gains and sub-millisecond GC pauses compared to Java.

As of Go 1.8 it also provides plugin support, though I am not sure whether it is anywhere near Java in terms of dynamic library loading support.


Only works on Linux as of now.


Sure there is, all commercial JVMs support AOT compilation to native code.

That most don't want to pay for them is another matter.


> the cold startup time of the JVM itself (which is not relevant on a server in comparison to desktop Java apps).

With microservices run as containers that are started on demand it matters; with other architectural choices it may matter less.


I'm interested in building CLIs with this.


I think for server applications Scala on the JVM will probably beat Scala Native. The benefits of Scala Native over Scala on the JVM are:

   - faster startup time
   - (drastically) lower memory footprint
   - fine hand-tuning of your application
All these things are not super important in server applications. For example, Java trades memory for throughput (higher memory footprint, but also higher throughput; these usually go hand in hand).


For me the most important benefit is running without a pre-installed VM. Not particularly important for the backend, but a huge win for user-space tools.


You could always bundle the JRE, and with Java 9 one can even use the newly introduced linker to create a customized image with just the relevant classes.


True, and then you wouldn't have to worry about Oracle licensing either (although you can avoid that currently with OpenJDK as well).


Servers very frequently benefit from lower memory footprints, as it can also dramatically improve performance by improving cache efficiency.


The large memory footprint of the JVM is memory for classes, profiles, things like that. Those are used to create optimised code and to recover when optimisations were too optimistic. When your program is optimised and running in steady state, this memory isn't actively used and so doesn't contend with your application memory and so has no impact on cache efficiency.


This sounds like a plausible explanation, but is this verified/verifiable? Are there memory profilers that can show me the relative sizes of the young/old/permanent generation segments of the GC?

I'm always blown away at the memory usage of JVM apps. Part of it is the fact that Java has encouraged insanity-inducing inheritance hierarchies... but it is also incredibly hard to do dead-code optimization for such a static (type and compilation model) language (I blame dynamic classloading, but that's more of a guess than anything). Maybe what you're saying is the reason we don't see noticeable GC pauses until you start seeing large amounts of data... but it is still a huge pain for low-memory environments like phones, embedded devices, IoT, etc. And while memory usage is always gonna be higher in a GC'd language, the JVM still consumes vastly more memory than other languages like OCaml, D, Go, etc.


Yes, it's called the "perm gen cache", or something like that, in any standard JVM profiler. This roughly represents the memory used by the type system. It can get pretty high if you are doing something like auto-generating types (GUI, build systems, etc.)


Perm gen disappeared in Java 8, and I think the high memory demand for perm gen was one of the reasons.


In general (not for server apps), two major benefits of Scala Native are:

  - Predictable latency if desired (optional GC)
  - Very low call overhead for C ABI
As to your point about memory use, Java trades memory for convenience, not performance. GC requires substantially more memory for similar performance. I read an IBM blog (which I can't find at the moment) within the last week which showed a Swift web service running slightly faster than Java, but using only half the memory.

The following comparison is also interesting, with a JSON serialization example in Swift outpacing Spring/Java by a factor of ten... This is also running on Linux instead of macOS.

https://medium.com/@qutheory/server-side-swift-vs-the-other-...


The medium post is really a stupid benchmark. Check this for a real JSON serialization benchmark - https://www.techempower.com/benchmarks/#section=data-r13&hw=...


To be fair, in that list, the spring entry didn't exactly run circles around the competition either. It's somewhere between the better PHP contenders and even behind grails (which can be pretty accurately described as spring with layer of slowness added on top). Really looks like there is something unfortunate going on with the idiomatic way to implement those examples on spring.


> Predictable latency if desired (optional GC)

Does Scala Native support not using a GC? It seems like it would be difficult to get Scala working without a GC.


It supports direct allocation via both the heap and the stack.

  type Vec = CStruct3[Double, Double, Double]

  val vec = stackalloc[Vec] // allocate c struct on stack
  !vec._1 = 10.0            // initialize fields
  !vec._2 = 20.0
  !vec._3 = 30.0
  length(vec)               // pass by reference
...and...

  @extern object stdlib {
    def malloc(size: CSize): Ptr[Byte] = extern
  }

  val ptr = stdlib.malloc(32)
http://www.scala-native.org/en/latest/

Otherwise it currently uses the Boehm GC.

One big area that needs more work:

  "Scala Native doesn’t yet provide libraries for parallel
   multi-threaded programming and assumes single-threaded    
   execution by default."


But it can be important if you only have a 512MB DigitalOcean instance or are using small Linux containers/Docker.


I used the JVM on 512MB DO instances and containers and they run fine. I think for containers there are other issues (most likely you are going in a microservice direction where latency eventually becomes important, so picking another JVM GC algorithm might be suitable). There may be applications for which the 512MB instances and the JVM are not suitable, but you can most likely just upgrade the instance.


The JVM itself doesn't add a huge memory footprint. I remember Charles Nutter of JRuby fame calling it the "20-30 MB memory tax" (see http://blog.headius.com/2008/11/noise-cancelling.html).

A lot of the extra memory usage of Java apps comes from sloppy programming and from depending on lots of heavyweight libraries and frameworks.


If you include idiomatic Java programming as part of sloppy programming, I also agree with that.

https://www.cs.virginia.edu/kim/publicity/pldi09tutorials/me...


Many things pointed out in this article apply to just about every managed language runtime. Implement a TreeSet in any language and you'll see the same overhead from object headers, memory alignment, etc. Java has some oddities that cause it to waste extra memory, but off the top of my head the only one I can think of is 16-bit character Strings. Java 9 is supposed to help with that by allowing Strings to internally store single-byte Latin-1 characters where possible.

I do like the slide though showing that people tend to assemble abstractions together and completely lose sight of the performance costs of what they are doing. There's also the fallacy commonly held by many that because someone took the time to write a framework or library, they must have also taken the time to ensure it's optimized well.


Go uses quite a bit less memory than Java.

http://benchmarksgame.alioth.debian.org/u64q/go.html


As per the SPECjbb benchmark, JDK 9's compact strings optimization alone provides:

  * 21% memory footprint reduction
  * 27% less GC
  * 5% throughput improvements
https://www.infoq.com/presentations/java-se-9-cloud - check this presentation for more details.


Impressive numbers.


Due to value types support, which are part of Java 10's roadmap.


I'd imagine it will be at least 5 years before most popular Java software uses this feature and Java users see the effect of it.


Don't forget the ability to distribute binaries. This could be really nice for in-house tools.


I'm not involved with this project in any way, but I would expect performance to be overall worse with Scala Native. The advantages of Scala native are likely:

1) Much faster startup times

2) smaller memory footprint for small programs.

3) Potential for easier installation since no dependency on the JDK (assuming binaries are statically linked)

So basically you could use scala native to cover some cases that are better covered by golang or rust right now. For large and long-running server-side processes, the JVM is still king.


Yeah go for it. Nothing could go wrong: https://github.com/scala-native/scala-native/issues/543



