Julia library for fast machine learning (turing.ml)
226 points by todsacerdoti on May 10, 2020 | 38 comments


Turing.jl is great, as is Flux.jl (which I have used more).

I retired a year ago and was looking to settle down and just use one programming language - easier on an old guy like myself, who has always been fascinated with many programming languages. I ended up picking Common Lisp for my retirement projects, mostly because of almost 4 decades of experience with CL.

For a Lisp programmer, Julia feels like a very comfortable language. The interactive REPL is very nice. I experimented with Flux a lot, and wrote snippets for web scraping, text processing, etc. It also worked really well for these non-numeric use cases: a very good general programming language, not just for numeric processing.

Julia has great Python interop, so in case you need a Python library, you have it available.
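
For example, a minimal sketch using PyCall.jl's documented `pyimport` (the NumPy call is just an illustration):

    using PyCall
    np = pyimport("numpy")     # import a Python module as a Julia object
    np.sum([1, 2, 3])          # call Python functions with Julia data; returns 6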


Looking at the examples, I still struggle with code that imports multiple libraries at the top and then uses naked function names without telling me where those functions come from.


In Julia you can `@which naked_function`. That might help.
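
For example (a sketch of a REPL session; `@which` takes a call, and the printed location is approximate):

    julia> using Statistics

    julia> @which mean([1, 2, 3])
    mean(A::AbstractArray; dims) in Statistics at Statistics/src/Statistics.jl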

With Julia's function overloading a function might come from multiple packages.


Even though you can do `@which naked_function`, it's a fair position to prefer importing modules and being explicit about where functions come from. I tend to prefer explicitness like that in package code, but for data-analysis scripts (like the examples here) it would be superfluous, in my opinion.

It is nice that Julia leaves this style decision to the user. I personally find the constant prepending of modules to be one of the clumsiest aspects of Python for data analysis.


Or rather, a function from one package can be extended with methods by other packages.


And to add:

Which of the many methods of that function is called depends on the types of all the arguments (not only the first argument, as in single-dispatch languages like C++). In other words, the implementation actually executed (and thus the source package) might vary depending on the argument types. If I understand Julia correctly.
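
A minimal sketch of that, with made-up types and a made-up function:

    abstract type Pet end
    struct Dog <: Pet end
    struct Cat <: Pet end

    # the method chosen depends on the runtime types of *all* arguments
    meets(a::Dog, b::Dog) = "sniffs"
    meets(a::Dog, b::Cat) = "chases"
    meets(a::Cat, b::Dog) = "hisses"

    meets(Dog(), Cat())   # "chases" -- dispatched on both arguments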


I still don't get why multiple dispatch was chosen over having separate functions with typed arguments. It just seems to add complexity with limited benefit.


It's hard to call the benefit limited when the entire Julia ecosystem evolved to rely on it so heavily, allowing each package to work at its own level of abstraction. For example, Julia's standard library implements all of the basic math operators on the CPU, and libraries like Flux then define their own methods to implement the higher-level operators used in ML (such as convolutional layers and activation functions). And then someone writes those same basic math operators so that instead of running on the CPU they run on the GPU, and for that they use a new type (CuArray). The original library knows nothing about CuArrays; it will call the same basic operators as always, but since the arguments have a different type (received from the user) they'll dispatch to the GPU version.
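
A minimal sketch of that mechanism (CuArrays.jl was the GPU array package at the time; `cu` moves data to the GPU):

    using CuArrays

    f(x) = sum(x .^ 2)             # generic code that knows nothing about GPUs

    f(rand(Float32, 1000))         # Array arguments dispatch to CPU methods
    f(cu(rand(Float32, 1000)))     # CuArray arguments dispatch to GPU kernels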

This kind of interaction can grow indefinitely: for example, if you use a complex-number type/library, it will extend the basic operators to deal with both real and imaginary parts, and if you use the GPU types within it, then it will do complex math on the GPU (and ML on complex math on the GPU..., without any of the libraries being aware of the others). You can see a more detailed explanation at:

https://www.youtube.com/watch?v=kc9HwsxE1OY


This video is my best attempt to explain it (not a rickroll, I swear): https://youtu.be/kc9HwsxE1OY


Nice talk, you're a really good speaker. Thanks.


There's some amazing things happening in the Julia ecosystem. See also: https://sciml.ai/


Can someone informed compare/contrast this with other tools at the intersection of probabilistic programming and deep learning? What are the relative strengths and weaknesses vs. Edward or Pyro?


I've only used each of those a little.

Turing is a bit like Stan, JAGS, or BUGS: it's closer to a general probabilistic programming system with a Bayesian emphasis, although maybe less Bayesian in emphasis than those other DSLs. PyMC would be another comparison.

Edward (or at least when I used it) comes more from a very general latent-variable modeling framework, encompassing hidden-variable models, and is less focused on Bayesian modeling per se, leaning more toward variational inference approaches. It seems to be broadening in scope over time.

Pyro I think is further down the deep learning/NN path than Edward.

It's hard for me to enumerate strengths/weaknesses, as they have different foci and are parts of different language ecosystems. It depends a bit on the use case. My own experience with each is such that I might use Turing, or move to something like TensorFlow; things like Edward or Pyro seem to occupy an intermediate ground that was difficult for me to utilize in the way I thought I might.

I've been excited by Turing, just to see a probabilistic programming framework like that in Julia. I think the expressiveness of Julia and it being native to that framework will be helpful.


> I've been excited by Turing, just to see a probabilistic programming framework like that in Julia. I think the expressiveness of Julia and it being native to that framework will be helpful.

I hope so too. But hasn't Julia's TF/Torch equivalent, Flux, had performance problems? That was the rumor I heard anyway, I haven't had the chance to use it myself.


It currently has problems with some classes of models on the GPU, but this is just due to memory management and is going to be fixed soon. Julia is natively compiled, so it doesn't use a tape, tracing, or simple reference counting for memory management (which, in this case, works out worse).

This is slated to be fixed in the short term with an abstract tracing framework which eliminates memory allocations. Given Julia's type information and its ability to manipulate IR from third-party programs, this is more general and powerful than PyTorch's tracing system: it works on a larger set of code (the whole language), doesn't require actually running the code (abstract tracing), and allows for other program transforms/analyses like compilation to XLA, shape inference, compile-time errors, source-to-source probabilistic programming (https://github.com/MikeInnes/Poirot.jl), and other things.

That's in the short term and should bring Flux up to SOTA for speed (it already is on CPU). In the medium term, a general framework for optimizer passes will allow for more general compile-time memory management.


Note that you can use this with Zygote to preallocate stuff: https://github.com/oxinabox/AutoPreallocation.jl . It doesn't support GPUs yet, mainly because its developer needs a GPU CI setup, but it should mostly just work (see https://github.com/oxinabox/AutoPreallocation.jl/issues/10).


Wow... This is mind blowing! I am surprised there has not been more fanfare about this package.


Most of that is not fundamental to Julia or Flux itself. It’s the difference between a monolithic package like TF and source-to-source AD in Julia. The former allows the designers to use their own data structures and external libraries to do optimizations. Source-to-source relies on the underlying IR used by Julia, making optimizations challenging without some compiler assistance. But all of that is in the pipeline with stopgap solutions on the way.

As with most things in Julia, the code developers don’t just want to hack changes that work, but make changes that are flexible, extensible, and can solve many problems at once. So, Flux isn’t ready for prime time yet, but it is definitely worth keeping your eye on it.


Turing.jl is in an interesting spot because it is essentially a DSL-free probabilistic programming language. While it technically has a DSL of sorts given by the `@model` macro, anything that is AD-compatible can be used in this macro, and since Julia's AD tools work on things written in the Julia language, this means that you can just throw code from other Julia packages into Turing and expect AD-compatible things to work with Hamiltonian Monte Carlo and all of that. So things like DifferentialEquations.jl ODEs/SDEs/DAEs/DDEs/etc. work quite well with this, along with other "weird things for a probabilistic programming language to support" like nonlinear solving (via NLsolve.jl) or optimization (via Optim.jl; I mean doing Bayesian inference where a value is defined as the result of an optimization). If you are using derivative-free inference methods, like particle sampling methods or variants of Metropolis-Hastings, then you can throw pretty much any existing Julia code you had at it as a nonlinear function and do inference around it.
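
For a flavor of what that looks like, a minimal sketch following Turing's documented interface (the exact `@model` syntax has varied across versions):

    using Turing

    @model function gdemo(x)
        σ² ~ InverseGamma(2, 3)         # prior on the variance
        μ ~ Normal(0, sqrt(σ²))         # prior on the mean
        for i in eachindex(x)
            x[i] ~ Normal(μ, sqrt(σ²))  # likelihood; any AD-compatible code can go here
        end
    end

    chain = sample(gdemo([1.5, 2.0, 2.3]), NUTS(0.65), 1000)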

So while it's in some sense similar to PyMC3 or Stan, there's a huge difference in the effective functionality that you get by supporting language-wide infrastructure vs. the more traditional method of adding features one by one and documenting them. So while PyMC3 ran a Google Summer of Code to get some ODE support (https://docs.pymc.io/notebooks/ODE_API_introduction.html) and Stan has 2 built-in methods you're allowed to use (https://mc-stan.org/docs/2_19/stan-users-guide/ode-solver-ch...), with Julia you get all of DifferentialEquations.jl just because it exists (https://docs.sciml.ai/latest/). This means that Turing.jl doesn't document, and doesn't have to document, most of its features; they exist due to composability.
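
A hedged sketch of what that composability looks like in practice, with an exponential-decay ODE invented for illustration:

    using Turing, DifferentialEquations

    decay!(du, u, p, t) = (du[1] = -p[1] * u[1])    # du/dt = -k*u

    @model function fit_decay(data, ts)
        k ~ truncated(Normal(1.0, 0.5), 0, Inf)     # prior on the decay rate
        σ ~ InverseGamma(2, 3)                      # observation noise
        prob = ODEProblem(decay!, [10.0], (0.0, last(ts)), [k])
        pred = solve(prob, Tsit5(); saveat = ts)    # an ordinary DiffEq call
        for i in eachindex(ts)
            data[i] ~ Normal(pred[1, i], σ)         # likelihood around the solution
        end
    end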

That's quite different from a "top down" approach to library support. This explains why Turing has been able to develop so fast as well, since its developer community isn't just "the people who work on Turing"; it's pretty much the whole ecosystem of Julia. Its distributions are defined by Distributions.jl (https://github.com/JuliaStats/Distributions.jl), its parallelism is given by Julia's base parallelism work plus everything around it like CuArrays.jl and KernelAbstractions.jl (https://github.com/JuliaGPU/KernelAbstractions.jl), derivatives come from 4 libraries, ODEs from DifferentialEquations.jl, and so on; the list keeps going.

So bringing it back to deep learning, Turing currently has 4 modes for automatic differentiation (https://turing.ml/dev/docs/using-turing/autodiff), and thus supports any library that's compatible with those. It turns out that Flux.jl is compatible with them, so Turing.jl can do Bayesian deep learning. In that sense it's like Edward or Pyro, but supporting "anything that ADs with Julia AD packages" (which will soon allow multi-AD overloads via ChainRules.jl) instead of "anything on TensorFlow graphs" or "anything compatible with PyTorch".
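
For example, switching backends (per the linked autodiff docs; which symbols are available varies by Turing version):

    using Turing
    Turing.setadbackend(:forwarddiff)   # forward mode, good for few parameters
    Turing.setadbackend(:reversediff)   # reverse mode, better for many parameters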

As for performance and robustness, I mentioned in a SciML ecosystem release today that our benchmarks pretty clearly show Turing.jl as being more robust than Stan while achieving about a 3x-5x speedup in ODE parameter estimation (https://sciml.ai/2020/05/09/ModelDiscovery.html). However, that's utilizing the fact that Turing.jl's composability with packages gives it top-notch support (I want to work with the Stan developers so we can use our differential equation library with their samplers to better isolate differences and hopefully improve both PPLs, but for now we have what we have). If you isolate it down to just "Turing.jl itself", it has wins and losses against Stan (https://github.com/TuringLang/Turing.jl/wiki). That said, there are some benchmarks which indicate that using the ReverseDiff AD backend gives about two orders of magnitude performance increase in many situations (https://github.com/TuringLang/Turing.jl/issues/1140; note that ThArrays is benchmarking PyTorch AD here), which would then probably tip the scales in Turing's favor. As for benchmarking against Pyro or Edward, it would probably just come down to benchmarking the AD implementations.


Hi Chris, as a heads-up, Stan has actually had three built-in methods for a while now; a non-stiff Adams-Moulton solver was introduced in 2018. It unfortunately was only just exposed in the Stan 2.23 documentation: https://mc-stan.org/docs/2_23/stan-users-guide/ode-solver-ch.... Certainly, the devs have been talking about adding more solvers for a while, including SDE and DDE solvers, and your DifferentialEquations.jl ecosystem is an excellent model; it is an area where we know Stan has been lacking. I think Steve Gronder will be trying to work with you regarding benchmarking.


Awesome. If we can get FFI with Stan I'd like to connect DifferentialEquations.jl to it and poke at it with various problems to see how well it does on a few things. We can provide custom gradients if there's an interface for it, but I couldn't figure out how to do it without modifying the Stan source itself.


In order to connect Stan with DifferentialEquations.jl the steps would be:

1. Create "diffeqcpp", a C++ interface to DifferentialEquations.jl (similar to diffeqpy and diffeqr), possibly using CxxWrap.jl.

2. Make it possible to evaluate vector-Jacobian products (VJPs) with "diffeqcpp". That would probably require the ODE RHS to be coded as a string of Julia code, to make Julia AD libraries compatible with it.

At this point, it should be possible to call Julia solvers from C++ and evaluate the derivatives.

In Stan, there is stan::math::adj_jac_apply, which makes it possible to define custom functions with a custom VJP without having to deal with Stan autodiff types; it works, for example, with Eigen::Matrix<double>. https://discourse.mc-stan.org/t/adj-jac-apply/5163

3. Make a class (let's call it JuliaODESolver) that implements two methods:

    operator() // calls Julia solver for the given input 

    multiply_adjoint_jacobian() // evaluates VJP for the given vector
4. In the .stan file, add a custom function in the "functions {}" block, and write a header file that implements that custom function. That would probably be one line:

    return stan::math::adj_jac_apply<JuliaODESolver>(ode_solver_inputs);
More info on using external C++ code is in Section 4.5 of the CmdStan Manual.

5. Modify cmdstan/main.cpp to initialize and finalize Julia context to be able to call Julia functions. This is probably the only place where Stan source itself needs to be modified.

I don't know what would be needed to make forward mode and higher-order derivatives work.

I think it would be much better for fair benchmarking if there were a convenient and documented interface to Stan's algorithms for use with a user-provided log-density function, similar to the DynamicHMC.jl and AdvancedHMC.jl libraries. It would then be easy to call from Julia/Python/R/C++ or anything else.


Thank you for this very helpful comment.


How does this compare to gen? https://www.gen.dev/


Turing has more things that work out of the box, so if you do not have complex requirements it's a good first step. Gen allows for composing models using its generative function interface; you can specify models in different ways. You can also have fine-grained control over inference, rather than just a few preset methods. Gen also has worse error messages and docs, though.



The documentation / project page is very well-done, something unfortunately rare in the Julia ecosystem.


Thank you. In addition to the docs on turing.ml, you can find a few examples from the book "Statistical Rethinking" implemented using Turing at https://github.com/StatisticalRethinkingJulia/TuringModels.j....

We are currently also looking for students to help us further improve the documentation and tutorials in the course of the Google Season of Docs. Some possible projects are listed here: https://julialang.org/jsoc/gsod/projects/#turing_probabilist...

Please reach out to us if you are interested.


To add to that: even if Julia had excellent documentation literally everywhere and on every library, I wish there were better stack traces and more meaningful error messages. Even if Julia performed 10x worse, fixing this overlooked aspect would make up for it. It is rather unbelievable how much time I need to spend figuring out what's wrong with a particular piece of Julia code.

Founders of Julia - please focus on error messages. Some cool things from Rust: https://doc.rust-lang.org/edition-guide/rust-2018/the-compil...


As walnuss says below, these things are already being worked on. If you tried Julia a few months or couple of years ago, you'll find that stacktraces have already got quite a lot better.

One of the things that the Julia community could greatly benefit from is more compiler contributors. Languages like Rust naturally attract compiler folks, and those like Go have the backing of Google.

Julia has a more complex compiler due to our dynamic type system, which users absolutely love. But it also puts a lot of strain on the compiler team, which is quite small. So if any folks with experience in compiler technology on HN are looking for interesting projects to contribute to, please do look into Julia.

We've done better over the years increasing our bus number on the compiler codebase overall. More contributions in all areas of the toolchain would be great to have!


As a bystander, I love the work you guys are doing; in a way it feels like Dylan was reborn and found a place in scientific computing.


As a user of Julia for 2-3 years, is there any guidance for someone looking to help with the compiler?


It's being worked on. I am rather excited for the upcoming work that makes the parser replaceable and allows us to actually give good syntax errors! There is some discussion about making error printing more configurable so that one can skip stack-frames that are unlikely to be the cause (albeit that's a double-edged sword).


I think this is quite a hard problem to solve. A Python wrapper of some monolithic C library can check & reject any input which doesn't meet the spec, and tell you why. But a Julia library typically wants to be generic: you should (for instance) be able to pass it weird numbers that contain units/error bars/gradient info that the library designer knew nothing about, and have these propagate through. When this works it's great, but when it fails, the failure point tends to be 10 layers down, in internal workings you've never heard of. That's part of why error messages are inscrutable.
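
A small illustration of that genericity (Measurements.jl is a real package; the function is invented):

    using Measurements

    polyeval(x) = 3x^2 + 2x + 1    # written with plain numbers in mind
    polyeval(2.0 ± 0.1)            # error bars propagate automatically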



A lack of stack traces and poor error messages hasn't hindered the adoption of most compiled languages.

Naturally it is something to improve, but I bet there are still bigger fish to fry.


Different users have different problems. I’ll take the recent major advances in, say, start up time and multi threading instead of improvements to the error messages.


I don't think that there's a silver bullet for this. A compiler can only tell you where it caught an error: telling you where the error occurred is equivalent to the hard AI problem.

On the other hand, we the users can do a lot to help. After you've worked through those 30 stack frames to find the one that went wrong, you could send a PR to detect that. In many cases, I suspect it would help to just catch the error then re-throw it from the method that went wrong.



