
Thanks especially for the "Not All Sunshine and Rainbows" section. It is all too easy to write about the positive things and leave the negative parts out.

Resource leaks in Haskell are perhaps a bit trickier to track down than in other languages, and I would appreciate hearing more about the issue you were experiencing and how you solved it. Many blog posts have warned against long-running Haskell processes, but you seem to have had fairly good success with it.

Also, the problems you were experiencing with Cabal might be fixed by the sandbox feature, which is built into later versions of Cabal.



We deal with Haskell resource leaks the same way you would in C++ or Java.

We have production monitors on every host that show basic metrics like memory, disk, and CPU utilization. Atop that, we added a tracker for the number of suspended Haskell threads (that is, threads that are not blocked on I/O but are also not running).

We found that the machines are usually able to handle requests as soon as they come in, so if the number of suspended Haskell threads stays above zero for any length of time, the machine is about an hour away from melting down.

We can restart the process without losing any connections, so this leaves us a very comfortable margin of error.

Once we know we have a problem, it's usually pretty simple to run the heap profiler on the process and look at recent commits. We continuously deploy, so there's only about a 10 minute delay before a particular commit is running in front of customers. This makes tracking down regressions really fast.

Even in cases where we can't figure out why a bit of code is leaking, we can almost always identify it and revert it until we understand what's going on.


> We can restart the process without losing any connections

Would you mind expanding on this a bit? I'm not too familiar with Haskell, but I am familiar with various ways of blocking new connections while allowing existing connections to complete, either at the load-balancer level or built into each individual process.

What Haskell stack are you using, and how are graceful restarts accomplished?

Thanks.


One of my coworkers wrote a really cool bit of software to do this. I want him to open source it.

Basically, you can share a single socket amongst many servers. The OS ensures that just one process accepts each connection.

You can therefore have a manager process that owns the socket and passes it on to application processes.

To update, start new processes, then politely tell the old ones to go away.


One really cool thing in Linux is that you can actually pass file descriptors between processes over unix domain sockets.
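For what it's worth, the mechanism is SCM_RIGHTS ancillary data over a unix domain socket, and Haskell's `network` package exposes it directly as sendFd/recvFd. Here's a self-contained sketch (not anyone's production code; `managerEnd`/`workerEnd` are illustrative names, and a real manager and worker would be separate processes connected by a unix socket rather than a socketPair in one process):

```haskell
-- Sketch of fd passing with the `network` package, whose sendFd/recvFd
-- wrap sendmsg(2)/recvmsg(2) with SCM_RIGHTS. One socketPair stands in
-- for the manager<->worker control channel, and a second pair stands in
-- for the shared listening socket being handed off.
import Network.Socket
import Network.Socket.ByteString (recv, sendAll)
import qualified Data.ByteString.Char8 as C8

main :: IO ()
main = do
  (managerEnd, workerEnd) <- socketPair AF_UNIX Stream defaultProtocol
  (listener, peer)        <- socketPair AF_UNIX Stream defaultProtocol

  -- Manager side: ship the listener's raw descriptor across the channel.
  fd <- unsafeFdSocket listener      -- `fdSocket` on older `network`
  sendFd managerEnd fd

  -- Worker side: receive the descriptor and wrap it back into a Socket.
  inherited <- mkSocket =<< recvFd workerEnd

  -- Show it refers to the same kernel object as the original listener:
  sendAll peer (C8.pack "hello")
  msg <- recv inherited 5
  C8.putStrLn msg
```

The kernel duplicates the descriptor into the receiving process, so both sides hold the same open socket; the manager can then close its copy and exit without dropping the listener.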


Windows has supported this for ~14 years too.


Good to know. Does it work for everything that's an fd in Linux? I know you've got to treat sockets and files differently in some cases (or at least did once)...


It works for most kernel handles. Sockets may have become more normal citizens starting with Win7, but I stopped doing Windows development around then.

Here are the official docs: http://msdn.microsoft.com/en-us/library/windows/desktop/ms72...


Looks like there's a separate function for sockets.

Still, cool stuff there too.


einhorn [1] implements this model and is pretty effective. Used in production at Stripe and other places. (It's written in Ruby, but can run application processes in any language.)

[1] https://github.com/stripe/einhorn


Basically, catch SIGINT, then stop listening to a socket/port. Finish all current requests and exit. The "watcher" parent process will restart the process with the new executable. Repeat for all other processes listening to the socket/port.
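The signal-handling half of that can be sketched in a few lines with the `unix` package (a toy, not anyone's actual watcher: `raiseSignal` stands in for the parent sending SIGINT, and "finish all current requests" is just waiting on an MVar):

```haskell
-- Minimal sketch of graceful-shutdown signal handling, using only the
-- base and unix libraries. A real server would close its listening
-- socket in the handler and then drain in-flight requests.
import Control.Concurrent.MVar
import System.Posix.Signals

main :: IO ()
main = do
  drained <- newEmptyMVar
  -- Catch SIGINT instead of dying: mark that shutdown was requested.
  _ <- installHandler sigINT (Catch (putMVar drained ())) Nothing
  raiseSignal sigINT    -- stand-in for the watcher process signaling us
  takeMVar drained      -- stand-in for "finish all current requests"
  putStrLn "exiting cleanly"
```

The watcher side then just needs to start the new executable before signaling the old one, so there's never a moment with nobody accepting on the shared socket.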


I can't answer for grandparent, but you should check out https://github.com/notogawa/graceful


Except in Haskell you can build ekg right into your server. http://ocharles.org.uk/blog/posts/2012-12-11-24-day-of-hacka...


"we added a tracker for the number of suspended Haskell threads" - would you mind sharing how you did that? I couldn't see any obvious GHC APIs for it.


It looks like you're right. I misspoke.

We track total threads, working or not. It works great as an indicator because it tends to stay below the number of CPU cores on the server.


I take it you mean OS-level threads then?


We track Haskell threads.

edit: Found the code. :)

We rolled our own implementation. Our WAI application action increments a counter when an HTTP request is received and decrements it when the request completes.

It doesn't track threads created as a part of HTTP request handling, but we don't allow those actions to forkIO anyway. There hasn't been any demand for it.
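The counting pattern itself is tiny and can be sketched with just an IORef and bracket_ (the name `withRequestCounted` is made up; in the real setup this wrapping would live in the WAI application, with the counter exported to the monitoring system):

```haskell
-- Sketch of an in-flight request counter: bracket each handler so the
-- counter is decremented even if the handler throws.
import Control.Exception (bracket_)
import Data.IORef

withRequestCounted :: IORef Int -> IO a -> IO a
withRequestCounted counter =
  bracket_ (atomicModifyIORef' counter (\n -> (n + 1, ())))
           (atomicModifyIORef' counter (\n -> (n - 1, ())))

main :: IO ()
main = do
  counter <- newIORef 0
  -- The "handler" here just observes the counter mid-request.
  peak  <- withRequestCounted counter (readIORef counter)
  after <- readIORef counter
  print (peak, after)   -- one in flight during the request, zero after
```

Using bracket_ rather than a plain increment/decrement pair is what makes the metric trustworthy: an exception in a handler can't leave the counter stuck above zero.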


Ah, right, that makes sense - thanks.


... so, how? Isn't that what he was asking?


I don't believe sandboxes fix the particular problem the OP was describing, which is that of reproducible builds across multiple environments (different team members / CI / prod).

A cabal sandbox means once you've got your dependencies to resolve and your app to build, it'll continue to build and use the same versions of its dependencies, when building from that sandbox (which pretty much means "when building in that working directory"). But it gives you no guarantee that if your dev build got version 0.1.2 of a transitive dependency, then your CI server will also get version 0.1.2, and not 0.1.3.

If it turns out that your app works with 0.1.2 but not with 0.1.3, then your dev machine will reproducibly produce working builds, while your CI server will reproducibly produce broken builds.

What's really needed is an analogue to the Gemfile.lock used by Ruby or npm-shrinkwrap.json in the Node world, which is checked into version control, and freezes the exact versions of all transitive dependencies until explicitly updated. I think there's a "cabal freeze" command in development, but I'm not sure what the status is.
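For reference, a freeze file in that style would just pin exact versions of every transitive dependency, checked into version control. Something like the following (hypothetical cabal.config contents; the package names and version numbers are invented for illustration):

```
constraints: bytestring == 0.10.0.2,
             conduit == 1.0.9.3,
             text == 0.11.3.1
```

Every environment that builds against the same file gets the same solve, and updating a dependency becomes an explicit, reviewable change.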


You can easily lock to a specific dependency version in the cabal file if that is your desire.


Sure - if you can reliably identify the exact required versions of all of your transitive dependencies. That's infeasible for nontrivial applications. (And even that's only if the set of exact versions you find manage to not have conflicting requirements with each other.)

The reason Gemfile.lock works is because it lets you achieve that the same way you create working code - figure out what works in dev, using a combination of skill and trial and error, then lock it down in version control and deploy exactly that to CI/prod/other devs.

People have written shell scripts to scan your sandbox for installed package versions and update your cabal file to require those versions, but it's an inherently approximate process - e.g. if you upgraded a transitive dependency but still have the previous version in the sandbox, the shell script has to guess which one you want, because there's no explicit relationship between your code and a particular version.

There's a more fundamental problem with that approach - it ignores the difference between "my app semantically requires package X at version y" and "I have tested my app with package X at version y". The cabal file expresses the former - which is why it doesn't include transitive dependencies, and why it's more idiomatic to specify broad version ranges than exact version constraints. "cabal freeze", if it existed, would express the latter. Reliable engineering requires both.


We manage all deps in our cabal file and have scripts to make sure that what is in the package database matches the cabal file (this way we can upgrade versions without manually unregistering and reinstalling).

Like you said, there is nothing seamless that is part of cabal ... yet. I would like to improve our workflow and integrate it into cabal.


One Haskell resource leak I've encountered a couple of times comes from opening large numbers of files combined with non-strict semantics. By default Haskell will open an IO handle but not consume it until the contents are needed, and thus not close it. When reading the contents of many files in a directory, the result is thousands of concurrently open file handles and an exhausted OS file descriptor limit. The solution is to add strictness annotations to force evaluation and relinquish the handles, which isn't fun and isn't pretty.
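A small sketch of the fix, using only base (`readStrictly` is an illustrative name, not a standard function):

```haskell
-- Lazy readFile keeps the handle open until the contents are actually
-- demanded, so mapping it over many files can hit the fd limit before
-- anything is read. Forcing the whole string before returning makes the
-- handle close (readFile closes on EOF) before the next file is opened.
readStrictly :: FilePath -> IO String
readStrictly path = do
  s <- readFile path
  length s `seq` return s   -- force the full contents now

main :: IO ()
main = do
  writeFile "demo.txt" "contents"
  s <- readStrictly "demo.txt"
  putStrLn s
```

In practice strict ByteString/Text readFile variants do the same job more cheaply than forcing a String.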


This problem is being addressed by a number of packages like conduit, pipes, and (at a lower level) io-streams. These are second-generation solutions to the problem that was pioneered by the iteratee and enumerator packages.


Seconding. I've used conduit before, and it was a delight to use something so carefully designed. The blog posts about conduit are in themselves an insight into how to think with Haskell.

http://www.yesodweb.com/blog/2013/10/core-flaw-pipes-conduit http://www.yesodweb.com/blog/2013/10/simpler-conduit-core


The library in question was using unsafePerformIO to open a file handle. It was just a bug.


There's apparently a particular instance you're referring to?

But the general issue can be encountered with lazy IO without any use of unsafePerformIO. There has been a lot of discussion about this around the various enumerator-like libraries - in particular, Snoyman has many posts about ensuring timely release of resources.


As for resource leaks, the particular example in the post was a bit unfortunate. The problem in general has solutions, though: you can use functions that acquire a resource within a limited scope and clean it up automatically when done (like 'withFile'), and if you want to do more complicated things there are the resourcet and pipes-safe libraries.
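For instance, with withFile the handle cannot outlive its scope, even on exceptions (one caveat worth a comment: don't return a lazily-read value out of the bracket, since it would be demanded after the handle has closed):

```haskell
import System.IO

main :: IO ()
main = do
  -- withFile brackets open/close: the handle is released when the body
  -- returns or throws, so it can't leak past this scope.
  withFile "out.txt" WriteMode $ \h ->
    hPutStrLn h "scoped write"
  -- Safe to read here; the writing handle is already closed.
  readFile "out.txt" >>= putStr
```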



