
If you are concerned about granting access to the host...

Everyone should be.

There's a simple solution: Group your Docker containers in KVM vm's.

I consider that an ugly hack that should be avoided at all costs. Why? It's not viable without up-front automation investment, ongoing maintenance overheads, and additional latency and inefficiency, for starters. It's also probably less portable, particularly to embedded systems.

it makes little sense to add to the complexity and attack surface by adding a process monitor per container and an sshd per container

Process monitor? What are you talking about? At least with LXC, each container has PID 1 (master application process, init system, whatever) and can be stopped/started easily with lxc-stop -n nameofcontainer.

As for sshd, my point was that it can make sense because it's secure, remote-accessible, proven. The article premise was that it's a bad idea, but none of its arguments hold, to my mind.

Both are easily abstracted behind tiny (1-2 line) scripts...

That aren't remotely accessible, without breaking the abstraction and creating some shared access scenario on the host. The entire point of an abstraction is to be clean. It's not clean if you have to adopt hacky workarounds and dump all of your existing tools in order to play with it.
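For concreteness, the kind of tiny host-side wrapper being discussed might look like this sketch. The name `centry` is a hypothetical placeholder, and it assumes a `docker exec`-style entry point; it is an illustration of the abstraction, not anything from the thread:

```shell
# centry: hypothetical 1-2 line wrapper giving a shell inside a named
# container, via the host rather than a per-container sshd.
# DOCKER can be overridden for illustration/testing.
centry() {
  "${DOCKER:-docker}" exec -it "${1:?usage: centry <container> [cmd]}" "${2:-/bin/sh}"
}
```

As the rebuttal notes, such a wrapper only works for someone already on the host; making it remotely accessible means granting some form of host access.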



> Everyone should be.

And in that case running sshd in the Docker container is not a good idea. There has not been sufficient work in hardening Docker to ensure that there is no way of breaking out of Docker containers.

> I consider that an ugly hack that should be avoided at all costs. Why? It's not viable without up-front automation investment,

There are multiple automation tools available that can handle KVM just fine, including OpenStack. And if you're going to be deploying enough Docker containers for you to be concerned about KVM automation, then you need to invest in automation anyway, for Docker.

But what is your alternative means of sandboxing the apps if you are concerned about users gaining access to the host? Outside of other VM solutions like Xen, etc.? OpenVZ? No matter which sandboxing method you choose, you're either trading off security or accepting additional complexity, latency and inefficiency.

> Process monitor? What are you talking about? At least with LXC, each container has PID 1 (master application process, init system, whatever) and can be stopped/started easily with lxc-stop -n nameofcontainer.

Yes, and if you want to stuff an sshd inside an LXC or Docker container alongside your application process, you will typically need to use init, daemontools, mon or a similar tool as PID 1 in the container, responsible for spawning the sshd and the app, instead of just letting the application itself be the "local" PID 1. Which means you don't typically end up just adding sshd, but dragging in additional dependencies as well.
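The dependency creep described here can be made concrete with a hypothetical Dockerfile; the base image, package names and paths are illustrative assumptions, not a recommendation:

```dockerfile
# Hypothetical image that stuffs sshd in alongside the app: something now has
# to supervise both processes, so supervisor becomes the container's PID 1.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y openssh-server supervisor
# app.conf would declare a [program:sshd] and a [program:app] section
COPY app.conf /etc/supervisor/conf.d/app.conf
CMD ["/usr/bin/supervisord", "-n"]

# Compare the app-only case, where no init layer is needed at all:
# FROM ubuntu:14.04
# COPY app /usr/local/bin/app
# CMD ["/usr/local/bin/app"]
```

The point of the comparison is the extra moving parts: an init system, its config, and sshd's keys and config, replicated per container.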

> As for sshd, my point was that it can make sense because it's secure, remote-accessible, proven

That doesn't change if it's run on the host and you use "nsenter" to get into the containers. But it also does not protect you against the potential for escaping the containers. If the "secure" part here matters to you, you should be considering a full VM (optionally with Docker inside, but not just Docker) until Docker has substantially more testing behind it.
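The host-side access path being described can be sketched as a small helper; `denter` is a hypothetical name, and the exact set of namespace flags is an assumption about what you'd want to enter:

```shell
# denter: enter a running container's namespaces from the host, using one
# host sshd + nsenter (shipped in util-linux) instead of per-container sshds.
# DOCKER/NSENTER can be overridden for illustration/testing.
denter() {
  pid=$("${DOCKER:-docker}" inspect --format '{{.State.Pid}}' "$1") || return 1
  "${NSENTER:-nsenter}" --target "$pid" --mount --uts --ipc --net --pid -- "${2:-/bin/sh}"
}
```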

> That aren't remotely accessible, without breaking the abstraction and creating some shared access scenario on the host.

First of all, restricting remote access is a good thing. Part of the point is that with a properly developed container image coupled with the proper tools, you should not need to access individual containers remotely. Opening up for remote access is not just a security problem, but a massive configuration management hassle.

The number of problems that get created, or never properly fixed, because people get sloppy when they have direct access to poke around production servers / VMs / containers is one of the biggest ops headaches I've had the misfortune of dealing with over the last 20 years. Yes, you may occasionally need to poke around to troubleshoot, so some means of gaining access is still necessary, but as the article points out: you still have that, via the host.

(And again, if you have problems with granting access to the host, you should not trust Docker without containing it in a VM)

So in other words: They are just as remotely accessible as the same apps would be outside of the containers. And if your proposed alternative is to replace this:

     - app1
     - app2
     - app3
     - sshd + a single binary that'll soon be in pretty much all distros.
with this:

     - init1
          - app1
          - sshd1
     - init2
          - app2
          - sshd2
     - init3
          - app3
          - sshd3
     - sshd
With either extra bind mounts or replicated config files/keys for all the sshds; neither is a "free" solution in terms of complexity.

... then we don't have at all the same view of what complexity is.


If you need remote access, given sshd-in-container versus sshd-in-host as an access path, the former is clearly more secure by design, even if the containment eventually fails.

what is your alternative means of sandboxing the apps

Well, I don't have a finished solution, but I could already write a fairly hefty book on the approaches evaluated here. Basically, combining:

     - the normal kernel tools: aggressive capabilities restrictions, connectivity restrictions, read-only root, device restrictions, mount restrictions (no /sys, for example), subsystem-specific resource limits
     - a formal release process: testing, versioning, multi-party signoff
     - additional layers of protection: host-level firewalling, VLAN segregation, network infrastructure firewalling, IO bandwidth limitation, diskless nodes, unique filesystem per guest, unique UIDs/GIDs per guest, aggressive service responsiveness monitoring with STONITH and failover, mandating the use of hardened toolchains
     - kernel hardening and use of security toolkit features: syscall whitelist policy development, automated via learning modes plus test suite execution, etc.
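A few of the kernel-level restrictions in that list map directly onto `docker run` flags; a minimal sketch, with the image name as a placeholder and covering only a small subset of the measures described:

```shell
# harden_run: run an image with a read-only root filesystem, all capabilities
# dropped, and a memory limit. A sketch, not a complete policy.
# DOCKER can be overridden for illustration/testing.
harden_run() {
  "${DOCKER:-docker}" run --read-only --cap-drop ALL --memory 256m "$@"
}
```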

Fail-safe? No. Better than most? Probably. Security is a process, after all...

That doesn't change if it's run on the host

It does, because you're now contending for host resources, and you have to work with an abstract notion of hosts and guests and their identities when operating remotely, i.e. normal tools, which assume one node per address, won't work out of the box.

neither a "free" solution in terms of complexity

You have highlighted a tiny difference in the process space, which is basically free. But in doing so, you have ignored the other aspects. A single read-only bind mount per guest is very cheap in complexity terms.



