By self-healing giving the attacker a second chance, I mean that it lets an unreliable attack eventually succeed. Consider defeating ASLR or winning a race condition. Each time the service restarts, you get another chance to attack.
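A rough sketch of why restarts matter here: if each restart re-randomizes the target, a fixed guess that almost never works on one try becomes near-certain over enough retries. This is a toy probability model, not a real exploit; the bit counts and retry counts are illustrative assumptions.

```python
def success_probability(bits, restarts):
    """Chance that a single fixed guess defeats `bits` of randomization
    when the attacker gets one fresh try per service restart."""
    per_try = 2.0 ** -bits
    return 1.0 - (1.0 - per_try) ** restarts

# One shot vs. a service that self-heals after every crash:
one_shot = success_probability(16, 1)             # about 0.0015%
with_restarts = success_probability(16, 100_000)  # roughly 78%
```

The same arithmetic is why a brute-forced race window or partial infoleak that fails 99.99% of the time is still fatal when the service obligingly comes back up after every failed attempt.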
I have done a professional evaluation of an EAL6+ certified microkernel OS. There were plenty of bugs and design flaws (which I cannot reveal) and an even bigger problem. To obtain certification, most functionality is left out. The users actually need this functionality, though, so they put it in the uncertified code running on the certified OS. The overall result is less secure because each user program drags along a buggy reimplementation of what would normally be OS functionality. BTW, despite the EAL6+ nonsense, they were way behind OpenBSD and even Linux. It was that bad.
I have also been a professional kernel developer for a different microkernel OS. I assure you that maintainability is not a property of microkernels. You poke something here, and it pops out there. Good luck tracing out why, and good luck making any serious changes to the OS. The reason is that microkernels are deceptive. The individual components are simple, but they have very complex interactions. Glue isn't free. Compared to that, even Linux is trivial to understand and modify.
"Consider defeating ASLR or winning a race condition. Each time the service restarts, you get a second chance to attack."
I considered it. Those problems are handled by eliminating them through other means: input validation, pointer/buffer/array protection, and so on are a start. Restarts are mainly for hardware faults or problems from lingering state. The concept was field-proven for reliability down to the CPU level by a certain vendor whose systems ran NonStop, and by many others at various levels, especially applications. Recently, academia showed it with a "micro-restarts" paper cataloging problems that built up at every layer while showing component restarts knocked out a good chunk of them with imperceptible downtime. One of my own designs leverages what you describe in an instrumented system that automatically taints and traces execution after components restart enough times. The idea is that the failed attacks will take me right to the vulnerability so I can patch it. This is only on paper, but CompSci teams did similar things in systems they built.
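The restart-then-instrument design above could be sketched roughly like this. Everything here is hypothetical (the `Supervisor` class, the factory interface, the threshold of 3) since the design is only on paper; it just shows the control flow of restarting a component and flipping on tracing once failures pile up.

```python
class Supervisor:
    """Toy restart supervisor: restart a failed component and, once
    failures cross a threshold, turn on taint/trace instrumentation so
    the next failed attack points straight at the vulnerability."""

    def __init__(self, component_factory, trace_threshold=3):
        self.factory = component_factory        # builds a fresh component
        self.trace_threshold = trace_threshold  # failures before tracing
        self.failures = 0
        self.tracing = False

    def handle(self, request):
        # Every request runs in a freshly constructed component, so
        # lingering state is wiped on each restart.
        component = self.factory(tracing=self.tracing)
        try:
            return component(request)
        except Exception:
            self.failures += 1
            if self.failures >= self.trace_threshold:
                self.tracing = True   # instrument all future restarts
            return None               # component restarted; caller may retry
```

The point is that the attacker's own retries do the work: after enough crashes, every subsequent attempt runs fully instrumented.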
"BTW, despite the EAL6+ nonsense, they were way behind OpenBSD and even Linux. It was that bad."
I keep hearing these things. It wouldn't surprise me if it were true, given how I called out one vendor over mislabeling what was certified and not mentioning the extra untrusted code. Forced them to change their website. Probably the same assholes, given there's only so many EAL6+ kernels out there. ;)
Yet, what analysis and pentesting I've read of such assurance activities, dating back to the '60s, shows they deliver results. We have even more methods today. Whereas the CVEs and their severity that I get out of low-assurance software are laughably bad. It might be true that modern vendors are bullshitting their way through evaluations. That says more about evaluation politics than about the methods used: they only work if applied for real. I endorse the methods most of all, old and new.
Btw, the latest from CompSci aiming at EAL7+ is the seL4 kernel. The source code for that is available. Feel free to find its vulnerabilities and show them where their models/proofs were inadequate. Whatever you find will factor into other efforts. If you find little, that would be a testament in itself, yeah? I'm neutral, as I'm mainly interested in what the exact metrics will be for an ROI analysis.
"The users actually need this functionality though, so they put it in the uncertified code running on the certified OS. The overall result is less secure because each user program drags along a buggy reimplementation of what would normally be OS functionality. "
I agree with that one on the security front. This often happens. That's why I push for standardized, core functionality in them. QNX and BeOS were again great examples there, although not designed for high security. GenodeOS is doing clean-slate stuff and pulling in components from UNIX land. They're security focused. So, there's potential there. QNX could conceivably be redone for real security, but that's not likely to happen. This is a social problem more than a technical one. A real issue with barebones stuff, but not a fundamental one.
"I assure you that maintainability is not a property of microkernels. You poke something here, and it pops out there. "
"The reason is that microkernels are deceptive. The individual components are simple, but they have very complex interactions. Glue isn't free."
Yes, these are totally true. It's why you need different tooling for debugging them. My old technique was modelling the software as a monolith in source, with bug prevention or hunting using the same techniques as finding concurrency errors in shared-thread and/or actor models. You can also use taint-based methods that track things through the system, live or virtualized. Tanenbaum and Hansen had some other methods. Quite a few are out there in CompSci and industry.
Yet, you are in for a world of hurt if you try to debug them like you do a monolith, especially with tools designed for monoliths. I have a feeling that's what you were doing. Not that there's a lot of publicly available tooling or guides on this that would've made it easier for you. This stuff, like high-assurance vs mainstream in general, tends to silo up, with knowledge getting obscure or lost and tools getting dusty. The tricks are prevention via your resource sharing and/or middleware, plus tooling that models and tracks flows in distributed systems with an easier subset of their assumptions.
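As a toy illustration of the taint-based flow tracking mentioned above: tag every message with labels recording each component it passed through, so a flow from an untrusted source can be reconstructed across the "glue" between otherwise-simple components. The class names here are made up for the sketch; real systems do this in the IPC layer or middleware.

```python
class Message:
    """A message whose taint labels record every hop it passed through."""
    def __init__(self, payload, taint=frozenset()):
        self.payload = payload
        self.taint = taint

class Component:
    """Toy message-passing component that propagates taint labels."""
    def __init__(self, name):
        self.name = name

    def handle(self, msg):
        # A real component would transform the payload; here we only
        # stamp the hop so cross-component flows can be reconstructed.
        return Message(msg.payload, msg.taint | {self.name})

def route(msg, components):
    """Push a message through a chain of components, accumulating taint."""
    for c in components:
        msg = c.handle(msg)
    return msg
```

When something blows up three components downstream, the taint set tells you the path the bad input took, which is exactly the "poke here, pops out there" problem.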
We get this complaint often enough that I think I'll try to dig up a collection of tools or methods from CompSci and proprietary sectors to recommend, or further develop them into something widely available. If I can find the time, that is. Got many projects I'm working on outside a demanding job. It needs to be done, though. There's no valid excuse for us hearing this in 2016 without a GitHub link to reply with, except bad priorities among the microkernel community.
"Compared to that, even Linux is trivial to understand and modify."
You're the first to ever tell me that, lol. I've seen many people give up on high-assurance UNIX/Linux, even significant architectural changes, because of too many difficulties, largely tight coupling and legacy effects. So they ended up working at the hardware, compiler/language, or microkernel levels to solve the issues, and managed to solve them in believable ways. Makes me think Linux wasn't so trivial. Peer review will tell over time whether each issue was really solved. Meanwhile, Linux today has most of the problems it had when I reviewed it 10 years ago. It's more reliable and usable than before, though, with it only hosing my packages and freezing my desktop every few months instead of every few days. The backups and restores work great, though. ;)