It appears that Gloox, a relatively low-level XMPP client library written in C, rolled much of its Unicode and XML parsing itself, which made such vulnerabilities more likely. There may be good reasons not to re-use existing modules and rely on external libraries, especially if you target constrained low-end embedded devices, but you should always be aware of the drawbacks. And the Zoom client typically does not run on those.
One of the harder things with XMPP is that the stream is not a well-formed XML document until the connection is closed; the root element stays open for the lifetime of the session. You need a SAX-style/event-based parser to handle it. That makes rolling your own understandable in some cases (e.g. dotnet's System.Xml couldn't do this prior to XLinq).
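To illustrate the point, here is a minimal sketch of event-based parsing of a never-closed XMPP stream, using Python's stdlib expat bindings as a stand-in for whatever parser a client actually embeds (the message stanza is made up, not a real transcript):

```python
# Event-based (SAX-style) parsing of an XMPP stream: the root
# <stream:stream> element is opened but never closed while the
# connection lives, yet events still fire for each stanza.
import xml.parsers.expat

stanzas = []

def start_element(name, attrs):
    stanzas.append(name)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element

# Feed the stream header, then a complete stanza. Passing False as
# the second argument tells expat more data is coming.
p.Parse("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>", False)
p.Parse("<message><body>hi</body></message>", False)

print(stanzas)  # ['stream:stream', 'message', 'body']
```

A DOM-style parser would refuse to return anything here, since the document never finishes; the event callbacks are what make the open-ended stream workable.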
That being said, as you indicated, Gloox is C-based, and mature SAX-style parsers like expat have long existed in C. There is no excuse.
Not only that, but before the TLS session starts you have to handle what is, strictly speaking, an invalid XML document: the STARTTLS mechanism starts encrypting right in the middle of the initial XML document.
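A sketch of what that means in practice: once the server's `<proceed/>` arrives, the bytes that follow are TLS records rather than XML, so the parser has to be thrown away and rebuilt after the handshake, because the encrypted stream restarts with a fresh stream header. (The namespaces are the real XMPP ones; the handshake itself is elided.)

```python
# At the STARTTLS boundary the XML parser must be discarded: the
# bytes after <proceed/> are TLS, and the post-TLS stream begins
# again with a brand-new <stream:stream> header.
import xml.parsers.expat

def make_parser(events):
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = lambda name, attrs: events.append(name)
    return p

events = []
p = make_parser(events)
p.Parse("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>", False)
p.Parse("<proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>", False)

if "proceed" in events:
    # ...perform the TLS handshake on the raw socket here...
    p = make_parser(events)  # fresh parser for the encrypted stream
    p.Parse("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>", False)

print(events)  # the stream header appears twice: once per side of STARTTLS
```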
Also, some constructs that are perfectly valid XML are not allowed in XMPP (comments, for example).
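Since the generic parser happily accepts those constructs, an XMPP implementation has to reject them itself. One way to do that with stdlib expat is to install a comment handler that refuses the input (the exception class here is just an illustrative name):

```python
# XMPP forbids XML comments in the stream even though they are legal
# XML, so a conforming implementation must actively reject them.
import xml.parsers.expat

class ProhibitedConstruct(Exception):
    pass

def reject_comment(data):
    raise ProhibitedConstruct("comments are not allowed in XMPP streams")

p = xml.parsers.expat.ParserCreate()
p.CommentHandler = reject_comment

caught = None
try:
    p.Parse("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>"
            "<!-- sneaky -->", False)
except ProhibitedConstruct as exc:
    caught = exc

print(caught)  # comments are not allowed in XMPP streams
```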
I think rolling your own XML parser for XMPP is a fairly reasonable thing to do. In the past at least, many, if not most, implementations had their own parser (often a fork of a proper XML parser). What is more surprising to me is why they would choose XMPP for their proprietary stuff. I don't think they want to interoperate or federate with anything?
(If I remember correctly, and if it hasn't changed since many years ago, when I last looked at that stuff.)
> One of the harder things with XMPP is that it is a badly-formed document up until the connection is closed. You need a SAX-style/event-based parser to handle it.
That is a common misconception, although I am not sure of its origin. I know plenty of XMPP implementations that use an XML pull parser.
Smack uses an XML pull parser and non-blocking I/O. It does so by first splitting the XMPP stream into top-level elements and only feeding complete elements to the pull parser.
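As a small demonstration that pull parsing copes with an open-ended stream (using Python's stdlib `XMLPullParser` rather than Smack's Java one):

```python
# A pull parser consuming an XMPP stream incrementally: you feed it
# whatever bytes have arrived and pull events for elements that are
# complete so far. The root element never needs to close.
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("start", "end"))
parser.feed("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>")
parser.feed("<message><body>hi</body></message>")

# Only elements whose close tag has arrived produce an "end" event;
# the still-open stream root does not.
complete = [elem.tag for event, elem in parser.read_events() if event == "end"]
print(complete)  # ['body', 'message']
```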
I find that response a bit strange, since the whole reason the Zoom client has these particular vulnerabilities is that they didn't roll their own, and instead relied on layers of broken libraries.
It’s quite possible they’d have more bugs without doing that, but re-using existing modules could just as easily have been an even worse idea.
Using what everyone and their dog uses is just as prone to bugs, because bug-free software either doesn't exist or isn't very useful. But it has the benefit of many versatile eyeballs looking at it in many different contexts.
So if a bug is found and fixed in libxml2, which is used by almost everything, everyone else instantly benefits. Same with libicu, which is used, for example, by Node.js with its huge deployment footprint. Oh, and by every freakin' WebKit-based browser out there.
OTOH, they rolled their own, so all the bugs they hit are confined to Zoom, and are guaranteed to earn Zoom alone the bad press.
If they roll their own it also becomes less interesting to actively exploit.
Obviously this doesn’t really work for Zoom any more, since their footprint is too large, but it can stop drive-by attackers in other situations. Nobody is going to expend much effort figuring out Joe Schmuck’s homegrown solution when they could happily run a known exploit against an unpatched WordPress server.
I think the point is that Unicode and XML parsing are known to be security-critical components, and you should take care that they are handled only by well-tested code designed specifically for the purpose. You need to not roll your own, and also to ensure that any third-party components didn't roll their own.
I get your confusion. But keep in mind that it is not only about picking whatever library shows up as the first result of your Google search. My naive self thinks that a million-dollar company should do some research and evaluate different options when choosing an external codebase to build their flagship product on. There are dozens of XMPP libraries, and they picked the one that does not seem to delegate XML and Unicode handling to other libraries, which should raise a flag.
I think that's a false dichotomy; IMO the best default choice is to rely on the most well-tested library in any given category. That suggests to me that they should have used expat on the client side.
IMO we should use external libraries, and we should invest engineering time in the library rather than just taking it as-is. Not using a good third-party library means investing at least a few engineer-months to get the same result, and a lot more to do better than the third-party library. Instead, you can take the library and invest a few engineer-months improving the open-source code.
Why? If anything, the client's interpretation of XML-in-malformed-UTF-8 is the more reasonable one: skipping to the start of the next valid UTF-8 sequence. It's the server whose UTF-8 handling is really weird: it somehow special-cases multi-byte UTF-8 sequences but then doesn't handle invalid ones.
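For concreteness, here is a sketch of the two behaviours described above, using Python's codec machinery as a stand-in (this is not Zoom's actual code; `0xC3` starts a two-byte UTF-8 sequence, so following it with plain ASCII makes the input invalid):

```python
# An invalid UTF-8 sequence embedded in XML: 0xC3 is a two-byte
# lead byte, but '(' is not a valid continuation byte.
bad = b"<body>\xc3(</body>"

# Strict handling: refuse the input outright.
rejected = False
try:
    bad.decode("utf-8")
except UnicodeDecodeError:
    rejected = True
print("strict decoder rejected input:", rejected)

# Lenient handling, roughly what "skip to the next valid sequence
# start" amounts to: drop the offending byte and keep going.
print(bad.decode("utf-8", errors="ignore"))  # <body>(</body>
```

Neither behaviour is wrong in isolation; the trouble starts when two components on the same path (here, client and server) disagree about which one to apply.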
This is a very common issue across all of software engineering, I've found. But I really don't get why: if I were given the task of parsing Unicode or XML, I'd run and find a library as fast as possible, because that sounds terrible and tedious, and I'd rather do literally anything else!