Hacker News | italovignoli's comments

OnlyOffice is proprietary software, or freeware if you prefer (they modify the AGPL licence with clauses which are not compliant with the OSD), and by natively supporting the proprietary Microsoft Office document format it locks its users in to Microsoft (the company controlling the closed and proprietary format). True free software advocates should avoid it.


DOCX files created by LibreOffice are not only smaller but also simpler in terms of XML structure, and easier to interoperate with. A two-page file created from scratch is 1100 XML lines if written by LibreOffice and 11500 XML lines if written by Microsoft Office. The 10400 redundant XML lines are there to make it difficult to properly read the file. Also, they may contain non-standard elements which were deprecated before the approval of the standard itself but are still there after 12 years.
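
For anyone who wants to check this kind of comparison on their own files, here is a minimal sketch. It relies only on the fact that a DOCX file is a ZIP package whose main body lives in `word/document.xml`. Because line counts depend on how the XML happens to be pretty-printed, it counts XML elements instead, which is an assumption about what a fair measure looks like. The function names `count_elements` and `make_sample_docx` are made up for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def count_elements(docx, part="word/document.xml"):
    """Return the number of XML elements in one part of a DOCX package.

    `docx` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read(part))
    # root.iter() walks the root element and every descendant.
    return sum(1 for _ in root.iter())


def make_sample_docx(xml_text):
    """Build a tiny in-memory package for demonstration (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml_text)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = make_sample_docx("<doc><p><r/></p><p/></doc>")
    print(count_elements(sample))
```

Running `count_elements` on the same document saved by each suite would give a like-for-like number, though it says nothing about why the counts differ.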


I'm not sure that "to make it difficult" is true.

MS Office is very old, and from their point of view compatibility will always have to mean that it works as much as possible like the previous version.

So suppose you have a three-way condition clause in some code, each paragraph has either "Straight", "Fast" or "Thom Hopkins" formatting. Hmm. During XML standard writing you ask engineers to explain these options so you can write them up for the standard.

"Straight" and "Fast" turn out to each have six paragraph definitions. Great! Write those in the XML standard. The guy you asked to work out "Thom Hopkins" has gone on sick leave due to a mental breakdown. He has left a forty-page document, which includes excerpts from several multi-page C++ classes, one of which seems to be a partial implementation of a bin packing solver and another of which involves regular expressions.

You find the supervisor of the guy who last worked on the original "Thom Hopkins" code. He explains it was developed over 15 years by a large team and was originally a core part of the document engine before the invention of the faster "Straight" paragraph mode sidelined it.

Now, you _could_ add all this crap to an appendix of the proposed XML document standard, and watch a committee vomit when they try to read it OR you could say "Thom Hopkins" is a special mode and shouldn't be used in standards compliant documents, even though it's actually used in millions of templates for your own popular office suite. And then people will say you did it just to spite them...


I mostly agree with you. I subscribe to "don't attribute malice where you could attribute incompetence or unforeseen factors", but I keep wondering in the back of my head why they would keep all this cruft in the docx format. They hard forked their document format in 2007 and caused a LOT of headaches back then, so why not take the opportunity to streamline all this stuff, you know? Why not remove "Thom Hopkins" in the case of your story?


You are probably not aware of the issues related to Microsoft Office files, which are intentionally bloated with useless XML content to make interoperability almost impossible. Cleaner XML improves interoperability, even if you do not think it helps. The reality is that as long as people consider Microsoft Office files the reference, anything else will fail WRT interoperability, because those files are developed to kill interoperability.


> which are intentionally bloated with useless XML contents to make interoperability almost impossible.

That's just a conspiracy theory. The reason they're "bloated" is because Microsoft Office is optimizing for interoperability with its largest competitor: older versions of Microsoft Office.

Maybe Microsoft Office having "cleaner" XML would improve interoperability. But as long as Office is the standard, the ability to consume messy XML is worth more than the ability to emit clean XML.


> That's just a conspiracy theory.

The "conspiracy" is exceedingly well documented, as others have already noted.

The motherlode is these pages of contemporary documents at Groklaw:

http://www.groklaw.net/staticpages/index.php?page=2005121615...

http://www.groklaw.net/staticpages/index.php?page=2008071923...

A good starting document is "Can Other Vendors Implement Microsoft's Office Open XML?" http://web.archive.org/web/20070912014933/http://www.hollowa...


> A good starting document is "Can Other Vendors Implement Microsoft's Office Open XML?" http://web.archive.org/web/20070912014933/http://www.hollowa....

Then let's start there! Let's start with the first section about Word Processing, in fact:

``` 1.1. Historical Compatibility

OOXML contains compatibility markers to describe older legacy documents, their quirks and processing models. These compatibility features mark behaviours that software must implement to correctly display and process the majority of documents in existence.

The "Compability Settings" WordProcessingML4 section within OOXML does not provide for repeatable practices. While it provides Microsoft the ability to store information related to various behaviors in their legacy file formats, the specification merely lists the names of these settings without proper definitions. An OOXML-consuming application, presented with a document using these attributes, will be unable to interpret them properly and render the page in a high-fidelity manner. Further, since these attributes are merely listed but not defined, the ability to practice the benefit of being “fully compatible with the large existing investments in Microsoft Office documents” (the goal of OOXML according to its authors) is consequently reserved for Microsoft alone.

These behaviours such as “autoSpaceLikeWord95” , “useWord97LineBreakRules” and “useWord2002TableStyleRules” are not defined. As OOXML repeatedly states, [t]o faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard.

These processing hints in the proposed standard depend on undisclosed information, and therefore other vendors cannot correctly process historical documents using OOXML. This lack of specification has significant implications for the New Zealand public sector organisations operating under the Public Records Act who are seeking to preserve documents of their records in a readable electronic form. ```

I think that rather supports the claim that backwards compatibility is a problem for OOXML. I see no claim in there about any deliberate obfuscation.
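
To see which of these compatibility markers a given document actually carries, a minimal sketch follows. It assumes the settings live under a `w:compat` element in `word/settings.xml`, using the WordprocessingML main namespace; the function names `list_compat_settings` and `make_sample_docx` are made up for this illustration, and the sample document is synthetic.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in OOXML parts).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_compat_settings(docx):
    """Return the local names of the child elements of w:compat, if any."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/settings.xml"))
    compat = root.find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    # Tags look like "{namespace}name"; keep only the part after "}".
    return [child.tag.rpartition("}")[2] for child in compat]


def make_sample_docx(settings_xml):
    """Build a tiny in-memory package for demonstration (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/settings.xml", settings_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    xml_text = (
        f'<w:settings xmlns:w="{W_NS}"><w:compat>'
        "<w:useWord97LineBreakRules/><w:autoSpaceLikeWord95/>"
        "</w:compat></w:settings>"
    )
    print(list_compat_settings(make_sample_docx(xml_text)))
```

Listing the names is the easy part. The complaint in the quoted text is that a consumer still has no definition of what each name is supposed to do.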


> These behaviours such as “autoSpaceLikeWord95” , “useWord97LineBreakRules” and “useWord2002TableStyleRules” are not defined.

Amazing that these things can be part of a standard. It even has the proprietary names (Word97, Word2002); it's just pure malice to propose that horrible software functionality as a standard. I guess the 6k pages really made it hard to review properly.


You are mistaken. The tricks they used to get OOXML standardized[1] leave no room for doubt that they have been intentionally making interoperability harder.

[1]: Wikipedia has an article about that: https://en.wikipedia.org/wiki/Standardization_of_Office_Open....


You've linked to Wikipedia but there is no evidence there supporting your claim. In fact the criticism section includes pro-ODF supporters claiming the exact opposite:

> The ODF Alliance UK Action Group has stated that [...] the Office Open XML file-format is heavily based on Microsoft's own Office applications and is thus not vendor-neutral

If you've ever seen the specs for the old, binary office formats (they can be obtained) then you'd know that they are very complex indeed and that their OOXML siblings are pretty much direct encodings of the same data structures with some adaptations for the limits of XML. There is no credit to the argument that Microsoft deliberately made the OOXML office formats complex compared with the existing binary formats.

It's true that Microsoft pushed hard to get OOXML through the ISO, but the reason for that is clear: they wanted an open standard that was 100% compatible with existing Office documents. Something like ODF, which lacks many of the features of Office, would not do. It also makes their developers' lives a lot easier if they can specify a standard which describes their software's current behaviour. This is exactly what Adobe did with the PDF ISO standard (1000+ pages) and nobody complains about that.


You are replying to a claim I did not make, that OOXML was not based on Microsoft Office XML or that the XML formats were made more complex than the binary formats.


Anecdotally, LibreOffice has better compatibility with older versions of MS Office than current versions of MS Office do, so I'm not sure you have any more evidence for "the reason" than the person you're replying to has for the "conspiracy theory."

