HTML as TeX replacement

dozzie · on Nov 7, 2015

> [...] HTML doesn't give as much typographic control as TEΧ, but when you compare to the full web suite, including CSS and SVG, that conclusion can't be sustained

And the very formula that was a proof that HTML+CSS+SVG is enough shows that it is not enough. In my browser it looks terrible, subscripts and superscripts are mixed, font is uneven, and what not. The formula is simply unreadable, in the sense that I can't decipher it, not merely it's awful to look at.

Moving out of TeX ground to HTML+stuff, you instantly give up plenty of goods that were implemented in TeX well (kerning and hyphenation among the others), you give up consistency of behaviour (font rendering), and in exchange you get source code that looks obnoxious compared to pure TeX or LaTeX.

iamsohungry · on Nov 7, 2015

And to expand on this, it's not even just display failings, it's also semantic failings. There's no way I know of to do footnotes in HTML/CSS/SVG in a reasonable way. I'd want something like:

    <p>Some text.<footnote>Footnote text</footnote> Some more text.</p>

...to be rendered as something like:

    Some text.[1] Some more text.

And at the bottom of the page:

    [1] Footnote text.

But as far as I know there's no reasonable way to do this. There are similar problems for tables of contents, bibliographies, numbered figures, etc.

The idea that HTML is semantic in practice is pretty laughable.

All that said, I'd have a lot of objections to someone writing a "Using LaTeX as an HTML replacement" article. LaTeX is better designed overall, but they're also designed for fundamentally different use-cases.

carlob · on Nov 7, 2015

Funnily enough Knuth was opposed to footnotes (or maybe it was his wife). Only at the behest of some colleagues in the humanities were they later added in TeX.

scott_s · on Nov 7, 2015

Footnotes have their uses [1].

[1] Particularly when you have to say something which is almost, but not quite right, and because you're writing an academic paper, you're both worried that a reviewer will dock you for being so naive, and as an academic it pains you to say something which is not technically correct, but you have this pesky 10 page limit, so you stuff the pedantry in scriptsize font on the bottom of the page.

iamsohungry · on Nov 10, 2015

Eh, don't ever say things that aren't correct. It's not wrong to say that something is a way of approximating or whatever; use your linguistic qualifiers. If you need more space, ask for it, or say less and write a second essay/article/whatever.

Footnotes are much better used for pointing people to further reading material.

epeus · on Nov 9, 2015

The HTML way to do this is <details><summary>1</summary>Footnote text</details> which will inline the footnote and expand it on click; you could move it elsewhere on the page with CSS.

iamsohungry · on Nov 9, 2015

...which is still inferior, because it couples every footnote to every other footnote; if you add a footnote to the beginning you now have to go through and change the numbers on every single one. That's not even considering the issues you'll run into if you want to break up the content into pages.

Also, usefully displaying the footnote elsewhere on the page in a useful way is not a trivial task.

y4mi · on Nov 7, 2015

the html formula consists mostly out of sqares because of missing unicode characters on my windows 10... not really convincing, really

verandaguy · on Nov 7, 2015

    His first example is eiπ = −1. Note how that was displayed fine inline, just by
    using <sup>, which has been in HTML for years, along with <sub> which I
    used to show the TEΧ e. Writing in utf8 means I don’t need a special
    sequence like \pi for π.

... So all that tells me that instead of doing `^{content}` for superscripts I have to use the longer and more painful `<sup>content</sup>`, and that I should always have a Unicode table on hand to copy/paste from for special characters that I can't write using my EN_US keyboard.

There's currently no strong case to use HTML (even with full CSS3 and SVG support) as a replacement for LaTeX, mainly because HTML was made for web typesetting - which is normally focused around shorter blocks of text - while LaTeX was made for people who want to be able to typeset and read documents which are tens and sometimes hundreds of pages long. The two domains just have different requirements.

javajosh · on Nov 7, 2015

>`^{content}` for superscripts I have to use the longer and more painful `<sup>content</sup>`

This is superficial, and one could trivially write a markdown style pre-processor if you wanted the former syntax (and I would be highly surprised if someone hasn't already done that).

>I should always have a Unicode table on hand to copy/paste from for special characters that I can't write using my EN_US keyboard

One way or another, you need such a table. For TeX the table consists of ASCII symbol names. (Note also that unicode characters do indeed have an ASCII representation using character entity syntax, although it's ugly.)[0]

>HTML short...TeX long...The two domains just have different requirements.

What justification do you have for this claim? HTML was designed by TBL for researchers to share papers, and there is plenty of long-form content written in HTML.

TeX is great. HTML is great. They'll probably both be around for a long time. There are trade-offs between them, but not the ones you've identified. (In particular, TeX's great strength is similar to PDF in that it's output is totally stable. Browsers by contrast are notoriously dynamic and won't, in general, produce the same output.)

[0] https://en.wikipedia.org/wiki/Unicode_and_HTML

verandaguy · on Nov 7, 2015

    This is superficial, and one could trivially write a markdown style pre-
    processor if you wanted the former syntax

That feels just a tiny bit circular. It's like if I wrote a huge set of C++ preprocessor macros so that I can write my code in JavaScript, have the preprocessor do its magic and transpile it down to plain C++, and then have it compiled down to machine code.

    One way or another, you need such a table. For TeX the table consists of ASCII
    symbol names.

Not necessarily. It's not difficult to memorize the commands that produce commonly used symbols; `\pi`, `\sum_{}^{}`, `\int`, `\frac{}{}`, `\times`, etc. They're all really, really mnemonic with a consistent naming scheme, and if you don't like a symbol's command, you can always alias it to a new command in the preamble.

I'll give you the last part though, that was my mistake - but I did meant that the environments used to render HTML as well as certain associated standards are shifting towards their more common uses, which are along the lines of websites and blogs rather than research papers.

leephillips · on Nov 7, 2015

The OA is a reply to my article in LWN which was discussed in HN at https://news.ycombinator.com/item?id=10468755 .

I'll write a detailed reply to this reply when I get time; for now, note that Marks' attempt to render Euler's equation in HTML is incorrect: his result has Roman (upright) characters where math italics should be.

leephillips · on Nov 9, 2015

My reply is at http://lee-phillips.org/replyToMarks/ After it appeared he began to scramble to fix the errors in his markup that I found, silently. He's having some difficulty understanding the spec, however.

Symbiote · on Nov 7, 2015

> special characters that I can't write using my EN_US keyboard

I set the "menu" key on my en_gb_dvorak keyboard to be compose. I use it most days, whether to type the Ö in a colleague's name, € on a budget, ² in m², → anywhere, or • (or even ①) in a plain text document.

The file /usr/share/X11/locale/en_US.UTF-8/Compose contains many definitions, but they don't all work, and it lacks π. I'm not sure where mine are read from.

Whoever wrote it had a sense of humour.

  <Multi_key> <L> <L> <A> <P>             : ""  U1F596 # RAISED HAND WITH PART BETWEEN MIDDLE AND RING FINGERS

Edit: It seems HN won't display the character: http://www.fileformat.info/info/unicode/char/1f596/index.htm

JadeNB · on Nov 7, 2015

I also think that one should be ashamed of a typesetting solution that requires so blatantly lying about the semantic meaning as to pretend that the base of an exponential is a subscript.

mixedmath · on Nov 7, 2015

One aspect Of TeX on the web (via mathJax, say) is that the source is meaningful. I maintain that in the not so distant future, someone is going to write a math-search engine (or at least they should), and the key will be interpreting the latex source behind documents and webpages. I do not think this would be possible if we used mathML or raw svg data.

jahewson · on Nov 7, 2015

On the contrary, TeX markup is mostly concerned with presentation and not semantics. For example, in TeX we don't know if f(x + 1) means function application or multiplication by f. MathML however, allows for a complete semantic description of an equation, as well as a presentational description. However, people are generally too lazy to use semantic MathML.

JadeNB · on Nov 7, 2015

> I maintain that in the not so distant future, someone is going to write a math-search engine (or at least they should)

I think this is one of those things that has been in the not-so-distant future for longer than you'd think. See [Mathematical knowledge management](https://en.wikipedia.org/wiki/Mathematical_knowledge_managem...) for a general overview, and [OMDoc](https://en.wikipedia.org/wiki/OMDoc) in particular. This mathematician-but-not-expert-in-this-stuff particularly associates [Michael Kohlhase](https://en.wikipedia.org/wiki/Michael_Kohlhase) with research in the area.

jbssm · on Nov 7, 2015

I don't really get the point of this article. The author claims that HTML is ready to replace TeX and then proceeds in presenting us 2 examples, and those two examples look great in TeX and awful in HTML.

cosarara97 · on Nov 7, 2015

In the SVG Text version of ∫𝝨∇×𝐅∙𝑑𝝨 = ∮∂𝝨 𝐅∙𝑑𝐫, the r is invisible in firefox (at least on my machine). It is visible on chrome.

http://i.imgur.com/GhrHkbG.png

abritishguy · on Nov 7, 2015

No system that attempts to typeset from source in the browser will be able to compete.

It can easily take more than a second to compile a laTeX document with a moderate number of formulas. Sure, some of this could be optimised but not _that_ much - the best you can do is either force the user to wait for you to typeset accurately or show some close approximation that will never be on par with TeX.

SeanLuke · on Nov 7, 2015

> Also, I took out the spaces around the em-dashes that Lee Phillips oddly put in.

Perhaps Kevin Marks isn't an American? What he's doing is restoring a fairly ill-considered British punctuation style.

Let's go with the Brit theory. While the Bris may rightfully lecture Americans on grammar and other features of English, British punctuation is famously, notoriously atrocious. I like the Louis Menand quote (regarding Lynne Truss): "An Englishwoman lecturing Americans on semicolons is a little like an American lecturing the French on sauces."

http://www.newyorker.com/magazine/2004/06/28/bad-comma

LukeShu · on Nov 7, 2015

No spaces round an em-dash is what's correct in American English. (Also, Menand's piece that you cite has no spaces around em-dashes either)

maratc · on Nov 7, 2015

There is no correct and incorrect; it is a matter of the manual of style you're adhering to.

Chicago (2.13): An em dash has no space before or after, unless you're doing some fancy word-replacing maneuvers with a 2-em dash.

AP (p. 368): An em dash, like an ellipsis, has a space before and after, except when used to introduce items in a vertical list.

http://www.apvschicago.com/2011/05/em-dashes-and-ellipses-cl...

Edit: this won't be complete without mentioning Robert Bringhurst in The Elements of Typographic Style:

> The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for use with the best text faces. Like the oversized space between sentences, it belongs to the padded and corseted aesthetic of Victorian typography.

dragonwriter · on Nov 7, 2015

No spaces around em-dashes is what most commonly recommended in American style guides though done recommend thin spaces and some have, or at least accept, seeing them open (i.e., with normal spaces). This is a flexible issue of style, though, more than of linguistic "correctness" (and even the choice between an em-dag and an en-dash set open, for the same user, is a style issue on which different American manuals will disagree.)

scott_s · on Nov 7, 2015

I prefer no spaces around em-dashes for the entirely subjective reason that I find it aesthetically pleasing. It boggles my mind that other people look at an elegant em-dash and think, "Yes... with spaces... now it looks right."

There are some instances when I use spaces - such as in in forum posts or emails, which are far more ephemeral, and I'm not going to hunt around for the actual em-dash character, and I'll use us a space and a hyphen as an uglier, but expedient, substitute.

SeanLuke · on Nov 13, 2015

> I prefer no spaces around em-dashes for the entirely subjective reason that I find it aesthetically pleasing. It boggles my mind that other people look at an elegant em-dash and think, "Yes... with spaces... now it looks right."

I think it's because with no spaces, an em-dash looks like it's connecting two words into a single word.

Name another phrase-level punctuation mark (period, comma, semicolon, colon, quote, question mark, ellipses, parentheses) which isn't connected to a space. There's a reason for that.

[Hyphens, apostrophes, slashes, etc. are word-level.]

DanBC · on Nov 7, 2015

> correct in American English

Wikipedia has megabytes (hundreds of thousands of words) of discussion around when to use m dash, n dash, hyphen, or minus. Some conversations went to arbcom. Some editors probably got bans.

"Correct" is a dangerous word.

SeanLuke · on Nov 7, 2015

Though things are tangled now, American English traditionally has thin spaces on the sides of m-dashes.

peedy · on Nov 7, 2015

One case where HTML won't work as a TeX replacement is when you have to think in terms on a page. For example, if you're making a resume and don't want it go over one page, it becomes a tedious process to "change CCS, Print preview, repeat".

ben0x539 · on Nov 7, 2015

What is the comparable workflow in TeX? Change document, generate pdf, get compile error because the page overflows, repeat?

wodenokoto · on Nov 8, 2015

While it's micro see that html can render a lot of stuff on some devices, I find the comparison to Tex redicoulous.

I have yet to meet a Tex document that on the fly can reformat itself to fit anything from a phone to a large desktop screen, let alone link to documents all over the web, handle banking or playing games.

Tex is really good at getting your print document to look a special kind of way. Html is really good at getting text formatted to your current device, at the loss of pixel (or mm) precision.

And neither format is nice to type your documents in. I guess that's the only place they are comparable :)

jwdunne · on Nov 7, 2015

Hrm. I write a lot of HTML/CSS. I have 0 experience maintaining TeX code. I find the TeX code easier to read and prefer the end result. I'd learn TeX to translate if I ever needed to maintain the equivalent HTML.

I don't actually think you can get a consistent result in HTML/CSS across all browsers without a lot of effort. For me, the HTML version is missing characters and styling.

I think those wanting to see the formulae would be happy enough to wait a second for it to render than be at the whim of browser warts and the loss of information.

pyramation · on Nov 7, 2015

When expressing mathematics, you want to focus on the formulae and underlying principles, not styling and layout. TeX provides a design spec inherently to standardize and provide consistent published mathematical content.

HTML abstracts layout less, and TeX gives you more mathematical creative expression.

I personally like using HTML5 as a rendering engine, and TeX as the "design" spec. I built LaTeX2HTML5.com a while back for this purpose, so I could build all of diagrams and mathematics in TeX, but publish to both HTML5 and paper.

tha_melonballer · on Nov 7, 2015

Should be more like TeX as an HTML replacement.

pcwalton · on Nov 7, 2015

TeX doesn't even support dynamic restyling, or things like flexbox. It works great for article-style documents, especially those containing a lot of mathematics, but let's not wear rose-colored glasses.

mangecoeur · on Nov 7, 2015

I feel in truth it's not quite there, but on the other hand it's still worth pursuing just to escape the insanity that is Tex and it's patchwork of packages to support everything from things-that-should be core (multi column) to oh-god-why-would-you-do-ath (drawing diagrams).

dredmorbius · on Nov 7, 2015

What my experiments with various content creation and markup systems over the past 30 years have shown me is that it's far less details of presentation which are crucial, but in enforcing document structure itself.

Presentation changes with technology -- I've seen and used systems with toggle-and-light outputs, true ttys (paper), glass ttys, various terminal and console outputs, the "standard" 24x80 terminal, desktop GUIs, and now handheld and mobile GUI devices with sizes from wristwatch to ledger. Other displays may be as large as a city block.

And that's just viewable output. TTS (text to speech) and voice recognition are also increasingly present.

The same presentation systems don't work across these. But well-encoded semantic content is amazingly robust. I remember learning of the '-man -Tps' (I think) nroff/groff argument -- that is, apply manpage macros, and format for postscript output. The same manual page markup that is readable in a console suddenly becomes pretty-printed (and created a brief market in the late 1990s / early 2000s for "Linux Bible" manpage dump books). Groff has more tricks up its sleeve, and as the Debian dwww package shows, manpages can be converted directly to viewable HTML.

But groff is grotty. I knew it (or precursors) once, well enough to turn in several Uni essays prepared via it. But those neural pathways have long since eroded.

I'd replaced it for quite some time with HTML, a reasonably versatile structured markup system, particularly given that most of what I was writing was intended for online Web publication at some point.

In the past few years I finally cracked the Lion book and started using LaTeX. I'd realised what the blocks were to my earlier attempts (ironically, "easy-to-use" tools such as Lyx actually had gotten in the way), and discovered that it was, as an authoring tool, often far lighter than HTML. Double carriage returns as paragraph breaks replaces seven discrete keystrokes per paragraph. Other constructs are a bit less lightweight, but remain clear.

But the real win is in how LaTeX is both a structured and validated document format. Screw up your HTML somehow, or follow some vendor's proprietary extensions, and a browser will say "eh, close enough". Omit a closing brace or backstroke or dollar sign, and your LaTeX compiler will scream at you until it's blue in the face. Unsettling the first few times it happens, but you begin to realise it's right.

The Web is an error condition: http://deirdre.net/programming-sucks-why-i-quit/

But deeper than that, LaTeX offers structure.

Documents have titles, and authors, and publication dates. They quite frequently have references and bibliographic citations.

Text notes -- foot, side, end, whate'er -- aren't some fucking foreign bolted-on concept.

http://codepen.io/dredmorbius/details/OVmBaZ

There are other bits and pieces that are missing from both, and LaTeX, not principally oriented to online publication, is short on stuff as well. But then, HTML doesn't have a native concept of reputation-ranked, hierarchical, collapsable comment streams either. Despite Usenet's prior art staring it gloweringly in the face for 25 years.

And yes, LaTeX as a direct authoring environment has its downsides. I've actually taken to preferring Markdown for my intial pass through writing documents, and it's a langauge which can readily be front-ended by the GUI-friendly tools most writers will want.

And HTML5's semantic structures and the valient attempts by some (see Readability's Developers section and the hNews microformats specs) are actually pretty cool. But until and unless someone steps in to require validated content before they'll pass it (and "someone" tends to be spelled "major search engine", which is spelled "Google" -- who have actually stepped in to police some standards of online behavior and presentation), we're stuck with the fact that crap HTML is still blindly accepted.

But if anything, front-ended by Markdown or other simple markup languages, it's HTML that should be replaced by TeX.

JadeNB · on Nov 7, 2015

I know this is a trivial response to such a thoughtful post, but what is "the lion book"? Do you mean The TeXbook (http://www-cs-faculty.stanford.edu/~uno/abcde.html)?

dredmorbius · on Nov 8, 2015

No, though that's also worth buying.

Leslie Lamport's LaTeX: A Documentation Preparation System

http://www.powells.com/book/latex-9780201529838/61-1

It really is quite simple.

I set up a couple of templates for articles and books, and frequently re-tag either straight text or HTML into LaTeX when I'm frustrated with existing presentations.

JadeNB · on Nov 8, 2015

Thanks! There's no need to convince me of the utility of (La)TeX; I am a mathematician. I just hadn't heard the term "the lion book" before.

Avshalom · on Nov 7, 2015

Of course if you include SVG we can eventually create TeX quality... because SVG allows us to draw arbitrary images. If we're going to be using SVG we don't even need HTML and CSS.

mbrock · on Nov 7, 2015

I've had pretty good experiences with translating LaTeX into SVG via DVI. The tools for this come with TeXLive and they're quite fast.

Kristine1975 · on Nov 7, 2015

Hyphens and ligatures are not all modern incarnations of TeX can do. pdfTeX for example also supports margin kerning and font expansion.

Mimick · on Nov 7, 2015

Hmmm, why not take ideas from HTML/CSS and that's all for a next version...

jzd · on Nov 7, 2015

HELL NO.

If you need to, generate HTML et. al. from TeX. But HTML is not a substitute for TeX. Forget it.