
Well, as the author of ANTLR, I will disagree with your assessment of the tool, as you can imagine. It never claimed to be a general tool. Specifically, it cannot handle indirect left recursion; hence, it is not a general context-free grammar parser. It is, however, the sweet spot between generality and speed. That quote was having a bit of fun using a tiny bit of hyperbole. If you take a look at our academic paper, you will see that using the general parsers is like walking through a minefield. One never knows when they will hit a landmine and the parser takes overnight to parse a file. It took me 30 years to find the sweet spot in power and speed (with the help of Sam Harwell). I welcome the introduction of new tools, but I'm not sure your assessment is accurate, nor your understanding of the parsing landscape.

Also as the paper says, "GLR return multiple parse trees (forests) for ambiguous grammars because they were designed to handle natural language grammars, which are often intentionally ambiguous. For computer languages, ambiguity is almost always an error." When was the last time you wanted to parse a language where the same syntax meant two different things? C++ and a few other cases, sure, but most languages are specifically designed to be unambiguous.

You are welcome to use a tool that handles all context free grammars, but the speed and ambiguity issues are not something I care to deal with.

You also mischaracterize ANTLR's handling of left recursion. It handles immediate left recursion automatically through a rewrite, such as for expression rules. It does not handle indirect left recursion. You may not have seen ANTLR 4, and are basing your assessment on ANTLR 3? There is no aborting at compile time for left recursion, except in the unusual case of indirect left recursion.
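
The rewrite mentioned above can be illustrated with a minimal hand-written sketch (this is not ANTLR's actual implementation, just the standard transformation): a rule like `expr : expr '+' term | term` would recurse forever in naive recursive descent, but the equivalent `expr : term ('+' term)*` parses one term and then loops.

```python
# Standard elimination of immediate left recursion, sketched by hand.
# Grammar:  expr : expr '+' term | term   (left-recursive, would loop)
# Rewrite:  expr : term ('+' term)*       (parse a term, then iterate)

def parse_expr(tokens, pos=0):
    """Parse a left-associative '+' expression; returns (tree, next_pos)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == '+':
        right, pos = parse_term(tokens, pos + 1)
        tree = ('+', tree, right)        # rebuild left-associative nesting
    return tree, pos

def parse_term(tokens, pos):
    tok = tokens[pos]
    assert tok.isdigit(), f"expected number at position {pos}"
    return int(tok), pos + 1

tree, _ = parse_expr(['1', '+', '2', '+', '3'])
# left-associative result: ('+', ('+', 1, 2), 3)
```

Note that the loop rebuilds the left-associative tree the original left-recursive rule would have produced, which is why the rewrite is transparent to the grammar author.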

http://www.antlr.org/papers/allstar-techreport.pdf



> That quote was […] using a tiny bit of hyperbole.

Then you certainly won't have a problem changing the wording to make it true, right? The Owl author explains the limitations of his software up-front, why don't you?

> It never claimed to be a general tool.

The documentation does give that impression.

> If you take a look at our academic paper

Consider that HN is populated by craftsmen and businessmen, not scientists. We judge the tools on how well they work in practical terms.

> One never knows when they will hit a landmine and the parser takes overnight to parse a file.

No wonder, I see it's O(n⁴) for ALL(*)/ANTLR. You need to keep up with the current state of the art, which is O(n³).

> It took me 30 years

Time and effort do not count for anything; the quality of the end result does.

> I'm not sure […] your understanding of the parsing landscape [is accurate]

I can see the practical results of the various software I tried. Does the software get the task done? Yes, fine, that's a candidate for further consideration. No, fine, I can eliminate it. Documentation does not say…? Now one has to come up with experiments to find out the flaws and limitations, multiplied by every single user. That's a waste of people's time.

> When was the last time you wanted to parse a language where the same syntax meant two different things? […] ambiguity is almost always an error.

That was 2014, and I did not design that language. It does contain ambiguities, it can't be helped. No one gains anything by simply wishing it weren't so; there was a task to be done. There exists software that can cope, you should take that as an incentive to improve yours.

> but the speed and ambiguity issues are not something I care to deal with.

… or don't improve. But why would you refuse to? Is not your reputation at stake?

> You also mischaracterize ANTLR's handling of left recursion. […] It does not handle indirect left recursion.

That's what I meant. I was incorrect when I wrote "any grammar that's left-recursive".


It always amazes me that so many "famous" people still take time to read HN and respond to questions (even insulting ones)! I used Antlr4 in the past and have to say it's a well-designed system, though at the time (2016) the Python runtime which I was most interested in was not very stable.

I've implemented several parser generators myself (nothing as polished as Antlr) and worked on a "rapid prototyping" parser generator tool a while ago (https://github.com/adewes/parsejoy). The idea was a tool that could create parsers on the fly without requiring compilation of any code, and let the user control and modify parsing logic using Lua if necessary. I started writing it in Go but switched to C++ later in order to get more predictable performance.

In the tool I implemented several parsing strategies, notably parsing expression grammars (PEGs) as well as a GLR parser (based on the Elkhound paper). And while I find GLR parsing powerful, it's not without problems, as e.g. errors are hard to debug. That said, it's pretty amazing that you can throw almost any grammar at a GLR parser and it will (usually) be able to parse it. As you said, though, writing a "bad" grammar can yield exponential time complexity on some input strings.

PEGs are also nice in general but require you to put more work into the grammar, as you are basically required to resolve ambiguities yourself using the "and" or "not" operators. Also, a naive implementation of a PEG parser will have horrid performance, as it basically evaluates all possible alternatives via backtracking until it hits a valid one; for real-world languages (I've implemented e.g. a Python parser for testing) this will ruin your performance due to the large nesting depth of the rules (Python has around 8-12 levels of rules, I think). Using a prefix tree can alleviate this, as can packrat parsing, though that comes with its own cost, as memoizing parse results requires memory allocation.
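
The packrat idea is small enough to sketch: memoize each (rule, position) pair so every alternative is tried at most once per position, turning exponential backtracking into linear time. The grammar below is a made-up two-rule example, not from any real tool; `functools.lru_cache` stands in for the memo table.

```python
from functools import lru_cache

def make_parser(text):
    """Build a tiny packrat parser for  expr <- term '+' expr / term
    where term is a single digit (PEG-style ordered choice).
    Returns a function mapping a start position to the end position
    of the match, or None on failure."""

    @lru_cache(maxsize=None)          # packrat: memoize result per position
    def expr(pos):
        p = term(pos)
        if p is not None and p < len(text) and text[p] == '+':
            q = expr(p + 1)
            if q is not None:
                return q              # first alternative succeeded
        return term(pos)              # ordered choice: fall back to 'term'

    @lru_cache(maxsize=None)
    def term(pos):
        if pos < len(text) and text[pos].isdigit():
            return pos + 1            # consume one digit
        return None

    return expr

parse = make_parser('1+2+3')
parse(0)  # returns 5: the whole 5-character input was consumed
```

Without the caches the same code still works, but on pathological inputs each failed alternative re-parses the same suffixes over and over; the memo table is exactly the memory-allocation cost mentioned above.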

Anyway, in terms of popularity and number of actually implemented grammars, Antlr4 is probably by far the most relevant OS parser generator out there. Thanks for writing it :)

One suggestion I have is to consider improving the tooling for tokenization, as it's often a problem that can be quite challenging in itself. For example, tokenizing Python code is not easy, as the language is not context-free (due to the indentation that indicates nesting) and there are several types of newline characters to distinguish (those that occur inside bracketed expressions vs. those that occur outside of them).
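
The indentation part can be handled with a small stack in the lexer, the way Python's own tokenizer emits INDENT/DEDENT tokens. A minimal sketch (names and token shapes are illustrative, not from ANTLR or CPython):

```python
# Sketch of context-sensitive indentation tokenization: a stack of
# indentation widths turns leading whitespace into INDENT/DEDENT tokens,
# which lets the parser itself stay context-free.

def indent_tokens(lines):
    """Yield (kind, value) tokens; kind is 'INDENT', 'DEDENT', or 'LINE'."""
    stack = [0]                        # open indentation levels
    for line in lines:
        if not line.strip():
            continue                   # blank lines carry no indent info
        width = len(line) - len(line.lstrip(' '))
        if width > stack[-1]:
            stack.append(width)
            yield ('INDENT', width)
        while width < stack[-1]:       # one DEDENT per closed block
            stack.pop()
            yield ('DEDENT', width)
        yield ('LINE', line.strip())
    while stack[-1] > 0:               # close any blocks still open at EOF
        stack.pop()
        yield ('DEDENT', 0)

src = ['if x:', '    y = 1', '        z = 2', 'done = True']
toks = list(indent_tokens(src))
# kinds: LINE, INDENT, LINE, INDENT, LINE, DEDENT, DEDENT, LINE
```

A real lexer would also track the bracket nesting depth and suppress INDENT/DEDENT (and logical newlines) inside parentheses, which is the other context-sensitive wrinkle mentioned above.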


Howdy. Agreed. It'd be nice to have a simpler "match x if NOT followed by y" and something to handle context-sensitive lexical stuff like Python. I often just send all chars to the parser and do scannerless parsing. :)



