An interesting article about parsing techniques that were developed half a century ago when there were still severe memory limitations. It would be nice if it at least would mention that these techniques were developed because back-tracking was not considered as a valid option for parsing, because it was simply not feasible to store a whole file in memory. Nowadays the whole source code of the linux kernel easily fits in RAM.
For many applications, using a back-tracking parser with some memorization/caching, is feasible. IParse Studio [1] is an example of an interpretting parser that parses a grammar and an input at each change on the input and demonstrates that it works for rather complex grammars [2] on realistic input [3].
But it seems the LR approach is just not flexible enough for my needs, so I am experimenting with replacing it with parser combinators + Earley parsing. The LR approach has also the disadvantage of having to precompute tables, which is fine for a single file, but if you want to deal with modules, then I am not sure how to compute these tables incrementally, which is kind of a prerequisite for a good user experience.
"I suspect that parsing performance worries date back to the period when parsing techniques were under heavy development. LR parsing was invented in 1965, a time when computers were painfully slow [19] and resource poor. "
"When combined with a couple of other techniques to squeeze the statetable’s memory footprint [22], even the most puny modern machine can run an arbitrary LR parser at impressive speeds."
There is no need for full memoization. The above JavaScript implmentation caches for each non-terminal the last location a parsing attempt was made and the out come of that: either false, or true with an abstract syntax tree and the next location after the part of the input that was parsed.
For many applications, using a back-tracking parser with some memorization/caching, is feasible. IParse Studio [1] is an example of an interpretting parser that parses a grammar and an input at each change on the input and demonstrates that it works for rather complex grammars [2] on realistic input [3].
[1] https://fransfaase.github.io/MCH2022ParserWorkshop/IParseStu... [2] https://fransfaase.github.io/MCH2022ParserWorkshop/C_grammar... [3] https://fransfaase.github.io/MCH2022ParserWorkshop/scan_pc.t...