
# Parsing

I’m opening this page to discuss the whole topic of parsing and serializing the core languages (Kort, Idan) and any future domain-specific ones (including the query language): the issues with parser generators, tools like Eclipse Xtext, the Chomski virtual machine, and DCGs and their use in Prolog, Ciao and Mercury.

If writing the parsers in Mercury is faster and easier than with Bison, and/or the resulting code runs faster, I can use Mercury instead of Bison, even though Mercury does require compiling the code. I can also experiment with Prolog (or Ciao), which doesn’t.

Note that Kort can work with just Flex, i.e. a lexer alone, i.e. plain regular-expression matching. For practice I could even write my own lexing code, especially if no existing tool fits this purpose. There is Flex, but I need to see exactly how it works. Perl’s regex substitution with numbered capture groups could probably work too.
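As a toy illustration of how little machinery a lexer-only language needs, here is a minimal hand-written tokenizer sketch in Haskell. The token classes (identifiers, numbers, punctuation) are hypothetical placeholders, not Kort’s actual token set.

```haskell
-- A toy hand-written tokenizer, just to illustrate that lexing alone
-- (no grammar) is plain character/regular-expression matching.  The token
-- classes here are hypothetical; Kort's real token set is not defined here.
module ToyLexer where

import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

data Token
  = TIdent String
  | TNumber String
  | TPunct Char
  deriving (Show, Eq)

tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c:cs)
  | isSpace c             = tokenize cs
  | isAlpha c || c == '_' =
      let (name, rest) = span (\x -> isAlphaNum x || x == '_') s
      in TIdent name : tokenize rest
  | isDigit c             =
      let (num, rest) = span isDigit s
      in TNumber num : tokenize rest
  | otherwise             = TPunct c : tokenize cs

-- > tokenize "foo 42 ; bar_1"
-- [TIdent "foo",TNumber "42",TPunct ';',TIdent "bar_1"]
```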

Idea: the model in which an Idan file is represented as a tree doesn’t necessarily fit every language; some language may group statements by the predicate instead, which is the usual style in Prolog. The generic quad-table form, on the other hand, is not efficient (e.g. finding all properties of some object takes O(n)). My idea is to allow several in-memory models, where each language specifies which model it parses into and which model it serializes from (not necessarily the same one, although it usually would be). Each model can then specify translations to/from other models, and the conversion process performs some sort of DFS over the models-and-languages graph to find a way to do the requested conversion.

Since the number of models would likely be very small, performance is not really a concern, but because a new model or language isn’t added every day, it should be possible to generate an adjacency-matrix form of the graph, or its transitive closure, and store it as a cache. That would let how-to-convert-X-to-Y queries be answered in O(1), because a table already exists with a cell for each possible X-Y pair.
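A minimal sketch of the conversion-graph idea in Haskell, assuming a made-up set of model names and converter edges (the real models and converters are still undecided): a DFS finds a conversion path between two models, and a precomputed transitive-closure set answers convertibility queries with a single lookup.

```haskell
-- Sketch of the models-and-converters graph; model names are hypothetical.
module ConvertPath where

import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- In-memory models a language might parse into / serialize from.
data Model = TreeModel | PredGroupModel | QuadTable
  deriving (Eq, Ord, Show)

-- Direct converters between models: the edge list of the graph.
converters :: [(Model, Model)]
converters =
  [ (TreeModel, QuadTable)
  , (QuadTable, PredGroupModel)
  ]

-- Adjacency map built from the edge list.
adjacency :: M.Map Model [Model]
adjacency = M.fromListWith (++) [ (a, [b]) | (a, b) <- converters ]

-- Depth-first search for a conversion path from one model to another.
findPath :: Model -> Model -> Maybe [Model]
findPath from to = go S.empty from
  where
    go seen cur
      | cur == to           = Just [cur]
      | cur `S.member` seen = Nothing
      | otherwise =
          let nexts = M.findWithDefault [] cur adjacency
              tryNext []     = Nothing
              tryNext (n:ns) =
                case go (S.insert cur seen) n of
                  Just path -> Just (cur : path)
                  Nothing   -> tryNext ns
          in tryNext nexts

-- Cached "transitive closure": answers convertibility with a single lookup.
closure :: S.Set (Model, Model)
closure = S.fromList
  [ (a, b) | a <- models, b <- models, a /= b, findPath a b /= Nothing ]
  where models = [TreeModel, PredGroupModel, QuadTable]

main :: IO ()
main = do
  print (findPath TreeModel PredGroupModel)
  -- Just [TreeModel,QuadTable,PredGroupModel]
  print ((TreeModel, PredGroupModel) `S.member` closure)
  -- True
```

In this sketch the closure is rebuilt from the edge list every run; in the real component it would only need regenerating when a model or language is added, and could be stored as the cache described above.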

Consider writing this in Haskell (or another functional language) for learning and for variety. Actually, since we’re talking about a graph, a logic language could work too! It would just need to load the data from a file somehow.

This would need a whole new Partager component. Maybe it’s time to revive names like Frelsi and Cist? I’ll also need a name for the language design system, if the code I write on top of existing things (Xtext, chomski, etc.) turns out to be enough to be a separate component. Maybe Sprak and Strom are still unused too :-)

In general, prefer GPL over LGPL over permissive. It helps surround the project with tools committed to freedom! And it’s compatible with CC0. Also prefer tools that generate C/C++ code. For PEG, prefer recursive descent.

Links:
