Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

This is an EBNF notation used for defining grammars in the Rel4tion project. It is based on several sources, especially the Wikipedia article on EBNF and the notation the W3C uses for XML and other languages.

In case you need to refer to it, you can call it SGN, which stands for “Some Grammar Notation”. Calling a specific notation “some” is indeed a paradox, and that’s why the name was chosen :-)

Highlighting

I’m working on a [[Vim syntax file|sgn.vim]]. It’s not complete yet, but it’s already useful.

The wiki itself doesn’t highlight SGN yet, but I’ll check whether an existing highlighter, e.g. the one for EBNF, is close enough. That would only be a workaround until I write a proper highlighting file.

Using

When writing a full grammar definition for some language, create a file in the wiki with the .sgn extension. You can treat it as plain text, but SGN comments may contain ikiwiki links, directives, etc. That page can then be inlined into other pages if needed, or simply linked to.

If you’re writing just a small piece which doesn’t need its own page, use the [[/ikiwiki/directive/format]] directive. There’s no “sgn” language for it right now, and I haven’t tested what happens if one is specified. For now, the safe default is either a code block (i.e. lines indented with a tab or 4 spaces) or a txt snippet (the wiki can render pages from plain-text .txt files):

\[[!format txt """
nesting   = nestopen | nestclose
nestopen  = "["
nestclose = "]"
"""]]

Rules

The grammar is a list of rules of the form

symbol = expression

The list of rules may contain indentation. The indentation is there just for readability and doesn’t add any meaning: the grammar is still a flat list of rules.

Both the alphabet of the grammar and the alphabet of the language it defines are Unicode.

It is possible to specify symbol contexts and context changes, which are used by the parser (syntactic analyzer). A symbol’s context is specified like this:

context:symbol = expression

For the default context, just the symbol part is enough.

A context change can be specified regardless of whether the rule’s symbol has a specified context. It has the following form:

context1:symbol = expression => context2
-- or
symbol = expression => context

The default context is written as a lone colon (:). For example:

exp:closingparen = ")" => :

Sometimes the context change depends on more than just the rule; for example, the parser may hold some state and decide based on it. You can specify context changes either in the lexical structure or in the syntax definition. In the lexical structure, computed context changes can be denoted like this:

symbol = expression ?=> context

Or a list of possible contexts can be given:

symbol = expression => context1, context2, context3

Then you can use a comment to explain how the choice is made.
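To make the rule-head syntax concrete, here is a minimal Python sketch that splits one rule line into its context, symbol, expression and context change. It is not part of any Rel4tion tool. The names Rule and parse_rule are hypothetical, and several other choices are illustrative assumptions:

- The default context : is represented as an empty string.
- Symbols and contexts are made of word characters and hyphens.
- Comments have already been stripped.
- => never appears inside the expression itself.

```python
import re
from dataclasses import dataclass, field

DEFAULT = ""  # assumption: the default context ":" is represented as ""

# Hypothetical pattern for one rule line (comments already stripped):
# [context:]symbol = expression [=> c1, c2, ...]  or  ... ?=> c
RULE_RE = re.compile(
    r"^\s*(?:(?P<ctx>[-\w]*):)?(?P<sym>[-\w]+)\s*=\s*(?P<expr>.*?)"
    r"(?:\s*(?P<arrow>\?=>|=>)\s*(?P<targets>.*?))?\s*$"
)


@dataclass
class Rule:
    context: str                  # context in which the rule applies
    symbol: str
    expression: str               # right-hand side, left unparsed here
    next_contexts: list = field(default_factory=list)  # empty: no change
    computed: bool = False        # True for "?=>": the parser decides


def parse_rule(line: str) -> Rule:
    m = RULE_RE.match(line)
    if m is None:
        raise ValueError(f"not a rule: {line!r}")
    targets = []
    if m.group("arrow"):
        # "=> c1, c2, c3" lists possible contexts; ":" means the default one
        for name in m.group("targets").split(","):
            name = name.strip()
            targets.append(DEFAULT if name == ":" else name)
    return Rule(
        context=m.group("ctx") or DEFAULT,
        symbol=m.group("sym"),
        expression=m.group("expr"),
        next_contexts=targets,
        computed=m.group("arrow") == "?=>",
    )
```

For example, parse_rule('exp:closingparen = ")" => :') yields a rule in context exp whose only next context is the default one.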

The expression on the right side of the rule may be built using the following forms:

/ some text here /

A free-form explanation of the match.

"some text here" or 'some text here'

Exactly matches the content of the string literal.

\xN

Matches the Unicode character whose number in hexadecimal is N.

[0-9], [a-zA-Z], [\xM-\xN]

Matches any character in the specified range(s), inclusive.

[xyz], [\xM\xN\xP]

Matches any character in the list.

[AB]

Matches A or B, where each is a range, a character list or a mix.

[^A]

Matches any character which the range/list/mix A doesn’t match.

X | Y

Matches X or Y (alternation).

X - Y

Matches any string that matches X but not Y.

X Y

Matches X followed by Y (concatenation).

X*

Matches zero or more consecutive repetitions of X.

X+

Matches one or more consecutive repetitions of X. In other words it’s the same as X X*.

X?

Matches X or the empty string, i.e. 0 or 1 occurrences of X.

X #N

Matches exactly N repetitions of X.

X #M-N

Matches between M and N repetitions of X, inclusive.

!X

Matches a string if it doesn’t match X.

( X )

Matches X. Used for grouping, to override the precedence rules.

-- some text here

A comment; it isn’t a meaningful part of the rule.
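The forms above can be read as matcher combinators. The following Python sketch is one possible interpretation, not a reference implementation. Each hypothetical matcher takes a string and a start position and returns the set of positions where a match can end, so alternatives and repetitions are explored exhaustively.

It makes two further assumptions:

- !X is a complement over whole substrings, not a zero-width lookahead as in PEG.
- Free-form / ... / explanations and comments are left out, since they carry no formal meaning.

```python
from typing import Callable, FrozenSet, Optional

# A matcher maps (text, start) to the set of positions where a match can end.
Matcher = Callable[[str, int], FrozenSet[int]]


def lit(text: str) -> Matcher:
    # "text" or 'text'; a \xN character corresponds to lit(chr(0xN))
    def m(s: str, i: int) -> FrozenSet[int]:
        return frozenset({i + len(text)}) if s.startswith(text, i) else frozenset()
    return m


def chars(*items, negate: bool = False) -> Matcher:
    # [xyz], [0-9], mixes like [AB], and [^A] via negate=True;
    # each item is a single character or a (low, high) inclusive range
    def inside(c: str) -> bool:
        for item in items:
            lo, hi = item if isinstance(item, tuple) else (item, item)
            if lo <= c <= hi:
                return True
        return False

    def m(s: str, i: int) -> FrozenSet[int]:
        ok = i < len(s) and inside(s[i]) != negate
        return frozenset({i + 1}) if ok else frozenset()
    return m


def seq(*xs: Matcher) -> Matcher:              # X Y
    def m(s, i):
        ends = frozenset({i})
        for x in xs:
            ends = frozenset(k for j in ends for k in x(s, j))
        return ends
    return m


def alt(*xs: Matcher) -> Matcher:              # X | Y
    return lambda s, i: frozenset().union(*(x(s, i) for x in xs))


def diff(x: Matcher, y: Matcher) -> Matcher:   # X - Y
    return lambda s, i: x(s, i) - y(s, i)


def neg(x: Matcher) -> Matcher:                # !X, assumed to be a complement
    return lambda s, i: frozenset(range(i, len(s) + 1)) - x(s, i)


def repeat(x: Matcher, lo: int, hi: Optional[int] = None) -> Matcher:
    # X #N is repeat(x, N); X #M-N is repeat(x, M, N)
    hi = lo if hi is None else hi

    def m(s, i):
        ends, current = set(), {i}
        for count in range(hi + 1):
            if count >= lo:
                ends |= current
            current = {k for j in current for k in x(s, j)}
        return frozenset(ends)
    return m


def star(x: Matcher) -> Matcher:               # X*
    def m(s, i):
        seen, todo = {i}, [i]
        while todo:
            for j in x(s, todo.pop()):
                if j not in seen:
                    seen.add(j)
                    todo.append(j)
        return frozenset(seen)
    return m


def plus(x: Matcher) -> Matcher:               # X+, the same as X X*
    return seq(x, star(x))


def opt(x: Matcher) -> Matcher:                # X?
    return lambda s, i: x(s, i) | {i}


def full_match(x: Matcher, s: str) -> bool:
    return len(s) in x(s, 0)


number = plus(chars(("0", "9")))                            # [0-9]+
keyword = alt(lit("if"), lit("else"))                       # "if" | "else"
name = diff(plus(chars(("a", "z"))), keyword)               # [a-z]+ - keyword
```

With these definitions, full_match(name, "iff") holds while full_match(name, "if") does not, which is the X - Y behaviour described above.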

Order of precedence, highest to lowest:

  1. X*, X+, X?, !X
  2. X Y
  3. X | Y, X - Y
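A small recursive-descent sketch shows how these precedence levels group an expression. It is an illustration under several assumptions:

- X | Y and X - Y are left-associative.
- !X* is read as !(X*).
- Only names, quoted strings, the operators above and parentheses are tokenized; character classes and #N counts are left out.

The function names tokenize and parse are hypothetical.

```python
import re

# Names, quoted strings, and the single-character operators listed above.
TOKEN_RE = re.compile(r"""\s*(?:(?P<name>\w+)|(?P<string>"[^"]*"|'[^']*')|(?P<op>[*+?!|()-]))""")


def tokenize(text: str) -> list:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    return tokens


def parse(text: str):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def choice():                 # level 3 (lowest): X | Y, X - Y
        node = sequence()
        while peek() in ("|", "-"):
            op = take()
            node = (op, node, sequence())
        return node

    def sequence():               # level 2: X Y
        items = [unary()]
        while peek() not in (None, "|", "-", ")"):
            items.append(unary())
        return items[0] if len(items) == 1 else ("seq", *items)

    def unary():                  # level 1 (highest): !X, X*, X+, X?
        if peek() == "!":
            take()
            return ("!", unary())
        node = primary()
        while peek() in ("*", "+", "?"):
            node = (take(), node)
        return node

    def primary():                # a name, a string literal, or ( X )
        tok = peek()
        if tok is None or tok in ("|", "-", ")", "*", "+", "?"):
            raise SyntaxError(f"unexpected {tok!r}")
        take()
        if tok == "(":
            node = choice()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            take()
            return node
        return tok

    tree = choice()
    if peek() is not None:
        raise SyntaxError(f"trailing {peek()!r}")
    return tree
```

For instance, parse('a b | c*') groups as ("|", ("seq", "a", "b"), ("*", "c")). Concatenation binds tighter than |, and * binds tighter than concatenation.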

It’s possible and sometimes very useful to indent rules. For example, a grammar can have several “top level” kinds of forms, and the rules for each one can be indented. It doesn’t affect the meaning, but it makes the file more readable.

A line indented to the position of the = after the rule name (or further) is considered part of the rule, while a line indented less is a new rule.

The recommended indentation level width is 2 spaces.

For example, this is a single rule:

[[!format sgn """
literal = number | string | boolean |
          character | chunk | pattern
"""]]

The last | in the first line could instead be placed in the second line, right below the =.

But these are 2 rules, the second being indented:

[[!format sgn """
literal = number | string | boolean
  number = [0-9]+
"""]]
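The continuation rule can be checked mechanically. The hypothetical split_rules function below is a Python sketch that joins each rule’s lines into one string. It assumes that:

- Indentation uses spaces, not tabs.
- The first = on a rule’s first line is the rule’s =.
- Blank lines and whole-line -- comments can be skipped.

```python
def split_rules(text: str) -> list:
    rules = []        # each entry is the list of lines of one rule
    eq_col = None     # column of "=" in the current rule's first line
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("--"):
            continue  # assumption: blank lines and whole-line comments are skipped
        indent = len(line) - len(line.lstrip(" "))
        if eq_col is not None and indent >= eq_col:
            rules[-1].append(stripped)   # indented to the "=" or further: same rule
        else:
            if "=" not in line:
                raise SyntaxError(f"expected a new rule: {line!r}")
            eq_col = line.index("=")     # indented less: a new rule starts here
            rules.append([stripped])
    return [" ".join(parts) for parts in rules]
```

Applied to the two examples above, the first input yields one rule and the second yields two, "literal = number | string | boolean" and "number = [0-9]+".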
