Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>


Usage of Markdown in the Wiki

Markdown is not an official standard. There are many implementations and many different extensions, and the implementations are not compatible with each other: there is a long list of Markdown features which different implementations parse in different ways.

I’m going to need Markdown for 3 things:

  1. The project wiki I will run
  2. Files in the semantic file system based on the Kiwi models
  3. Working with the content stored by a Kiwi file system, both locally and on a Kiwi-based wiki platform

For 1, using the Markdown converter provided by the wiki software is fine. The conversion doesn’t change the Markdown sources, so I can always switch implementations if there are problems.

For 2, whatever is installed on the user’s computer can be used. Once the file system is in place, any GNU/Linux distribution can be installed and any software can be used.

For 3, I need to have precise rules and tools. This is where the work is. My software will need to support UTF-8, creation of RTL webpages from the Markdown sources, conditional support for non-standard features, modular conversion to and from other formats, and security features which prevent users from creating HTML files with embedded JS code which does bad things, XSS attacks and so on.

First, here is an important design rule to follow IN THE ENTIRE PARTAGER PROJECT:

Always prefer the most reusable and modular solutions, and don’t worry about optimal performance: Computers become faster and faster all the time anyway. Performance becomes an issue only when the software is too slow to solve the problem it was made to solve.

The meaning of this when applied to the Markdown issue is:

  1. Use a general modular anything-to-anything document conversion tool
  2. Try to use a Markdown parser based on a PEG grammar with callbacks, so it can easily be reused and extended
  3. Supply an XHTML writer as well as an HTML writer

Things I need to do: Write a document explaining how to write Markdown in this wiki. But first choose a Markdown syntax for this wiki. So here it is: I’m going to start with the original syntax described by John Gruber. However, I will define some extra rules.

Plan

Parser Configuration Block

First, I want my Markdown files to support features conditionally. For example, some text files are limited to 80-character lines, so line breaks can be optionally ignored when converting to HTML. But in texts which aren’t limited this way, ignoring line breaks may not be wanted. This should be controlled inside the file, using command-line options, or both.

In addition, it should be possible to control the features from an external configuration file. This will make it possible to specify them once for many files, and control them in a central place. And it will help keep the file cleaner by replacing a long list of features with a reference to the external file.

This concept is general, i.e. it can be used with any type of text file. I’m giving it the working title “Parser Configuration Block”, or PCB. It’s going to work as follows.

A PCB always appears at the beginning of the file. Always. This position can be changed for individual types of files, but generic handlers not aware of that will always look for the PCB at the beginning by default. Each file type has its own syntax for the PCB, depending on the content syntax, e.g. how comments are marked in the file.

By default a PCB should be a comment, so tools not supporting it will always safely ignore it. Of course some syntaxes may not support comments, but the PCB still appears at the beginning of the file. The comment uses the file type’s own comment syntax, i.e. it is a regular comment. However, a plain regular comment is dangerous: a file not using PCB may have a comment at the beginning which happens to be a valid PCB, and this may cause invalid interpretation or even a security risk (by e.g. enabling JS code in generated HTML). Therefore, PCBs can be marked with a special sequence.

If such special marking is wanted, it can appear either just at the beginning of the comment (and then the PCB is the whole comment) or at both the beginning and the end of the PCB (so the comment can also contain regular non-PCB text). Each file type will have a default setting for this, but it’s possible to specify this on the command-line and/or as a PCB variable. However, such a variable can only control the end marker, not the beginning marker.

Every PCB thus begins with:

  1. Optionally a character sequence marking the beginning of a comment
  2. Optionally a character sequence marking the beginning of the PCB
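
The two optional prefixes above can be sketched as a small check. This is only an illustration of the idea, not a fixed design: the function name `pcb_body_start` and its parameters are made up here, and it assumes the comment opener and the PCB marker follow each other with nothing in between.

```python
from typing import Optional


def pcb_body_start(text: str, comment_open: str = "", marker: str = "") -> Optional[int]:
    """Return the offset at which the PCB body starts, or None if there is no PCB.

    Both prefixes are optional: an empty string means this file type
    does not use that prefix (hypothetical convention for this sketch).
    """
    prefix = comment_open + marker
    if not text.startswith(prefix):
        # An unmarked comment at the top of the file is NOT treated as a PCB.
        return None
    return len(prefix)
```

With a marker configured, a regular comment at the beginning of a file is rejected, which is the safety property the marker exists for. With both prefixes empty, every file is treated as starting with a PCB, so a file type without comments would need some other way to opt out.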

It is possible to specify in the PCB the syntax used for it. It allows each file type to have more than one possible PCB syntax. For example, a Perl script can then choose between a YAML-style PCB and a PCB written in Perl itself (e.g. Gitweb’s configuration is Perl code). This name declaration may be mixed with the PCB marker, or the marker can be separate.

NOTE: It may be useful to allow the PCB marker to appear not at the beginning, but somewhere later in the comment, so that other syntax requirements can be fulfilled. For example, languages which already require or commonly use a special comment at the beginning can still use PCBs, and/or have an easier transition to them.

PCB in Markdown

Markdown supports embedded (X)HTML. I will read later about the exact syntax of comments in HTML and XHTML, but for now let’s assume a comment begins with <!-- and ends with -->. Therefore, a PCB in Markdown will always look like this:

<!-- the actual PCB comes here -->

Next comes the optional PCB marker. I need to choose one to use with all my Markdown, and preferably make it something other people can use too, because I have to choose one and use it as a standard. How about this:

<!--%% the actual PCB comes here -->

Assume you also want to specify you’re going to use the YAML-based PCB syntax. Then it looks like this:

<!--%%yaml the actual PCB comes here -->

And if using the end marker, it looks like this:

<!--%% the actual PCB comes here %% regular comment text here-->

The end marker doesn’t technically have to be identical to the start marker, but it’s probably better if it is. Actually it can make sense to use a separator (say, at least three consecutive dashes) for this. It works in single-line but looks even better in multi-line:

<!--%% the actual PCB comes here --- regular comment text here-->

<!--%%
the actual PCB comes here
------------------------------
regular comment text here
-->
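
Here is a rough sketch of how the comment forms above could be recognised. It is not a decided design. The names `Pcb` and `extract_pcb` are invented for the example, and it assumes that whitespace must follow the marker (or the syntax name), that the comment ends at the first -->, and that neither end marker occurs inside the PCB text itself.

```python
import re
from typing import NamedTuple, Optional


class Pcb(NamedTuple):
    syntax: Optional[str]  # e.g. "yaml"; None when no syntax name is declared
    body: str              # the PCB text itself
    trailing: str          # regular comment text after the end marker, if any


# Start of file, "<!--", the "%%" marker, an optional syntax name,
# one whitespace character, then everything up to the first "-->".
_COMMENT = re.compile(r"\A<!--%%([A-Za-z0-9_]*)\s(.*?)-->", re.DOTALL)

# End marker: either "%%" or a run of at least three dashes.
_END = re.compile(r"%%|-{3,}")


def extract_pcb(markdown: str) -> Optional[Pcb]:
    """Return the PCB at the very beginning of `markdown`, or None."""
    m = _COMMENT.match(markdown)
    if m is None:
        return None  # no marked comment at the start: no PCB
    name, inner = m.group(1), m.group(2)
    end = _END.search(inner)
    if end is None:
        body, trailing = inner, ""
    else:
        body, trailing = inner[:end.start()], inner[end.end():]
    return Pcb(name or None, body.strip(), trailing.strip())
```

One thing to check when reading about comment syntax: XML does not allow `--` inside a comment, so the dashed separator may need another look if the Markdown sources themselves must stay valid XHTML.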

What about the syntax of the PCB content? I’d like to use something based on YAML, but first I’d like to suggest my own syntax and make a list of features it has, which I can then replicate using YAML.

In this syntax, the PCB is a list of variable assignments of the form ‘x:5’ or ‘+x:5’ separated by whitespace, i.e. spaces or tabs or newlines. The first form is used for user-defined variables, while the second one is used for standard variables.

A PCB is parsed sequentially, and every re-assignment of a variable overrides its previous value. It works the same for external configuration files: When a file is referred to in the PCB, its contents are read and variables specified in it are assigned in the same way, as if they were inline inside the Markdown file.

The ‘+’ is used to create a separate namespace for standard variables, so variables widely accepted and/or standardised can be moved into the ‘+’ namespace. I can think of one such variable, +import, used to bring settings from external files.
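
To make the rules concrete, here is a minimal sketch of a parser for this syntax. Several things in it are assumptions and not part of the plan above: values cannot contain whitespace, `+import` splices the external file in at the point where it appears, and the `load` callback (which maps a file reference to its text) and the variable name `breaks` in the usage example are hypothetical.

```python
from typing import Callable, Dict, Tuple


def parse_pcb(body: str, load: Callable[[str], str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (standard, user) variables after applying `body` in order."""
    standard: Dict[str, str] = {}
    user: Dict[str, str] = {}

    def apply(text: str, active: Tuple[str, ...]) -> None:
        for token in text.split():  # split on spaces, tabs and newlines
            name, sep, value = token.partition(":")
            if not sep or name in ("", "+"):
                raise ValueError(f"not an assignment: {token!r}")
            if name == "+import":
                # Assumed semantics: read the file as if it were inline here.
                if value in active:
                    raise ValueError(f"circular import: {value}")
                apply(load(value), active + (value,))
            elif name.startswith("+"):
                standard[name[1:]] = value  # the '+' namespace
            else:
                user[name] = value  # later assignments override earlier ones

    apply(body, ())
    return standard, user
```

For example, with an external file `site.pcb` containing `+breaks:ignore x:1`, parsing `x:5 +import:site.pcb +breaks:keep x:7` leaves `breaks` set to `keep` in the standard namespace and `x` set to `7` in the user namespace, because each re-assignment overrides the previous value.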

NOTE: Consider reversing it, i.e. the ‘+’ will be used for non-standard things.

The Markdown Itself

I will use standard Markdown as defined by John Gruber, but attention must be paid to the small details of the text, so that XHTML rendering works as it should.

For a manual line break, i.e. one which should appear in the rendered text, the line must end with at least two consecutive spaces (TODO go to the website and make sure). In other words, just finish these lines with two spaces before the newline.
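
Assuming the rule really is two or more trailing spaces, the effect on XHTML output can be sketched as follows. This is a toy renderer for a single paragraph which ignores all inline markup, and the function name `render_paragraph` is made up for the illustration.

```python
import html
import re


def render_paragraph(lines):
    """Render the lines of one paragraph as an XHTML <p> element."""
    out = []
    for i, line in enumerate(lines):
        last = i == len(lines) - 1
        # Two or more trailing spaces request a manual line break,
        # except on the last line, where there is nothing to break before.
        hard = not last and re.search(r" {2,}$", line) is not None
        text = html.escape(line.strip(), quote=False)
        out.append(text + ("<br />" if hard else ""))
    return "<p>" + "\n".join(out) + "</p>"
```

A line ending in a single space, or in none, is joined to the next line by an ordinary newline, which a browser shows as a space.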

When using identifiers from source code whose names cause strange things, such as identifiers containing _ or __, enclose their names in a code span (backticks), e.g. `this_is_a_function`, so they are both highlighted here and rendered correctly into XHTML.

N E V E R use embedded (X)HTML unless there is a VERY GOOD REASON. Only the PCB COMMENT is allowed.
