Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

[[!meta title="2.4 | Characters"]]

Characters

Any Unicode character is a valid Character value, but not all Unicode characters can be written as-is directly in an Idan document. There are three notations for writing Character literals, which together cover all of Unicode and provide some flexibility: “normal” single quotes ('), “special” triple quotes (''') and “named” backticks (`).

Single Quotes

One way to write character literals is by enclosing them between single quotes ('). For example, 'A' is the uppercase A character value, and ' ' is the space character.

It is also possible to place an escape sequence between the quotes. Escape sequences make it possible to refer to characters using a notation composed of common ASCII characters. A sequence always begins with a backslash (\). There are two types of sequences: symbolic and numeric.

The sections below explain and list the sequences.

Many visible Unicode characters can be specified as-is between the quotes, but not all of them. The characters allowed are all Unicode characters whose Unicode basic type is Graphic, except for ' and \, which must be specified as escape sequences.
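The rule above can be sketched in Python as a rough check. This is only an illustration, not part of Idan: the helper names `is_graphic` and `allowed_as_is` are hypothetical, and the sketch assumes that the Graphic basic type corresponds to the Unicode general categories L, M, N, P, S and Zs, as reported by Python's `unicodedata` module for whatever Unicode version the interpreter ships with.

```python
import unicodedata

# Assumption: the Unicode "Graphic" basic type is made up of letters (L*),
# marks (M*), numbers (N*), punctuation (P*), symbols (S*) and the
# space separator category (Zs).
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str) -> bool:
    """Return True if the single character `ch` has basic type Graphic."""
    category = unicodedata.category(ch)
    return category[0] in GRAPHIC_MAJOR_CLASSES or category == "Zs"


def allowed_as_is(ch: str) -> bool:
    """Hypothetical check: may `ch` appear as-is between single quotes?

    The single quote and the backslash are excluded because they must be
    written as escape sequences.
    """
    return len(ch) == 1 and ch not in ("'", "\\") and is_graphic(ch)
```

Under these assumptions, `allowed_as_is("A")` and `allowed_as_is(" ")` hold, while a newline (a control character) and the quote character itself are rejected.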

Triple Quotes

Character literals can also be specified between triple quotes ('''). For example, '''A''' is the same as 'A'. The difference is that escape sequences aren’t allowed (i.e. only as-is characters), and spaces can surround the character itself. For example, ''' A ''' is valid and the same as 'A'.

There can be any number of spaces on each side, including zero, and they don’t have to be the same number. The spaces, if present, must all be the ASCII space.

The triple-quoted form exists for specifying characters with a special visual appearance, in particular non-spacing ones, which are drawn above or on top of other characters. For example, the right arrow character U+20D7 is used in math to denote vectors, e.g. a⃗. Putting that arrow alone between single quotes would look like this:

'⃗'

Using triple quotes, it becomes much more readable:

'''   ⃗  '''
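As a minimal sketch of how a reader might handle the triple-quoted form, the following Python function strips the optional ASCII-space padding and returns the enclosed character. The function name `parse_triple_quoted` is hypothetical and not part of any Idan tool. The sketch also assumes that the enclosed character is not itself a space, since the text above does not say how that case is treated, and it does not check whether the character is allowed as-is.

```python
def parse_triple_quoted(literal: str) -> str:
    """Return the character inside a triple-quoted literal (hypothetical helper)."""
    quote = "'''"
    if len(literal) < 2 * len(quote):
        raise ValueError("literal is too short to be triple-quoted")
    if not (literal.startswith(quote) and literal.endswith(quote)):
        raise ValueError("literal must be enclosed in triple quotes")

    inner = literal[len(quote):-len(quote)]
    # Only the ASCII space (U+0020) counts as padding; tabs or other
    # whitespace are left in place and make the literal invalid below.
    char = inner.strip(" ")
    if len(char) != 1:
        raise ValueError("expected exactly one character between the quotes")
    return char
```

With this sketch, `parse_triple_quoted("''' A '''")` and `parse_triple_quoted("'''A'''")` both yield the character A, and the padding on each side may differ in length.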

Backticks

In Unicode, each character has a name. These names are unique and immutable. Aliases can be added, but once a name is assigned it won’t be removed. Based on these names, Kadma provides localized sets of character names. The English names are based on the Unicode names, and translations are made specifically for use under Smaoin. Using these names, a character value can be specified by its human-friendly name.

The names are case-insensitive and ASCII spaces are equivalent to ASCII underscores when comparing names.

A named character literal is enclosed in backticks (`), optionally followed by a language tag (if there is none, the language set in the header is assumed). For example, the character 'A' can also be expressed using backticks like this in English: `Latin capital letter A`. And the vector right arrow mentioned in the previous section ''' ⃗ ''' can be expressed as `Combining right arrow above`.
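The name comparison rule can be illustrated with a short Python sketch. It uses Python's built-in Unicode name table (`unicodedata.lookup`) as a stand-in for Kadma's English name set, which is an assumption: Kadma may define additional or different names. The helpers `normalize_name`, `names_equal` and `lookup_named` are hypothetical.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Fold case and treat ASCII underscores as ASCII spaces."""
    return name.replace("_", " ").upper()


def names_equal(a: str, b: str) -> bool:
    """Compare two character names under the rule described above."""
    return normalize_name(a) == normalize_name(b)


def lookup_named(name: str) -> str:
    """Resolve a name to a character.

    Assumption: the standard Unicode names stand in for Kadma's English
    names. Raises KeyError if the name is unknown.
    """
    return unicodedata.lookup(normalize_name(name))
```

For example, `lookup_named("Latin_capital letter A")` resolves to the character A. In Python's table, U+20D7 is found under its Unicode name, COMBINING RIGHT ARROW ABOVE. A real implementation handling localized names would probably want a more general case folding than `upper()`.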

A character can have more than one name. The names taken from Unicode are usually based on the character’s appearance and not on its role, because a character can have various roles (e.g. < is less-than in math and also starts a tag in XML and a Uid in Idan). Kadma allows adding more names, and a character may have several names referring to its various uses. For example, < could have names like “less than” and “opening angle bracket”.
