Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>


tutorial.mdwn

This is a friendly tutorial which explains step by step how to express information in the [[Idan]] language and write whole Idan documents. It’s meant to be read sequentially, and each section builds upon what was introduced in preceding sections. There is also a [[./manual]] which can be used for reference while reading the tutorial.

IMPORTANT: This is still a draft! It’s not reliable yet!

Table of Contents:

[[!toc levels=4]]

Intro

Idan is a language for expressing information. It uses the semantic information model [[/projects/Smaoin]], with which you should be familiar first. If you know what classes, properties, statements and triples are, and/or know RDF, it should hopefully be enough. In any case, Smaoin has its own documentation which you can browse.

When you learn a programming language, you can compile and run things while you read the explanatory text. It makes the experience much more interactive and fun, and less boring than just reading can sometimes be. This tutorial aims to do the same for Idan, although it’s an information language and not a programming language (to be precise, it can describe computations for computers to execute, but writing them would be inconvenient, because it’s a general-purpose information language, not one specific to computation instructions like programming languages are).

Idan files themselves don’t execute anything, but they can be fed into various kinds of programs. For example, a program can display a chunk of information as a colorful graph. A program can find errors in your code. A program can detect repetitions, missing parts or common beginner mistakes and pitfalls, and report them to you. It can do all kinds of manipulations on the information. And it can let you practice the things you’ll really do when you start writing in Idan, much like a programming tutorial should explain how to use a compiler.

At the moment, there are no such programs. They are under development. It means that right now, the best practice to do while reading this tutorial is probably writing your own small file, and revising it as you go. Use new features you discover, add things you didn’t know how to write, replace sections with better ones and so on.

The current plan is for the Idan parser, written in [[/languages/Haskell]], to be the first tool available for testing and examining Idan code. This tutorial will also serve as a reference for the language’s features, until an up-to-date formal definition is written (there are just pieces and notes right now, because Idan itself isn’t yet finally defined).

Quick Start

If you want to modify an existing Idan file, you don’t need to go over this entire tutorial now, especially if you want to make just a small change. Look at your file and learn from the syntax forms it uses. Look at other files, such as the ones defined for Smaoin and the [[/projects/Kiwi]] ontologies. If you see something you don’t understand or you need more information, go to the relevant section of this tutorial and read what you need.

This way, you’ll learn the language in parallel to using it, and eventually you’ll know all the basics and the practices. But you should eventually go over this tutorial anyway, at least the less obvious chapters, to learn about the small details, conventions, recommendations, tips, and advanced or less often used features.

If you want to write your own files, do open some existing ones to use as a template, but it’s also recommended you read the tutorial and go back to it when you find unfamiliar features in Idan files.

As said above, there’s no software to experiment with yet. So just use existing files and the suggestions above. When software is written, this section will explain how to quickly set up a work environment, write a simple file and test it.

Statement Basics

Intro

Idan’s role, considering how [[/projects/Smaoin]] works, is to allow you to write statements in a convenient way that is easily readable, writable and editable by humans. There are many ways in Idan to write statements because of all the shortcuts and handy syntactic forms it provides, so we will start with some basic ones and see more later.

Since Idan uses a high-level abstraction mechanism to support human-readable resource names, we will start with basic tools which don’t yet make much sense alone, and gradually build our way to the fun parts. Since unreadable examples aren’t very useful, they will use readable names, and the features behind them will be explained in later sections.

Note that Idan can be fully localized to any language. It is possible to write an entire Idan file without a single English letter, except for the language chooser at the top of the file (more on that later), which is a two-letter language code. In this tutorial we will mainly use English, but the sections about localization will give examples and explain how to use other languages. Unlike in software translation systems and other mechanisms, English doesn’t have a special status in Idan, and is equal to any other language. The common language in Idan is Smaoin and the statements themselves. For example, you can translate a file from German to French, without going through English.

Each statement has 4 components: an identifier, a subject, a predicate and an object.

Statement identifiers aren’t interesting most of the time, and Idan allows to omit them from statements. Until we see examples that require them, we will use just the other 3 components: subject, predicate, object.

Each occurrence of an entity in the file is called a literal. As a general but hopefully simple example, in the text 5 + 5 + 5 only one number appears - the number five - but there are 3 occurrences of it, i.e. 3 literals.

uids

Since the first two components must be uids, let’s see how to write them. A uid is a string of characters with no whitespace. It is written with angle brackets surrounding it. For example:

<b3742023-97ef-4fb0-9dd2-4582d946d6f1>

This uid is very far from human-readable, of course. But don’t worry, you don’t need to work with these ugly strings. Either you use a program that generates them, or you write placeholders and let them be filled automatically, or you use more readable ones (although this is discouraged, because in the Smaoin philosophy uids shouldn’t have any meaning). We will see shortly how easy it is to use placeholders.

There are several special “uids” that are reserved for use by Idan, and can’t be used as regular uids. This isn’t a problem because they’re very short and you wouldn’t want to use them as uids anyway. These special uids are called placeholders. We will now meet 3 of them, whose role is to avoid writing ugly long uid strings.

Before we start, let’s make a set of statements on which we can show examples. Don’t worry about the details - writing statements will be explained in detail later. We’ll use the following statements:

<f78548e7-6bff-4202-bb75-614c7eb71ae2> nli:belongs_to_namespace @myns
<f78548e7-6bff-4202-bb75-614c7eb71ae2> smaoin:is_a              smaoin:Class
<f78548e7-6bff-4202-bb75-614c7eb71ae2> smaoin:is_subclass_of    smaoin:Resource
<7c7ad0fa-c583-47f9-b5e3-8d5e1527bd11> nli:belongs_to_namespace @myns
<7c7ad0fa-c583-47f9-b5e3-8d5e1527bd11> smaoin:is_a              smaoin:Class
<7c7ad0fa-c583-47f9-b5e3-8d5e1527bd11> smaoin:is_subclass_of    myns:Person
<90301025-61cf-4156-bb09-3a7cd2203625> nli:belongs_to_namespace @myns
<90301025-61cf-4156-bb09-3a7cd2203625> smaoin:is_a              smaoin:Class
<90301025-61cf-4156-bb09-3a7cd2203625> smaoin:is_subclass_of    myns:Animal

We’ll understand this better later, but for now, these statements basically say the following:

  1. There is a subclass of the Resource class, which belongs to the myns namespace. All the classes we define contain resources, so basically we’re just declaring a class here.
  2. There is another class in myns which is a subclass of Person. This may be Male or Female or Child or Adult or something else. It doesn’t say.
  3. There is another class in myns which is a subclass of Animal. For example, maybe it’s Mammal. Here too, it doesn’t say.

Later we will develop this example to be more useful. We could do it now, but then it would become too long. Later, we will add new things and drop some of the older ones, to focus on the new features and keep examples small.

Regardless of the meaning of this, do you notice something exceptionally ugly here? Two things are especially annoying:

  1. Very long ugly unreadable uids
  2. Repetition of the same uid several times

Each row here contains three parts: subject, predicate, object. Therefore the ugly part is in the subjects. Ugly uids could also be in the predicate and object parts, but for now let’s ignore those cases and work with the example we have.

It’s time to meet some placeholders.

The uid Generator (<%>)

The uid generator is a placeholder that means “replace me with a real uid”. It looks like this:

<%>

Placeholders aren’t actually information, and are meant to be replaced by software with real uids. For each occurrence of this placeholder in an Idan file, the processing software generates a new unique uid. It means that if we want to refer to the same resource multiple times, we can’t use <%> in all the references, because a different uid will be generated for each one instead of the same one being reused.

For example, if we write a file like this:

<%> nli:belongs_to_namespace @myns
<%> smaoin:is_a              smaoin:Class
<%> smaoin:is_subclass_of    smaoin:Resource

After processing placeholders, it may look like this:

<f78548e7-6bff-4202-bb75-614c7eb71ae2> nli:belongs_to_namespace @myns
<7c7ad0fa-c583-47f9-b5e3-8d5e1527bd11> smaoin:is_a              smaoin:Class
<90301025-61cf-4156-bb09-3a7cd2203625> smaoin:is_subclass_of    smaoin:Resource

Each <%> got its own unique uid.
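
The replacement described above can be sketched in a few lines of Python. This is only an illustration, not an existing Idan tool: the function name fill_uid_generators is made up here, random UUIDs are an assumed uid scheme, and the sketch naively replaces every <%> it sees, even inside strings or comments.

```python
import re
import uuid


def fill_uid_generators(text):
    """Replace every <%> occurrence with a freshly generated uid."""
    # Assumption: uids are random UUIDs; the tutorial does not mandate a scheme.
    return re.sub(r"<%>", lambda _match: f"<{uuid.uuid4()}>", text)


source = (
    "<%> nli:belongs_to_namespace @myns\n"
    "<%> smaoin:is_a              smaoin:Class\n"
    "<%> smaoin:is_subclass_of    smaoin:Resource\n"
)
filled = fill_uid_generators(source)
print(filled)
```

Every run prints three different subjects, which is exactly why <%> alone can’t describe a single resource with several statements.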

Now, let’s try to use the uid generator to make the original example with the 9 statements look better. How about this:

<%>                                    nli:belongs_to_namespace @myns
<f78548e7-6bff-4202-bb75-614c7eb71ae2> smaoin:is_a              smaoin:Class
<f78548e7-6bff-4202-bb75-614c7eb71ae2> smaoin:is_subclass_of    smaoin:Resource
<%>                                    nli:belongs_to_namespace @myns
<7c7ad0fa-c583-47f9-b5e3-8d5e1527bd11> smaoin:is_a              smaoin:Class
<7c7ad0fa-c583-47f9-b5e3-8d5e1527bd11> smaoin:is_subclass_of    myns:Person
<%>                                    nli:belongs_to_namespace @myns
<90301025-61cf-4156-bb09-3a7cd2203625> smaoin:is_a              smaoin:Class
<90301025-61cf-4156-bb09-3a7cd2203625> smaoin:is_subclass_of    myns:Animal

This is wrong, because the generated uids won’t be the same as the ones we do specify here directly. But we can’t use <%> 9 times here, because it will generate 9 different uids. If we had a way to refer to a generated uid, i.e. use it multiple times, it could solve the problem. The next placeholder is made exactly for that.

The Up Arrow ($^)

This placeholder looks like this:

$^

Does it look close enough to an arrow pointing up? Hopefully it does, because that’s exactly what it means. $^ says “copy the uid from the previous statement”. Using the up arrow, we can write multiple statements which use the same subject, without writing ugly uids. Now our example can look much better: (blank lines added just for readability)

<%> nli:belongs_to_namespace @myns
$^  smaoin:is_a              smaoin:Class
$^  smaoin:is_subclass_of    smaoin:Resource

<%> nli:belongs_to_namespace @myns
$^  smaoin:is_a              smaoin:Class
$^  smaoin:is_subclass_of    myns:Person

<%> nli:belongs_to_namespace @myns
$^  smaoin:is_a              smaoin:Class
$^  smaoin:is_subclass_of    myns:Animal
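
To make the meaning of $^ concrete, here is a hedged Python sketch of how a tool might resolve it. It assumes the statements were already split into (subject, predicate, object) tuples, and that $^ copies the component found at the same position in the previous statement. The function resolve_up_arrows and the stand-in uid <uid-1> are hypothetical.

```python
def resolve_up_arrows(triples):
    """Replace each "$^" with the same component of the previous statement."""
    resolved = []
    for triple in triples:
        if "$^" in triple:
            if not resolved:
                raise ValueError("$^ has no previous statement to copy from")
            previous = resolved[-1]
            triple = tuple(
                before if part == "$^" else part
                for part, before in zip(triple, previous)
            )
        resolved.append(triple)
    return resolved


statements = [
    ("<uid-1>", "nli:belongs_to_namespace", "@myns"),
    ("$^", "smaoin:is_a", "smaoin:Class"),
    ("$^", "smaoin:is_subclass_of", "smaoin:Resource"),
]
for statement in resolve_up_arrows(statements):
    print(*statement)
```

Because each statement is resolved against the already resolved previous one, a whole run of $^ lines ends up with the subject of the first line.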

The Down Arrow ($,)

Similar to the up arrow, but in the other direction, is the down arrow. It looks like this:

$,

I know it doesn’t look like an arrow pointing down, but that’s what it means. “Copy the uid from the next statement”.

It’s probably less useful than <%> and $^, but it can be used in situations where a block of statements is already written and we want to add something before it:

$,  myns:plays    myns:piano
<%> smaoin:is_a   myns:Person
$^  myns:has_name "John Doe"
$^  myns:has_age  34
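
The down arrow can be resolved the same way as the up arrow, only walking the statements from the last one to the first. The Python sketch below is an illustration under assumptions: statements are already (subject, predicate, object) tuples, the other placeholders were already replaced by the stand-in uid <uid-1>, and chaining several $, lines is assumed to be allowed. The name resolve_down_arrows is hypothetical.

```python
def resolve_down_arrows(triples):
    """Replace each "$," with the same component of the next statement."""
    rows = [list(triple) for triple in triples]
    # Walk backwards so that a run of "$," lines resolves in a single pass.
    for index in range(len(rows) - 1, -1, -1):
        for position, part in enumerate(rows[index]):
            if part == "$,":
                if index + 1 == len(rows):
                    raise ValueError("$, has no next statement to copy from")
                rows[index][position] = rows[index + 1][position]
    return [tuple(row) for row in rows]


statements = [
    ("$,", "myns:plays", "myns:piano"),
    ("<uid-1>", "smaoin:is_a", "myns:Person"),
    ("<uid-1>", "myns:has_name", '"John Doe"'),
    ("<uid-1>", "myns:has_age", "34"),
]
for statement in resolve_down_arrows(statements):
    print(*statement)
```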

References

As you probably noticed already, we can use readable names when referring to resources. In the examples above, all the predicates we used are human-friendly. They are composed of two parts separated by a colon (:). How does it work? We said predicates must be uids. So in some way, these friendly names actually refer to some ugly uids behind the scenes.

We will answer the “how” question later, because it’s not essential right now and requires us to study Idan’s localization system. For now, we will just see how to use these things.

These x:y notations are references. The part before the colon is a namespace, and the part after it is a label. Namespaces allow us to put several related resources under one title. This is similar to modules and namespaces in some programming languages (but not the same, as we’ll see better later). Once a resource has been given a namespace and a label, it can be referred to using them.

For example, suppose we have a class Person in the myns namespace. Then we can, as already seen above, refer to it as myns:Person.

Values

The object part of a statement may contain a resource, but it may also contain a value. Each value type in Smaoin has convenient syntax in Idan. We will fully examine all the small details later, and focus right now on the important part: how to write simple statements with values.

Since uids are enclosed in <>, it’s easy to distinguish them from values. Values aren’t enclosed by <>, but can be enclosed by other characters - depending on the type and the specific syntax used.

Booleans

Boolean values are simple: there are exactly two of them. true and false. You just write them as-is as the statement’s object. Example:

<%> myns:has_name       "John Doe"
$^  myns:uses_gnu_linux true
$^  myns:uses_losedows  false

Characters

Any Unicode character is a valid Character value, but not all Unicode characters can be written as-is directly in the Idan document. There are several ways to write Character literals, which together cover the whole of Unicode and provide some flexibility.

One way to write character literals is by enclosing them between single quotes ('). This is similar to many other computer languages. For example, 'A' is the uppercase A character value, and ' ' is the space character. Less usual characters can be written as-is between quotes too, but:

Therefore, in many cases it is preferred or even required to use escape sequences. These sequences let you refer to characters using a notation composed of common ASCII characters. A sequence always begins with the backslash character (\). There are two types of sequences: symbolic and numeric. The symbolic ones which apply to '-delimited character literals are:

Sequence Unicode Description
-------- ------- -----------
\a       U+0007  audible bell
\b       U+0008  backspace
\f       U+000C  form feed - new page
\n       U+000A  line feed - new line
\r       U+000D  carriage return
\t       U+0009  horizontal tab
\v       U+000B  vertical tab
\'       U+0027  single quote
\\       U+005C  backslash

The numeric ones allow specifying the character number of a Unicode character:

The maximal value of a numeric sequence is \1114111 (decimal), which is the same as \o4177777 (octal) and \x10ffff (hexadecimal). This is exactly enough to represent all Unicode values.
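
As an illustration, the following Python sketch decodes a single escape sequence. It is not the Idan parser. The symbolic sequences come from the table above, while the numeric forms (\ followed by decimal digits, \o followed by octal digits, \x followed by hexadecimal digits) are an assumption inferred from the three spellings of the maximal value. The name decode_escape is hypothetical.

```python
import re

# Symbolic sequences for '-delimited character literals.
SYMBOLIC = {
    "\\a": "\a", "\\b": "\b", "\\f": "\f", "\\n": "\n", "\\r": "\r",
    "\\t": "\t", "\\v": "\v", "\\'": "'", "\\\\": "\\",
}
# Assumed numeric forms: \DIGITS (decimal), \oDIGITS (octal), \xDIGITS (hex).
NUMERIC = re.compile(r"\\(?:([0-9]+)|o([0-7]+)|x([0-9a-fA-F]+))")
MAX_CODE_POINT = 0x10FFFF


def decode_escape(sequence):
    """Return the character denoted by a single escape sequence."""
    if sequence in SYMBOLIC:
        return SYMBOLIC[sequence]
    match = NUMERIC.fullmatch(sequence)
    if match is None:
        raise ValueError(f"unknown escape sequence: {sequence!r}")
    decimal, octal, hexadecimal = match.groups()
    if decimal is not None:
        number = int(decimal, 10)
    elif octal is not None:
        number = int(octal, 8)
    else:
        number = int(hexadecimal, 16)
    if number > MAX_CODE_POINT:
        raise ValueError(f"beyond the last Unicode code point: {sequence!r}")
    return chr(number)


# Three spellings of the same character, the uppercase A.
print(decode_escape(r"\65"), decode_escape(r"\o101"), decode_escape(r"\x41"))
```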

Many visible Unicode characters can be specified as-is between the quotes, but not all of them. The characters allowed are all Unicode characters which have basic type Graphical (this includes the ASCII letters, digits and symbols, and much more) except for ' and \, which must be specified as escape sequences.

It’s probably a good idea to use escape sequences for characters with unusual or confusing visual representation.

There are other forms of character literals, which will be covered later.

Numbers

Integers can be written in decimal, octal, hexadecimal or binary base, with an optional preceding minus sign (-). Leading zeros are valid. By default the base is decimal (what you use regularly in daily life). Examples:

0
12
-35435
9847392843923
-1

To indicate a different base, use a prefix: 0 for octal, 0x for hexadecimal and 0b for binary. In hexadecimal, the digits after 9 are represented by a-f (or A-F). Examples:

012
0x78f98ab
0b1101000100101001

There are also rational numbers, which are expressed as fractions: a pair of integers separated by a slash (/). Only the first integer is allowed to have a minus sign. Examples:

1/2
-2/3
-0x78fdda78b/0b100100011011010

It’s also possible to write numbers with a decimal point. In that case only decimal base is allowed. Such a number consists of an optional minus sign, an optional sequence of digits, a decimal point and more digits. Trailing zeros are allowed. Examples:

1.2
.3
234345.0
76867867.245645
-345.465
-0.23423
-.7

Integers in decimal base and numbers with a decimal point can have an exponent part. The pair (number, exponent) means “number * 10 ^ exponent”. The exponent must be an integer in decimal base. A caret (^) separates the number from the exponent. Examples:

10^3
10^-3
-10^3
-10^-3
343.23423^5
-.7^-8
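
The number forms above can be summed up in a short Python sketch that turns a literal into an exact fraction. It is an illustration under assumptions, not Idan’s definition: parse_integer and parse_number are hypothetical names, and a literal such as 089, which starts with 0 but isn’t valid octal, is treated here as decimal with a leading zero.

```python
import re
from fractions import Fraction

INTEGER = re.compile(
    r"(?P<sign>-?)(?:0x(?P<hex>[0-9a-fA-F]+)|0b(?P<bin>[01]+)"
    r"|0(?P<oct>[0-7]+)|(?P<dec>[0-9]+))"
)
DECIMAL_POINT = re.compile(r"(-?)([0-9]*)\.([0-9]+)")
DECIMAL_INTEGER = re.compile(r"-?[0-9]+")


def parse_integer(text):
    """Parse an integer in decimal, octal (0), hex (0x) or binary (0b) base."""
    match = INTEGER.fullmatch(text)
    if match is None:
        raise ValueError(f"not an integer literal: {text!r}")
    for group, base in (("hex", 16), ("bin", 2), ("oct", 8), ("dec", 10)):
        if match.group(group) is not None:
            value = int(match.group(group), base)
            return -value if match.group("sign") else value


def parse_number(text):
    """Parse a number literal into an exact Fraction."""
    base, caret, exponent = text.partition("^")
    if "/" in base:
        numerator, denominator = base.split("/")
        if caret or denominator.startswith("-"):
            raise ValueError(f"invalid rational literal: {text!r}")
        return Fraction(parse_integer(numerator), parse_integer(denominator))
    point = DECIMAL_POINT.fullmatch(base)
    if point is not None:
        sign, whole, fraction = point.groups()
        value = Fraction(int(whole + fraction), 10 ** len(fraction))
        value = -value if sign else value
    elif caret:
        # Only integers in decimal base may carry an exponent.
        if DECIMAL_INTEGER.fullmatch(base) is None:
            raise ValueError(f"exponent needs a decimal number: {text!r}")
        value = Fraction(int(base))
    else:
        value = Fraction(parse_integer(base))
    if caret:
        if DECIMAL_INTEGER.fullmatch(exponent) is None:
            raise ValueError(f"invalid exponent: {text!r}")
        value *= Fraction(10) ** int(exponent)
    return value


for literal in ("012", "-2/3", ".3", "10^-3", "-.7^-8"):
    print(literal, "=", parse_number(literal))
```

Using exact fractions rather than floating point keeps a value like -.7^-8 precise, which seems a sensible default for an information language.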

[[!template id=todo text=“study alternative number systems”]]

TODO is the omit-zero thing like in -.7 really needed? Read about it and see if it’s common anywhere in the world. Omit if not.

Strings

A string is a sequence of characters. The common form for writing strings is between double quotes ("). Any Graphical character is allowed between the double quotes, except for \ and ", which must be escaped. The escape sequences of Characters work here too, but the list of symbolic ones is a bit different:

Sequence Unicode     Description
-------- ------- --------------------
\a       U+0007  audible bell
\b       U+0008  backspace
\f       U+000C  form feed - new page
\n       U+000A  line feed - new line
\r       U+000D  carriage return
\t       U+0009  horizontal tab
\v       U+000B  vertical tab
\"       U+0022  double quote
\\       U+005C  backslash
\&               empty string

Unlike for characters, the single quote (') is allowed as-is in strings and there’s no escape sequence for it. Instead, " must be escaped. There’s also the \& escape. “Empty string” may seem strange. Indeed, "hello" and "he\&llo" are exactly the same string value. Why is \& needed? Answer: it can help separate sequences. For example, look at these:

The first string contains a single character indicated by the numeric sequence. The second string contains a sequence too, and then ‘9’ and then ‘3’. So the \& can break a sequence.
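
The effect of \& is easy to observe with a small decoder. The Python sketch below is only an illustration: decode_string_body is a hypothetical name, the numeric forms (\digits, \o, \x) are the assumed ones, and a numeric sequence is assumed to take as many digits as it can. The sample strings were made up for this sketch.

```python
import re

SYMBOLIC = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", '"': '"', "\\": "\\", "&": "",
}
# A numeric sequence takes as many digits as it can (an assumption).
ESCAPE = re.compile(r"\\(?:([0-9]+)|o([0-7]+)|x([0-9a-fA-F]+)|(.))", re.DOTALL)


def decode_string_body(body):
    """Decode the text between the double quotes of a string literal."""
    def replace(match):
        decimal, octal, hexadecimal, symbol = match.groups()
        if decimal is not None:
            return chr(int(decimal, 10))
        if octal is not None:
            return chr(int(octal, 8))
        if hexadecimal is not None:
            return chr(int(hexadecimal, 16))
        if symbol not in SYMBOLIC:
            raise ValueError(f"unknown escape sequence: \\{symbol}")
        return SYMBOLIC[symbol]
    return ESCAPE.sub(replace, body)


print(len(decode_string_body(r"\6566")))   # a single character, number 6566
print(decode_string_body(r"\65\&66"))      # character 65 ("A"), then "66"
```

Without the \& in the second string, the digits 66 would be swallowed by the numeric sequence.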

String examples:

"hello world"
"They said \"Hello world\""
"line 1\nline 2\nline 3"

It’s possible to write multi-line strings by having newlines inside them, either as-is or using an escape sequence like \n. Example:

"This is line 1.
This is line 2.
  - Item 1
  - Item 2
  - Item 3
h
 e
  l
   l
    o"

Another way to write long strings is by specifying a sequence of several strings. The result will be their concatenation. For example "a" "b" is the same as "ab". More examples:

"hello " "world"

"Line 1. "
"Still the same line.\n"
"Line 2."

"h" "e" "l" "l" "o" "\n"
"w" "o" "r" "l" "d" "\n"

"x\n" "y\n" "z\n"

Some strings contain many double quotes, and escaping them as \" all the time is a bit inconvenient. For this reason it’s also possible to enclose strings between triples of double quotes, i.e. """. These strings work exactly like the regular " strings, but with " allowed and the sequence """ not allowed inside the string. It can be written escaped as \""". Example:

"""They said "hello world"."""

TODO update here according to Idan’s rules for strings…

Binary Chunks

A “binary chunk”, or “data”, is a sequence of bits and bytes. An opaque block of binary data. Binary chunks can be used to embed images, compressed files, encrypted files and so on inside an Idan text file. They are somewhat like strings, but can contain anything, not limited to characters.

Binary data is encoded in Idan text using a [[!wikipedia Base64]] scheme. The Base64 string is enclosed by double vertical bars, i.e. ||. The alphabet used is A-Z, a-z, 0-9, + and / in this order, to represent the 6-bit numbers from 0 to 63. Padding with =s is optional. Line length is variable.

The type name of binary chunks is Data.

For example, the Base64 of “hello world” followed by a newline character is aGVsbG8gd29ybGQK. We can use it in a statement like this:

&jane myns:knows_secret ||aGVsbG8gd29ybGQK||

The Base64 string may stretch over multiple lines. All spaces and newlines between the opening || and the closing || are ignored when software parses and decodes the Base64 string into binary data. For example, you can write a long chunk like this:

&jane myns:knows_secret ||QW4gSW5kaXZpZHVhbCBoYXMgbm90IHN0YXJ0ZWQgbGl2aW5n
                          IGZ1bGx5IHVudGlsIHRoZXkgY2FuIHJpc2UgYWJvdmUgdGhl
                          IG5hcnJvdyBjb25maW5lcyBvZiBpbmRpdmlkdWFsaXN0aWMg
                          Y29uY2VybnMgdG8gdGhlIGJyb2FkZXIgY29uY2VybnMgb2Yg
                          aHVtYW5pdHkuIEV2ZXJ5IHBlcnNvbiBtdXN0IGRlY2lkZSBh
                          dCBzb21lIHBvaW50LCB3aGV0aGVyIHRoZXkgd2lsbCB3YWxr
                          IGluIGxpZ2h0IG9mIGNyZWF0aXZlIGFsdHJ1aXNtIG9yIGlu
                          IHRoZSBkYXJrbmVzcyBvZiBkZXN0cnVjdGl2ZSBzZWxmaXNo
                          bmVzcy4gLSBNYXJ0aW4gTHV0aGVyIEtpbmcgSnIuCg==||
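
Decoding such a literal is straightforward with Python’s standard base64 module. The sketch below is an illustration of the rules above (whitespace ignored, padding optional), not part of any Idan tool, and decode_chunk is a hypothetical name.

```python
import base64


def decode_chunk(literal):
    """Decode a ||...|| binary chunk literal into bytes."""
    if len(literal) < 4 or not (literal.startswith("||") and literal.endswith("||")):
        raise ValueError("a binary chunk must be enclosed in ||")
    # Spaces and newlines between the delimiters are ignored.
    body = "".join(literal[2:-2].split())
    # Padding is optional, so normalize it before decoding.
    body = body.rstrip("=")
    body += "=" * (-len(body) % 4)
    return base64.b64decode(body, validate=True)


print(decode_chunk("||aGVsbG8gd29ybGQK||"))
print(decode_chunk("||aGVsbG8g\n    d29ybGQ||"))
```

The first call yields the bytes of “hello world” with a trailing newline. The second one shows a chunk split over two lines and written without padding.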

Comments

Comments are parts of Idan code that are ignored by the parser. They are there only for humans to read. You can use them to write all kinds of notes, TODOs and so on. Note that in a semantic information language like Idan, there is less need for comments than in other kinds of languages, because a lot of information can be expressed using the language itself. But they still are very useful. Don’t hesitate to use them.

There are two kinds of comments. One kind starts somewhere on a line of Idan text, and stretches until the end of the line. The other kind can span over any length: A part of a line, a whole line or even several lines. We’ll refer to these kinds here as “single line comments” (SLCs for short) and “multi line comments” (MLCs for short) although these names are inaccurate.

An SLC can start at the beginning of the line. In this case the entire line is considered a comment. It can also start in the middle of a line, in which case the part before it is regular Idan text, while the rest of the line is a comment. The beginning of an SLC is marked by a double hyphen, i.e. --. Examples:

-- This is a comment.
<%> smaoin:is_a smaoin:Property -- This is another comment.

An MLC begins with -{ and ends with }-. Examples:

-{ This is a comment. }- -{ This is another comment. }-
-{ Comment too. }- <%> smaoin:is_a smaoin:Property -- This is an SLC.
-{ Multi line comment starts here.
    line 2
    line 3 }-
<%> -{ 1 }- smaoin:is_a -{ 2 }- myns:Person
-{ hello world
 - this is a comment line
 - this is still the same comment
 - that's right, still inside the comment
 - one more line for fun...
 - and we're done.
 }-
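
Here is a rough Python sketch of how a tool might drop both kinds of comments before parsing. It rests on assumptions the tutorial doesn’t settle: MLCs are assumed not to nest, and only regular "-delimited strings are protected from being mistaken for comments (character literals and """ strings are not handled). The name strip_comments is hypothetical.

```python
import re

TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'   # a double-quoted string, kept as-is
    r"|-\{.*?\}-"          # a multi line comment, assumed not to nest
    r"|--[^\n]*",          # a single line comment, up to the end of the line
    re.DOTALL,
)


def strip_comments(text):
    """Replace every comment with a single space, leaving strings intact."""
    def replace(match):
        token = match.group(0)
        return token if token.startswith('"') else " "
    return TOKEN.sub(replace, text)


line = "-{ Comment too. }- <%> smaoin:is_a smaoin:Property -- This is an SLC."
print(strip_comments(line).split())
```

Replacing a comment with a space rather than with nothing keeps <%> -{ 1 }- smaoin:is_a from gluing neighbouring components together.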

Statements

There are several ways to write statements. Some of them are easier to handle manually by humans, while others are friendly to scripts and GNU text utilities. We will now see the most basic way, and later examine the other ways.

Triples

The most basic form of statement writing is listing triples. Each line contains a single triple. Each triple has 3 components separated by whitespace. These components are of course the subject, predicate and object of the statement. Example:

-- s |        p          |     o
----- ------------------- ----------
  <%> myns:has_name       "John Doe"
  $^  myns:uses_gnu_linux true
  $^  myns:uses_losedows  false

While writing one statement per line is readable, it is somewhat limited. Maybe we want to write 2-3 short statements on the same line, or we have a very long statement we’d like to break into several lines. This is possible by placing a period (.) at the end of a statement. Doing so allows the statement to be broken into several lines or be followed by another statement on the same line. Let’s see examples.

A period can optionally be appended in the one-per-line style:

<%> myns:has_name "John Doe" .
$^  myns:uses_gnu_linux true .
$^  myns:uses_losedows  false .

After a period more statements can follow:

<%> myns:has_name "John Doe" . $^ myns:uses_gnu_linux true .

Long statements can stretch over several lines:

<%>
myns:has_name
"John Veeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeerylongname Doe" .

Component Lists

Very often, when describing a certain object, there are several consecutive statements that have common parts, and the result looks like unnecessary repetition. It could be nice to refactor these parts, and make them appear just once. It also makes writing more convenient and fast. For example:

<%> myns:has_name       "John Doe"     -- 1
$^  myns:uses_gnu_linux true           -- 2
$^  myns:uses_gnu_hurd  true           -- 3
$^  myns:uses_losedows  false          -- 4
$^  myns:uses_distro    myns:trisquel  -- 5
$^  myns:uses_distro    myns:parabola  -- 6
$^  myns:uses_distro    myns:gnewsense -- 7

We have 3 kinds of repetition here:

  1. All statements have the same subject
  2. Statements 2 and 3 provide the same information, but for different predicates
  3. Statements 5, 6 and 7 provide the same information, but for different objects

There are several ways to avoid the repetition and write concise code. In this section and in the next ones, we’ll meet these ways. The first one, which we will see now, is component lists.

A component list is… a list of statement components separated by commas and enclosed in parentheses. All the members of the list have the same role in the statement, i.e. they are all subjects or all predicates or all objects. If we have several statements which differ in just one component, we can combine them into a “single” statement which actually contains all of them, by putting the differing components in a component list.

For example, let’s take these following statements from above:

$^  myns:uses_gnu_linux true           -- 2
$^  myns:uses_gnu_hurd  true           -- 3

Using a component list results in:

$^  (myns:uses_gnu_linux, myns:uses_gnu_hurd) true

We can also factor out the namespace:

$^  myns:(uses_gnu_linux, uses_gnu_hurd) true

The same works for statements that differ in the object part. Let’s take these:

$^  myns:uses_distro    myns:trisquel  -- 5
$^  myns:uses_distro    myns:parabola  -- 6
$^  myns:uses_distro    myns:gnewsense -- 7

And apply the same technique:

$^  myns:uses_distro myns:(trisquel, parabola, gnewsense)

Component lists are useful mainly for statements that differ in one component, and for generating statements from multiple lists. For example, assume we have a person John Doe. John likes tomatoes, cucumbers and carrots. He also eats tomatoes, cucumbers and carrots. To express this information, we would need 7 triples:

<%> myns:has_name "John Doe"
$^  myns:likes    myns:tomato
$^  myns:likes    myns:cucumber
$^  myns:likes    myns:carrot
$^  myns:eats     myns:tomato
$^  myns:eats     myns:cucumber
$^  myns:eats     myns:carrot

If we use component lists with the objects, we can reduce this to 3 statements:

<%> myns:has_name "John Doe"
$^  myns:likes    myns:(tomato, cucumber, carrot)
$^  myns:eats     myns:(tomato, cucumber, carrot)

And we can even use the up arrow to avoid the object repetition:

<%> myns:has_name "John Doe"
$^  myns:likes    myns:(tomato, cucumber, carrot)
$^  myns:eats     $^

Using a component list for the predicates as well, we can reduce this to 2 statements:

<%> myns:has_name      "John Doe"
$^  myns:(likes, eats) myns:(tomato, cucumber, carrot)

When more than one statement component is a list, every item of one list is matched with every item of the other list, and we get 6 different statements from the second line above. It’s also possible to use lists for all 3 components.
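
The “every item with every item” rule is a cartesian product, which the following Python sketch makes explicit. It is an illustration only: parse_component and expand_statement are hypothetical names, <john> is a stand-in uid, and the naive comma splitting doesn’t handle strings that contain commas.

```python
import re
from itertools import product

LIST = re.compile(r"(?:(\w+):)?\((.*)\)")


def parse_component(text):
    """Turn "ns:(a, b)" or "(ns:a, ns:b)" into a list of plain components."""
    match = LIST.fullmatch(text.strip())
    if match is None:
        return [text.strip()]  # not a list, just a single component
    namespace, items = match.groups()
    prefix = f"{namespace}:" if namespace else ""
    return [prefix + item.strip() for item in items.split(",")]


def expand_statement(subject, predicate, obj):
    """Match every item of each list with every item of the other lists."""
    return list(product(
        parse_component(subject),
        parse_component(predicate),
        parse_component(obj),
    ))


triples = expand_statement(
    "<john>", "myns:(likes, eats)", "myns:(tomato, cucumber, carrot)"
)
for triple in triples:
    print(*triple)
```

The 2 predicates and 3 objects give 2 * 3 = 6 statements, in the same order as the six likes/eats triples written out by hand earlier.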

Free-Form Blocks

With component lists we were able to handle mainly statements that differ in one component, and in special cases also 2 or 3 differing components. We weren’t able to avoid the repetition of the subject, for example, although all the statements we had above use the same subject: in the first one it is written as <%>, and in all the rest as $^. What if we could write it just once, and have all the statements share it?

Perhaps we could enclose both the predicate and the object in parentheses, and have a multi-list in which each item contains both a predicate and an object. But that becomes less convenient with slightly longer statements, and is therefore not allowed in Idan’s syntax. Instead, we will use statement blocks to accomplish the same result.

There are two ways to write statement blocks: free-form and indented. We will start with the free-form blocks.

Remember we saw how we can end a statement with a period and then have another statement on the same line, or have a multi-line statement? In the same way we use a period to separate statements, we can also use a semicolon to separate predicate-object pairs which share the same subject, and a comma to separate objects which share the same subject and predicate.

For example, let’s go back to these statements:

<%> myns:has_name       "John Doe"     -- 1
$^  myns:uses_gnu_linux true           -- 2
$^  myns:uses_gnu_hurd  true           -- 3
$^  myns:uses_losedows  false          -- 4
$^  myns:uses_distro    myns:trisquel  -- 5
$^  myns:uses_distro    myns:parabola  -- 6
$^  myns:uses_distro    myns:gnewsense -- 7

We can remove the repeated subject and use semicolons to separate the 7 predicate-object pairs. After the last pair, which ends the statement, we place a period instead of a semicolon.

<%> myns:has_name "John Doe" ;
    myns:uses_gnu_linux true ;
    myns:uses_gnu_hurd true ;
    myns:uses_losedows false ;
    myns:uses_distro myns:trisquel ;
    myns:uses_distro myns:parabola ;
    myns:uses_distro myns:gnewsense .

All of this could technically be on the same line, but that would be a very long line of course.

Let’s use commas to avoid predicate repetitions:

<%> myns:has_name "John Doe" ;
    myns:uses_gnu_linux true ;
    myns:uses_gnu_hurd true ;
    myns:uses_losedows false ;
    myns:uses_distro myns:trisquel, myns:parabola, myns:gnewsense .

In the last line (the only change we made) we could use a component list instead, and avoid writing the namespace prefix 3 times.
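
The way the three punctuation marks drive the expansion can be sketched as a tiny state machine in Python. This is a simplified illustration, not Idan’s grammar: expand_block is a hypothetical name, every statement must end with a period, and the tokenizer treats any period as punctuation, so numbers with a decimal point aren’t supported here.

```python
import re

# Strings, punctuation marks, or any other run of non-space characters.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[;,.]|[^\s;,.]+')


def expand_block(text):
    """Expand free-form blocks into a list of plain triples."""
    triples = []
    subject = predicate = None
    expect = "subject"
    # What each punctuation mark says about the next component.
    after = {".": "subject", ";": "predicate", ",": "object"}
    for token in TOKEN.findall(text):
        if token in after:
            if expect != "end":
                raise ValueError(f"misplaced {token!r}")
            expect = after[token]
        elif expect == "subject":
            subject, expect = token, "predicate"
        elif expect == "predicate":
            predicate, expect = token, "object"
        elif expect == "object":
            triples.append((subject, predicate, token))
            expect = "end"
        else:
            raise ValueError(f"missing punctuation before {token!r}")
    return triples


block = (
    '<%> myns:has_name "John Doe" ;\n'
    "    myns:uses_gnu_linux true ;\n"
    "    myns:uses_gnu_hurd true ;\n"
    "    myns:uses_losedows false ;\n"
    "    myns:uses_distro myns:trisquel, myns:parabola, myns:gnewsense .\n"
)
for triple in expand_block(block):
    print(*triple)
```

The block expands back into the 7 original statements, all sharing the one <%> subject.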

Free-form blocks let you remove repetitions and write statements with any layout - one line, multiple lines, arbitrary indentation and so on. But they require that you maintain those punctuation marks. In practice, unless you find some elegance in the punctuation marks, you’ll mostly use the other type of block - indented blocks.

Indented Blocks

Indented blocks are statement blocks in which the punctuation marks are optional. They use indentation to mark the roles of the components, in a similar way to the programming languages Haskell and Python. The rules are simple:

  1. When moving from one component to the next (out of the possible 3, since we are still ignoring statement identifiers), increase the indentation level or stay on the same line.
  2. When not moving, keep the same indentation level.
  3. All items on the same level must have the same offset from the beginning of the line. This includes the case of the first item not being at the beginning of the line (i.e. the offset contains non-whitespace).

It doesn’t matter what you use for indentation: spaces, tabs, a combination of spaces and tabs… it just needs to follow the increase/keep rules.

Note that we can still use punctuation marks in conjunction with indentation, which allows us to e.g. write several short parts on the same line, separated by semicolons, inside an indented block, without ruining the indentation.

Going back to the John Doe example:

<%> myns:has_name       "John Doe"     -- 1
$^  myns:uses_gnu_linux true           -- 2
$^  myns:uses_gnu_hurd  true           -- 3
$^  myns:uses_losedows  false          -- 4
$^  myns:uses_distro    myns:trisquel  -- 5
$^  myns:uses_distro    myns:parabola  -- 6
$^  myns:uses_distro    myns:gnewsense -- 7

Here we use 4 spaces for indentation and remove the subject repetition:

<%>
    myns:has_name       "John Doe"
    myns:uses_gnu_linux true
    myns:uses_gnu_hurd  true
    myns:uses_losedows  false
    myns:uses_distro    myns:trisquel
    myns:uses_distro    myns:parabola
    myns:uses_distro    myns:gnewsense

We can use another indentation level to eliminate the predicate repetition in the last 3 statements:

<%>
    myns:has_name       "John Doe"
    myns:uses_gnu_linux true
    myns:uses_gnu_hurd  true
    myns:uses_losedows  false
    myns:uses_distro
        myns:trisquel
        myns:parabola
        myns:gnewsense
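The indentation rules can be sketched as a tiny reader written in Python. This is a simplified illustration under stated assumptions, not the real Idan parser: it handles only subject, predicate and object, ignores comments, punctuation and placeholders, counts every whitespace character as one column, and keeps `<%>` as a plain token instead of generating a uid. The names `parse_indented` and `TOKEN` are hypothetical.

```python
import re

# A token is either a double-quoted string or a run of non-whitespace.
TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_indented(text):
    """Read an indented block into (subject, predicate, object) triples."""
    triples = []
    stack = []          # (offset, role) of each open indentation level
    parts = [None] * 3  # current subject, predicate, object
    last_role = -1      # role of the most recently read component
    for line in text.splitlines():
        if not line.strip():
            continue
        offset = len(line) - len(line.lstrip())
        # Leaving deeper levels: forget them.
        while stack and stack[-1][0] > offset:
            stack.pop()
        if stack and stack[-1][0] == offset:
            role = stack[-1][1]              # same level, same role
        else:
            role = last_role + 1 if stack else 0  # deeper level, next role
            stack.append((offset, role))
        # Components on the same line move on to the next role.
        for token in TOKEN.findall(line):
            parts[role] = token
            if role == 2:
                triples.append(tuple(parts))
            last_role = role
            role += 1
    return triples


example = '''
<%>
    myns:has_name       "John Doe"
    myns:uses_gnu_linux true
    myns:uses_gnu_hurd  true
    myns:uses_losedows  false
    myns:uses_distro
        myns:trisquel
        myns:parabola
        myns:gnewsense
'''
triples = parse_indented(example)
```

Each object read, whether on the predicate's line or on a deeper line of its own, completes one statement with the subject and predicate currently in effect.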

Block Subject References

When writing statement blocks, we sometimes write several consecutive blocks that describe related resources. Related means that the predicate or object of one statement is the subject of another statement in our file. We saw already how we can use placeholders to copy one statement’s subject into another statement’s subject (and the same works for predicates and objects), but can we use a previous statement’s subject as a new statement’s object?

One way we already saw is references. We can use namespace:label forms. But these have a weakness: They require that all the resources we want to refer to have labels and namespaces. For important resources that should be publicly known it’s fine - we want them to have labels anyway - but what if we have a group of resources that are just helpers, and aren’t meant to have a label?

We can assign them labels anyway. That’s the obvious solution. It’s possible and it works. The first problem we might encounter this way is namespace pollution: our namespace gets filled with a very large number of labels, and it’s hard to remember and track them all, and avoid collisions. The solution is simple: define a “dummy” namespace we can fill without polluting the existing ones. This still has the effect of polluting the database, if we later load the Idan file we wrote into a database. The labels we assigned were used just for the definitions, so they’ll never be referred to again - why fill the database with them? Suggested solution: Somehow tell the database to drop the dummy labels. This has issues too, but suppose we can handle them.

So suppose the dummy label solution works. How do we choose labels for a large number of objects? Maybe they were even generated by the computer - how can it give them meaningful, easy-to-remember names? Indeed, it can’t, and neither can we. If the objects don’t carry unique important meaning, we won’t remember their names. These are just helpers, not meaningful objects. We will end up with a large set of labels we can’t remember, and every time we need to change something or understand the code, we’ll have to go to the definitions and find the objects, because the labels won’t help us understand what they refer to.

If the collection of objects is small enough, this may be a reasonable solution. Maybe we’ll even find a way to give meaningful labels. And even then, the dummy labels somewhat clutter our Idan file. Fortunately, there are other ways. They don’t always work - but when they do, they’re simple and concise. We will now see one of them.

A block subject is the subject of a statement block. Whether we used the svb placeholder, a free-form block or an indented block, or even a single standalone statement - the block has a single subject.

A block subject reference allows us to refer to the block subject from a block surrounding it. Technically we can do that from any block, but these references are based on the distance between the block and the reference to its subject, which means they’re good especially for short distances, e.g. refer to the subject of the next block or the previous block.

Let’s call these references BSRs for short.

A BSR can be used in place of a subject, predicate or object of a statement. The first character of a BSR is always the hash sign (#), followed by a location indicator. These indicators have two forms. One form is a sequence of comma (,) characters or a sequence of caret (^) characters. They indicate the position of the target block, relative to the one containing the BSR. Examples:

#^^^^ -- the subject of the 4th previous block (4 blocks up)
#^^^  -- the subject of the 3rd previous block (3 blocks up)
#^^   -- the subject of the 2nd previous block (2 blocks up)
#^    -- the subject of the previous block
#,    -- the subject of the next block
#,,   -- the subject of the 2nd next block (2 blocks down)
#,,,  -- the subject of the 3rd next block (3 blocks down)
#,,,, -- the subject of the 4th next block (4 blocks down)

Here’s an example of BSRs in action:

<%>
	smaoin:is_a         myns:Action
	myns:has_name       "Get ingredients"
	myns:is_followed_by #,
<%>
	smaoin:is_a         myns:Action
	myns:has_name       "Make food"
	myns:is_followed_by #,
<%>
	smaoin:is_a         myns:Action
	myns:has_name       "Eat food"
	myns:is_followed_by #^^

The second form is a hash sign, followed by a comma or a caret and then a number. The number indicates the offset of the referenced block. This is especially useful for larger numbers. For example, #^6 may be easier to write and to read than #^^^^^^.
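How a tool might turn a BSR into a block position can be sketched in Python. This is an illustration only, assuming blocks are numbered from 0 in file order; the names `resolve_bsr` and `BSR_PATTERN` are hypothetical and not part of any Idan implementation.

```python
import re

# '#' followed by repeated carets or commas, or by one caret/comma and a number.
BSR_PATTERN = re.compile(r"^#(?:(\^+|,+)|([\^,])(\d+))$")


def resolve_bsr(ref, current_block, block_count):
    """Return the index of the block whose subject `ref` points at."""
    match = BSR_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a block subject reference: {ref!r}")
    if match.group(1):
        # First form: the number of repeated characters is the distance.
        direction = match.group(1)[0]
        distance = len(match.group(1))
    else:
        # Second form: a single character followed by the distance.
        direction = match.group(2)
        distance = int(match.group(3))
    # A caret points up (earlier blocks), a comma points down (later blocks).
    step = -distance if direction == "^" else distance
    target = current_block + step
    if not 0 <= target < block_count:
        raise IndexError(f"{ref!r} points outside the file")
    return target


# The three Action blocks from the example above, numbered 0, 1 and 2.
targets = [
    resolve_bsr("#,", 0, 3),   # "Get ingredients" is followed by block 1
    resolve_bsr("#,", 1, 3),   # "Make food" is followed by block 2
    resolve_bsr("#^^", 2, 3),  # "Eat food" is followed by block 0
]
```

Both forms resolve to the same target, so `#^6` and `#^^^^^^` are interchangeable in this model.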

Anchors

So far, we saw only one way to refer to resources by name: the x:y references. But we haven’t really seen yet how they work or why they exist or where those namespaces and labels are actually defined. What we did examine is ways to refer by position. These are very simple but also very limited.

Resource references that use namespaces and labels are flexible and powerful, but they aren’t really just local name tags for immediate use, like position based references are. Namespaces and labels are a major part of the Smaoin framework, and are somewhat a “heavy hammer”, too heavy for simple needs like the ones for which position-based references exist. We could enjoy both worlds by having a hybrid - name based references that are as simple as the position based ones we saw above.

An anchor is an occurrence of an entity in the file, given a label that can be referred to from other places in the file (not to be confused with labels used for namespaces, which are a different thing). It works somewhat like having a heading in an HTML file, e.g. <h1>Intro</h1>, and referring to it as index.html#Intro. If you don’t know what this means don’t worry - let’s see how anchors work in Idan.

Using anchors requires two parts: marking a statement component with a label, and referring to it from other statements. Marking uses the = character (like assignment in programming), and referring uses the & (like references in C/C++). Let’s take these simple statements:

<%> myns:has_name "John Doe"
<%> myns:has_name "Anne Doe"

Now suppose we’d like to express “Anne loves John” in an additional statement. Since the statements are close (actually adjacent), we can use a positional reference:

<%> myns:has_name "John Doe"
<%> myns:has_name "Anne Doe"
#^  myns:loves    $^^

The #^ could also be $^ (why?). The problem seems to be solved, but what if the statements are not adjacent? Suppose 100 statements separate them. We could technically use $^100, but it’s unlikely we can conveniently count these 100 statements, and seeing $^100 there doesn’t tell the reader anything about the target of the reference. A named reference would be much nicer. First, let’s mark John and Anne with anchor labels, while still using positional references in the last statement:

<%> =john myns:has_name "John Doe"
<%> =anne myns:has_name "Anne Doe"

-- some 100 statements here

$^101 myns:loves $^102

Using the new anchors john and anne we can use friendly references:

<%> =john myns:has_name "John Doe"
<%> =anne myns:has_name "Anne Doe"

-- some 100 statements here

&anne myns:loves &john
&john myns:loves &anne
&john myns:has_age 34

An anchor label can’t contain whitespace, and can contain letters, digits, hyphens (-) and underscores (_).

An anchor label definition always comes immediately after the literal you want to name, on the same line. It can be used with any statement component, not just with the subject:

<%> =john <%> =loves <%> =anne

Note that these labels are meaningful only for us, humans, and help us express information better. They aren’t part of the statement and aren’t inserted into datastores. The namespaces-and-labels system is much more powerful, and does exist in the form of statements (at the cost of a bit of complexity).
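The fact that anchors are only a writing aid can be illustrated with a small Python sketch. It assumes statements are already split into tokens, that `<uid-john>` and `<uid-anne>` stand in for the uids `<%>` would generate, and that an anchor may be referenced before or after its definition (the tutorial does not say whether forward references are allowed). The name `resolve_anchors` is hypothetical.

```python
def resolve_anchors(statements):
    """Replace &label references and drop =label definitions."""
    anchors = {}
    stripped = []
    # First pass: record each =label against the component right before it,
    # and leave the label itself out of the statement.
    for tokens in statements:
        kept = []
        for token in tokens:
            if token.startswith("="):
                anchors[token[1:]] = kept[-1]
            else:
                kept.append(token)
        stripped.append(kept)
    # Second pass: replace each &label with the component it names.
    return [
        [anchors[t[1:]] if t.startswith("&") else t for t in tokens]
        for tokens in stripped
    ]


statements = [
    ["<uid-john>", "=john", "myns:has_name", '"John Doe"'],
    ["<uid-anne>", "=anne", "myns:has_name", '"Anne Doe"'],
    ["&anne", "myns:loves", "&john"],
    ["&john", "myns:loves", "&anne"],
]
resolved = resolve_anchors(statements)
```

After resolution, no `=john` or `&john` remains: only plain statements are left, which is what would reach a datastore.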

[[!template id=todo text=“Allow specifying numbers in non-European notation, e.g. maybe Arabic numbers etc.?”]]

Statement Identifiers

In Smaoin, the statements themselves have uids too. That’s because a statement is a unit of information like any other, and there are many useful things you can do with statement uids. For example, you can state when a statement was made or who stated it. Statement uids are often called statement identifiers.

As a result of this addition, each statement is actually a quadruple, or in short quad: It has an identifier, a subject, a predicate and an object. Four parts. So far we only used three of these. Let’s add the fourth one.

Since most of the time statement identifiers aren’t written by humans into Idan files, and aren’t used much, they are written after the subject-predicate-object triples. This way the damage to clarity and readability of the code is minimal, and it’s easy to add them after the whole file has been written without them.

Statement identifiers support use of placeholders and references, but unlike the other parts of the statement, each identifier must be unique and can’t be copied from another statement. For example, using $^ as a statement identifier is an error. Using <%> as a statement identifier is the same as not specifying it at all. You can refer to statement identifiers, but you can’t specify them as references. They are either written directly as uids, or are auto-generated (by using <%> or not specifying them, as we did so far).
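The uniqueness and auto-generation rules can be sketched in Python. This is only a model under stated assumptions: a missing identifier is represented by `None`, and a random UUID wrapped in angle brackets stands in for whatever uid a real tool would generate. The names `Quad` and `make_quads` are hypothetical.

```python
import uuid
from typing import NamedTuple


class Quad(NamedTuple):
    identifier: str
    subject: str
    predicate: str
    obj: str


def make_quads(rows):
    """Turn (identifier or None, subject, predicate, object) rows into quads."""
    seen = set()
    quads = []
    for identifier, subject, predicate, obj in rows:
        if identifier is None:
            # Not specified (or written as <%>): generate a fresh one.
            identifier = f"<{uuid.uuid4()}>"
        if identifier in seen:
            # An identifier can't be copied from another statement.
            raise ValueError(f"duplicate statement identifier: {identifier}")
        seen.add(identifier)
        quads.append(Quad(identifier, subject, predicate, obj))
    return quads


quads = make_quads([
    (None, "<john>", "myns:has_name", '"John Doe"'),
    ("<b4a0ba9c-7624-4efc-bee0-d08ed4ed8316>", "<john>", "myns:uses_gnu_linux", "true"),
])
```

Passing the same explicit identifier twice raises an error, which mirrors the rule that something like $^ is not allowed in the identifier position.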

The syntax is not exactly like that of regular uids. A statement identifier begins with a double slash (//) immediately followed by the value or a reference. Examples:

//<b4a0ba9c-7624-4efc-bee0-d08ed4ed8316>
//&sid
//stmt:mystmt
//:mystmt

Statement identifiers support anchors as well. For example, we can have a statement with an identifier like this:

//<b4a0ba9c-7624-4efc-bee0-d08ed4ed8316> =stmt

And refer to it later:

<%> myns:has_name    "John Doe"
$^  myns:stated_that &stmt

It may be very useful to be able to refer to statement identifiers without specifying the ugly uids directly. One way to do this is by using <%> as the statement identifier. But since this may be a common pattern when writing about e.g. people who made various statements, a shortcut form (syntactic sugar) is provided: the statement identifier can be omitted, and then the anchor is written immediately after the //. Then the identifier is shorter and nicer. Instead of many lines having things like this:

//<%> =stmt

you get this:

//=stmt

When such anchors are used many times in a file, using named anchors becomes a bit cumbersome. We may prefer to use something similar to the block subject references, only have them point at the statement identifier instead of at the subject. This is indeed possible using statement identifier references. These references are based on statements, not on blocks (i.e. like $^ and unlike #^). The syntax is the same as for $^, but using an exclamation mark (!) as the first character. The meaning is “give me the statement identifier of the next/previous statement”.

!^^^^ -- the statement identifier of the 4th previous statement
!^^^  -- the statement identifier of the 3rd previous statement
!^^   -- the statement identifier of the 2nd previous statement
!^    -- the statement identifier of the previous statement
!,    -- the statement identifier of the next statement
!,,   -- the statement identifier of the 2nd next statement
!,,,  -- the statement identifier of the 3rd next statement
!,,,, -- the statement identifier of the 4th next statement

Also with numbers:

!^3   -- three statements up
!,7   -- seven statements down

At this point you may wonder why not have the same thing for predicates and for objects too. If it’s useful for identifiers and for subjects, why are predicates and objects different? Indeed, in some not very common cases it may be useful to have such references. For example, having a statement with a certain predicate and then having other statements use that predicate as their subject or as their object. But in these cases an anchor can be used instead. In the future, if it proves to be useful and common enough, syntax forms will be added to support this kind of reference.

A statement identifier is placed right after the content of the statement, i.e. right after the last component of the content - the object. If the object is followed by a period, a semicolon or a comma, the identifier comes before it. In indented blocks, it can be specified like a 4th statement component, i.e. either on the same line or on the next line with indentation. Examples:

<%> myns:has_name "Jane Doe" //=stmt1

<%>
    myns:has_name       "John Doe" //=stmt2
    myns:uses_gnu_linux true       //=stmt3
    myns:uses_gnu_hurd  true       //=stmt4
    myns:uses_losedows  false      //=stmt5
    myns:uses_distro
        myns:trisquel  //=stmt6
        myns:parabola  //=stmt7
        myns:gnewsense //=stmt8

<%>
myns:has_name       "Jack Doe" //=stmt9  ;
myns:uses_gnu_linux true       //=stmt10 ;
myns:uses_gnu_hurd  true       //=stmt11 ;
myns:uses_losedows  false      //=stmt12 ;
myns:has_friend &alice //=stmt13 , &bob //=stmt14 , &cindy //=stmt15 .

[[!template id=todo text=“decide how to reorder things: either by topic (and then remove the”Basics" from chapter names) or by level (and then make sure to move all the non basic things - like statement uids - to a new later chapter, and have a good syntax reference)"]]

TODO consider adding { and } to free-form blocks

TODO consider a template system, and use it to provide sugar for names, descriptions, prefixes and labels

Nested Blocks

Very often we organize information in a hierarchical structure. For example, an XML file is essentially a tree of tags. A long document is a list of chapters, and each chapter is a list of sections, and each section is a list of paragraphs and so on. Having the hierarchy available visually in the form of a Table of Contents - or in XML, the file itself - makes it much easier to work with the information than having a flat list with items referring to each other. That could quickly become a mess with many anchors, many references, many labels and overall chaos.

So far, we haven’t seen much hierarchy in Idan. Even with statement blocks, it’s still largely a flat list of statements. If we wanted to describe a book with chapters with sections with paragraphs, we’d have to do it linearly, losing the nice hierarchy we could have in XML for example. In this section we’re meeting a new friend which brings hierarchy to Idan: nested blocks.

A nested block is a statement block which appears right where its subject is referred to. This way a lot of the spaghetti of anchors and references can be avoided: instead of defining the resource separately and making a reference to it, define it in-place, right where you’d otherwise place the reference. Nested blocks support anchors too, which means you can have references to a nested block. This way you enjoy having less clutter, and still can have an arbitrary graph that isn’t necessarily a perfect hierarchical tree structure.

Note that nested blocks are just a syntax form and have no additional semantic value. They are just a way to express a sequence of statements in a more readable way for the human user.

A nested block can be written anywhere a statement object can be written. In fact, it technically comes after the object of a statement (but before the statement identifier, if one is specified), but if the object is <%> it may be omitted. Nested blocks are enclosed in square brackets ([]) and contain a sequence of predicate-object pairs (and extras like statement identifiers, anchors etc.). For example:

<%>
    smaoin:is_a myns:Person
    myns:has_name
    [
        myns:first "John"
        myns:last  "Doe"
    ]
    myns:has_age 34
    myns:has_height 170

In this case the object uid is omitted. It can be specified - either as a uid or as a reference of some kind. Example:

<%>
    myns:has_name <%>
    [
        myns:first "John"
        myns:last  "Doe"
    ]

Specifying <%> is the same as omitting the object.

There are two ways to specify an anchor (e.g. =name) when using a nested block. One way is right after the object, if specified. The other is right after the opening bracket, i.e. the [ of the nested block. If the object is omitted, the second way is the only way. Examples:

<%>
    myns:has_name <%> =name
    [
        myns:first "John"
        myns:last  "Doe"
    ]

<%>
    myns:has_name <%>
    [ =name
        myns:first "John"
        myns:last  "Doe"
    ]

<%>
    myns:has_name
    [ =name
        myns:first "John"
        myns:last  "Doe"
    ]

The nested block content can be written either as an indented block (as above) or as a free-form block. Examples:

<%>
    myns:has_name
    [
        myns:first "John" ;
        myns:last "Doe" .
    ]

<%>
    myns:has_name
    [
        myns:first "John" ; myns:last "Doe" .
    ]

<%>
    myns:has_name
    [ myns:first "John" ; myns:last "Doe" . ]

<%>
    myns:has_name [ myns:first "John" ; myns:last "Doe" . ]

<%> myns:has_name [ myns:first "John" ; myns:last "Doe" . ]
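Since nested blocks carry no extra meaning, any of the forms above can be flattened into an ordinary sequence of statements. The Python sketch below illustrates this under a few assumptions: a nested block is modeled as a Python list of (predicate, object) pairs, its object is always omitted (so a fresh uid is generated for it), and the generated uids look like `<uid-1>`. The order of the flattened statements is also an assumption. The name `flatten` is hypothetical.

```python
import itertools


def flatten(subject, pairs, fresh, out=None):
    """Flatten a block with nested blocks into plain triples."""
    if out is None:
        out = []
    for predicate, obj in pairs:
        if isinstance(obj, list):
            # Nested block with its object omitted: generate the object,
            # then use it as the subject of the nested pairs.
            nested_subject = next(fresh)
            out.append((subject, predicate, nested_subject))
            flatten(nested_subject, obj, fresh, out)
        else:
            out.append((subject, predicate, obj))
    return out


# Hypothetical uid generator standing in for <%>.
fresh = (f"<uid-{n}>" for n in itertools.count(1))
person = next(fresh)
statements = flatten(person, [
    ("smaoin:is_a", "myns:Person"),
    ("myns:has_name", [("myns:first", '"John"'), ("myns:last", '"Doe"')]),
    ("myns:has_age", "34"),
    ("myns:has_height", "170"),
], fresh)
```

The name resource ends up as the object of one statement and the subject of two others, exactly as if it had been defined separately and referred to with an anchor.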

TODO go over the (a, b, c) and a:(b, c, d) syntax etc. - doesn’t it partially overlap with other things, such as separating components with , and ;? Maybe there’s a way to do it without overlap? Overlap may cause unnecessary complexity, reduce readability, add surprise, cause confusion through “many ways to do the same thing”…

Indentation

When mixing indented and free-form blocks, especially when nested blocks are involved, it may become unclear exactly what the indentation rules are. So let’s see exactly how it works.

When using only free-form blocks, the file is “one-dimensional”. It can be written with any indentation, or without any, or even be a single very long line. In order to still have a structure of blocks and nesting, explicit marks are used.

A block begins with a subject. Then comes a sequence of predicate-objects pairs, separated by semicolons, and then a period to mark the end of the block. There may be more than one object in the pair, in which case the objects are separated with commas. Instead of these marks, indentation levels can be used.

With indented blocks, the file is “two-dimensional” - content stretches along the file from top to bottom, while indentation indicates structure as text stretches along the lines themselves from left to right (or right to left). Predicate-objects pairs are now separated by having the same indentation level, but not the same level as the subject and not the same as the objects. Therefore these are wrong:

[[!color foreground=#ff0000 text="""
subject
predicate
object

subject
predicate
    object

subject
    predicate
    object

"""]]

In the same way, all objects under the same predicate have the same indentation level, but it must not be the level of the predicate they belong to:

subject
    predicate
        object
        object
        object
    predicate
        object
        object
        object
    predicate
        object
        object
        object

Nested blocks can’t be indicated with indentation. They are indicated using [ and ], both with indented and free-form blocks.

Framework Basics

Intro

We’re now experts at writing statements and statement blocks! But that’s clearly not enough for writing whole files. One thing that’s still missing is: how do we know which predicates, classes and objects to use? How do we connect our new code with existing code?

Smaoin provides a framework for this purpose (and for other reasons). This framework contains the various building blocks of Smaoin:

Smaoin’s framework isn’t enough: It provides the pieces required for expressing information, but it doesn’t provide tools for humans to write and read it conveniently. For humans to be able to work with Smaoin, a powerful internationalization (i18n) and localization (l10n) mechanism is provided and fully supported by Idan, which makes writing information with Smaoin easy and fun.

These extra mechanisms will be covered later, but we are already using them in our examples (the namespaces are part of this too). In this chapter we will go over what Smaoin provides, the core tools needed for making useful definitions. Those extra mechanisms are what makes the useful definitions also readable and writable by humans.

Classes

Intro

All the sets in Smaoin - resources, values, numbers, properties and so on - are available as classes. When writing Idan files, one of the things we may want to define are classes. We can define them stand-alone, ignoring the other classes in the world, but usually (or even always?) we will prefer to connect them to the global class hierarchy.

In Smaoin, classes can be subclasses of other classes, as an analogy to sets being subsets of other sets. Using the subclass property allows us to define our own hierarchies and connect them to the global one. At the root of the hierarchy sits the smaoin:Entity class. All the other classes are subclasses of it. Semantic databases and processing tools with minimal support for Smaoin semantics should assume that even if you don’t explicitly state that your classes are subclasses of Entity.

Note that a class can have more than one parent (i.e. superclass) and more than one child (i.e. subclass). We call it “hierarchy” but it’s really a general graph, not a tree.

Hierarchy Levels 1 and 2

The top of the hierarchy looks like this:

             .----------------------------------------.
             |                 Entity                 |
             '----------------------------------------'
               ^            ^            ^          ^
               |            |            |          |
          .--------.        |            |     .--------.
          | Value  |  .----------.  .--------. | Object |
          '--------'  | Resource |  |  Set   | '--------'
                      '----------'  '--------'        ^
                         ^  ^            ^            |
                         |  |            |            |
                         |  '-------.----'            |
                         |          |             .----------.
                         '----------|-------------| Property |
                                    |             '----------'
                                   /|\
                        .---------' | '--------.
                        |           |          |
                   .--------.  .--------.  .--------.
                   | Group  |  | Class  |  | Type   |
                   '--------'  '--------'  '--------'

Smaoin provides more classes, which we’ll see soon too.

Let’s go over the diagram and understand all the arrows. At the top, there’s the Entity class. Everything in Smaoin is an entity, therefore all classes contain entities and all classes are directly or indirectly subclasses of Entity. Unless you’re working on Smaoin itself, you probably won’t need to use this class directly.

We have 2 ways to categorize the entities:

  1. Whether they are predefined values or resources we define or describe
  2. Whether they represent collections of things, or they are individual things

For the first criterion, we have classes Value and Resource. For the second criterion we have classes Set and Object. The arrows represent “subclass” relations: every value, every resource, every set and every object is an entity. It is similar to the idea of inheritance in object-oriented programming languages.

Hierarchy Level 3

Next, we have 4 more classes.

Group is the class of all the groups. A group is a collection of items that doesn’t have any meaning externally, and is used just for the purpose of expressing other information using it. For example, while the set “People” has meaning on its own, the set “Alice, Bob and Cindy” doesn’t mean anything - it’s just a list of three objects. Only if we say something else about it, such as “Alice, Bob and Cindy are friends”, does the group have a reason to exist and a meaning. Later we’ll see in which cases groups are useful.

Since groups are both sets (meant to contain things) and resources (no values are groups), Group is a subclass of Set and Resource.

Property is the class of all properties. A property represents a relation between entities, and can be used as the predicate part of a statement. Since properties are resources (no property is a value) and objects (they don’t contain anything), Property is a subclass of Object and Resource.

Type is the class of all data types in Smaoin. For example, Boolean and Character are data types. These types are resources, but their members are values, e.g. characters and boolean values, and values aren’t sets. Therefore Type is a subclass of Set and Resource.

Class is the class of all classes. A class is a certain trait, which a resource either possesses (i.e. is a member of the class) or doesn’t possess (i.e. is not a member of the class; if not specified, membership status is unknown). All classes are resources and are in a way containers of other things, therefore Class is a subclass of Set and Resource.

As you can see, all 4 classes are subclasses of Resource. Then, Property is also a subclass of Object, while the other 3 are subclasses of Set.
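The subclass relations described so far can be written down as a small table and queried, which may help when the arrows get confusing. The following Python sketch only encodes what the text and the diagram above state; the names `SUPERCLASSES`, `ancestors` and `is_subclass_of` are hypothetical and not part of Smaoin or of any tool.

```python
# Direct superclasses, as stated in the text and the diagram above.
SUPERCLASSES = {
    "Entity": set(),
    "Value": {"Entity"},
    "Resource": {"Entity"},
    "Set": {"Entity"},
    "Object": {"Entity"},
    "Group": {"Set", "Resource"},
    "Class": {"Set", "Resource"},
    "Type": {"Set", "Resource"},
    "Property": {"Object", "Resource"},
}


def ancestors(cls):
    """Return all direct and indirect superclasses of `cls`."""
    seen = set()
    todo = list(SUPERCLASSES[cls])
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(SUPERCLASSES[parent])
    return seen


def is_subclass_of(child, parent):
    """True if `parent` is a direct or indirect superclass of `child`."""
    return parent in ancestors(child)


property_ancestors = ancestors("Property")
```

Because a class can have more than one parent, `ancestors` walks a graph rather than a single chain, and every class other than Entity reaches Entity in the end.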

These details may be confusing at first. Don’t worry; you’ll understand them better when you start using them and see examples. Even I get confused by them sometimes! One good way to clarify things to yourself is to draw the diagram above (or a part of it) and add some objects to it which will serve as examples. Smaoin’s diagrams can help too.

Usage

Classes are everywhere. Using them is quite simple:

When Defining Objects

When defining/describing an object (something that isn’t a set), e.g. a painting, we normally want to state what it is. Or in Smaoin speak, to which classes it belongs. Sometimes it’s just one, sometimes more. For a painting, we’ll state it’s a member of the Painting class, which is hopefully already defined (otherwise we’ll create an ontology for paintings, more on that later).

<%> smaoin:is_a pnt:Painting

Maybe there are more relevant classes:

<%> smaoin:is_a pnt:Painting, shp:RectangleShapedThing .

When Defining Classes

An obvious thing to state about a class is that it is a class. Or in Smaoin speak, it is a member of the Class class:

<%> smaoin:is_a smaoin:Class

We also want to connect our class to the hierarchy, by specifying its superclasses. If the program that processes our file supports at least subclass inference, merely stating that our class has a superclass or a subclass is enough to express the fact that it’s a class (since only classes can have superclasses and subclasses; nothing else can). But, since inference …

… it is a good habit to state our class is_a smaoin:Class. Not doing so results in this fact being merely implied and not easily visible in the code; if we make changes, it might no longer be implied, and we might not even notice. Stating it makes it clear and visible.

Now, let’s specify superclasses too. Assume the class we are defining is Boy, and can have superclasses like these:

<%>
    smaoin:is_a           smaoin:Class
    smaoin:is_subclass_of ppl:(Child, Male)

Types

Each value has a data type. Types are schemes of representation of data in a computer’s memory. For example, there is a standard way to encode numbers using the memory’s bits and bytes. There are also ways to encode characters, booleans and so on.

The theory behind Smaoin types is not critical for writing in Idan, and will be covered in a later chapter.

The types available in Smaoin (and therefore in Idan), as we’ve seen already, are Boolean, Character, Number, String and Data. All of them are in fact classes, whose members are values. Therefore each type class is a member of Type (because it is a type) and a subclass of Value and Object.

It is possible to define new types, based on these. For example, we may wish to encode a month day as a number between 1 and 31. 32 wouldn’t be a valid day and it could be nice if the computer detected it and refused to have a 32nd day. Defining types will be explained later.

The main use of type classes in your code will probably be defining the domain and range of properties. These are explained in the next section.

Properties

Intro

Properties are the only things that should be used as predicates in statements. Therefore, you use them all the time: in every single statement you write. But we can also use them as subjects and objects of statements, which allows us to define and describe them.

Smaoin provides properties for defining properties. In particular, some properties are described using themselves. For example, the domain and range properties themselves have a domain and a range. Properties also have their own properties for expressing inheritance. We’ll see below what it means and how it works.

Names

Each property has two names: a field name and a predicate name. In linguistics there are probably official names and definitions of these things, but we’ll just explain them in computer terms.

Field Name

A field name is like the names you give fields (i.e. data members) of classes in object-oriented programming languages, and to table columns in relational databases. For example, we can define a person structure in [[/languages/C]] like this:

[[!format c """
struct Person {
    double height;
    double weight;
    char first_name[20];
    char last_name[20];
    Gender gender;              /* Gender is assumed to be defined elsewhere */
    struct Person *parents[2];
};
"""]]

Height, weight, name and so on are traits of a person. Every person has them. In other words, they are a person’s properties. The following statement in C sets John’s weight to 70 (suppose it means 70 kg):

[[!format cpp """ john.weight = 70.0; """]]

In a similar manner we can make a statement about John’s weight in Idan:

<%> myns:name   "John"
$^  myns:weight 70.0

While this is perhaps more concise and even minimal, it looks less close to a sentence in a human language, compared to what we’ve seen so far. That’s because we haven’t used field names until now. We’ve been using the other kind of name, presented below, and most of the examples in this tutorial use it too.

Field names may be good for rapid prototyping and technical works, but they are probably not very good for making semantic desktop and Rel4tion easy and accessible to a wider public. If we don’t mind a bit more verbosity, we can achieve friendlier-looking code.

Predicate Name

A predicate name is what you use when talking to people or writing in human language. To be precise, it corresponds to several specific forms used in human language. For example, “John is 34 years old” doesn’t correspond to any form in Idan, but “John is a young man” does.

Since there is a variety of such forms in human language, and not only in English, there is no easy intuitive way to guess the predicate name of a given property. In order to maintain a uniform consistent style, which makes things easier and less surprising, there is a set of conventions for choosing the predicate name of a new property. These conventions will be discussed in later chapters.

Using the previous example and our imaginary myns namespace, we may write the same statement using a predicate name, as follows:

<%> myns:name   "John"
$^  myns:weighs 70.0

Do you notice the tiny difference? We changed just one letter! But now, instead of the technical convention of programming languages, we have something that more-or-less resembles a sentence: “John weighs 70 kg”. Here is another example, comparing the two types of names:

-- field names
<%> myns:name            "John"
$^  myns:type            myns:Person
$^  myns:spoken_language myns:English

-- predicate names
<%> myns:has_name "John"
$^  myns:is_a     myns:Person
$^  myns:speaks   myns:English

Both styles are valid and can be mixed freely. Remember the x:y forms are just references: properties simply have two labels (or more), a field-style label and a predicate-style label, and both refer to the same uid.

Note that a property isn’t required to have these two names. It may have more than two, or just one, or none at all. But the usual and recommended usage is to start with two names - one for each style - and add more if needed. Sometimes in a human language there is more than one way to express something - adding more names to use extra forms may be a good idea (but 10 names is probably way too much). Also, perhaps some human languages will tend to have more names than others, or the exact meaning of “field” and “predicate” as linguistic forms will vary between languages. This is fine, since it’s possible to define separate names for different languages independently: some property may have 2 names in English (the common case) but 4 names in French.

Domain and Range

The following statement probably doesn’t make sense …

myns:chair myns:eats myns:door

… because chairs don’t eat doors. But if someone doesn’t exactly understand the meaning of “eats”, or just made a typo, or was confused for a moment, they perhaps could have written a statement that doesn’t make much sense (the statement above is an extreme example of course). If we stated in our code which kinds of things can be to the left of myns:eats (i.e. subjects) and which things can be to its right (i.e. objects), it would make it clear, help avoid confusion, serve as a reminder and perhaps even allow software to automatically detect certain errors and ask us to fix them.

Doing so can also serve as an independent expression of information. For example, if “X eats apples” then X is an animal or a person. It cannot be a chair or a door. And if “John eats Y”, then Y is a type of food. It cannot be a chair or a door (perhaps unless John is a dinosaur). Therefore, just like stating “A is a subclass of B” implies that A and B are classes, stating something using “eats” implies something about the subject and the object.

The domain of a property supplies information about the things that can be the subject when the property is used. The range supplies information about the things that can be the object when the property is used. More precisely, they state the class of these things. For example, if the domain of is_subclass_of is Class, we can deduce that in any statement of the form X is_subclass_of Y, the subject X is a class, i.e. a member of the Class class. It doesn’t mean X can’t be a member of some subclass. For example, maybe Class has a subclass SpecialClass, and X is a member of SpecialClass. But even without this information, a statement that uses “subclass” tells us that X is a class (but it doesn’t say X is a “special” class).

The properties smaoin:has_domain and smaoin:has_range allow us to state the domain and the range of a property.
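To make the deduction concrete, here is a minimal sketch of how software could use a declared domain and range to infer class membership. It is written in Python rather than Idan, and everything in it is an assumption made for illustration: statements are plain string triples instead of real uids, and the classes myns:Animal and myns:Food are made up.

```python
# Hypothetical, simplified model: statements are (subject, predicate, object)
# tuples of plain strings, not real Smaoin uids.

# In a real file these would be stated with smaoin:has_domain / smaoin:has_range.
DOMAIN = {"myns:eats": "myns:Animal"}  # assumed class, for illustration only
RANGE = {"myns:eats": "myns:Food"}     # assumed class, for illustration only


def implied_memberships(statement):
    """Return the class memberships implied by a single statement."""
    subject, predicate, obj = statement
    implied = []
    if predicate in DOMAIN:
        implied.append((subject, "smaoin:is_a", DOMAIN[predicate]))
    if predicate in RANGE:
        implied.append((obj, "smaoin:is_a", RANGE[predicate]))
    return implied


facts = implied_memberships(("myns:john", "myns:eats", "myns:apple"))
for fact in facts:
    print(fact)
```

A checker could go one step further and complain when an implied membership contradicts something already known, which is how the chair-eats-door statement could be caught.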

Hierarchy

In a similar manner to classes, properties have a hierarchy too. To see why it’s needed, let’s use an example:

<%> myns:is_a            myns:ArithmeticOperator
$^  myns:operates_on     myns:Number
$^  myns:has_global_name "+"
$^  myns:has_local_name  "plus"

These statements describe the mathematical addition operation between numbers, i.e. ‘+’. The last 2 statements have something in common: both of them define names that ‘+’ has. If we wanted to make a list of all the names of ‘+’, we’d have to look for two properties, because we want to capture both global and local names. But are these two kinds the only kinds of names that ‘+’ has?

If we asked the computer to give us “all global and local names of ‘+’”, hoping to get them all, we may be surprised: if someone, we or someone else, added a name of a third kind, neither global nor local, our query would be incorrect because it would miss that name. What we really want to tell the computer is “give me all the names of ‘+’ regardless of their kind”. We’d like to somehow state that “global name” and “local name” both mean “name”.

That’s exactly where the property hierarchy helps us. The solution is simple: define a property has_name and make has_global_name and has_local_name be subproperties of it. Then using one of them implies has_name. For example:

-- This statement...
myns:plus myns:has_global_name "+"

--...implies this statement automatically, without us stating it directly.
-- myns:plus myns:has_name "+"

In other words, this:

myns:plus myns:has_global_name "+"

is the same as this:

myns:plus myns:has_global_name "+"
myns:plus myns:has_name        "+"

but the first form is clearly shorter and not repetitive like the second form.

Smaoin provides the property smaoin:is_subproperty_of, which we can use to express “property containment” between properties.

We can define the “subproperty” property as follows:

-- if it is stated that

q is_subproperty_of p

-- and

x q y

-- then it is also true that

x p y

q is a “special case” of p. “global name” is a special case of “name”. Every global name is a name (but not every name is global).
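The rule above is mechanical enough to sketch in a few lines. The following Python snippet is only an illustration, not how any Smaoin software is known to work: statements are plain string triples, and the dictionary SUBPROPERTY_OF stands in for is_subproperty_of statements. It also follows chains of subproperties, which is what the rule gives when applied repeatedly.

```python
def superproperties(prop, subproperty_of):
    """All properties implied by `prop`, following is_subproperty_of links."""
    found = set()
    pending = [prop]
    while pending:
        current = pending.pop()
        for parent in subproperty_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def with_implied(statements, subproperty_of):
    """Return the statements plus everything the property hierarchy implies."""
    result = set(statements)
    for subject, predicate, obj in statements:
        for parent in superproperties(predicate, subproperty_of):
            result.add((subject, parent, obj))
    return result


# Stand-in for "q is_subproperty_of p" statements (illustrative data).
SUBPROPERTY_OF = {
    "myns:has_global_name": ["myns:has_name"],
    "myns:has_local_name": ["myns:has_name"],
}
stated = {
    ("myns:plus", "myns:has_global_name", "+"),
    ("myns:plus", "myns:has_local_name", "plus"),
}

# "Give me all the names of '+' regardless of their kind."
names = sorted(
    obj
    for subject, predicate, obj in with_implied(stated, SUBPROPERTY_OF)
    if subject == "myns:plus" and predicate == "myns:has_name"
)
print(names)
```

A name of a third kind would show up in the same query as soon as its property is declared a subproperty of has_name, with no change to the query itself.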

Note that property containment is less common than class containment. There is no “top hierarchy” here, although property containment is used in the framework. Of course it’s still required to know how to use it.

There is another point to discuss: how property containment is related to the domain and range of properties. Example: Suppose the range of has_name is a class myns:Name. In a statement that uses has_global_name, can the object be something that isn’t a myns:Name? No, because names of any kind must be myns:Name, and it is true for has_global_name as well as any other subproperty of has_name.

In other words: if q is a subproperty of p and p has domain C, then q’s domain is either C or a subclass of C. The same is true for the range.

For example, the range of has_global_name could be a class myns:GlobalName, which would be a subclass of myns:Name. Therefore, technically, if the domain and/or the range of q are the same as those of p, we don’t have to specify them for q because they are implied. Whether we should do it anyway (e.g. for clarity) is a different question, which we will answer below.
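The constraint described above (a subproperty’s domain or range must be the same class as its superproperty’s, or a subclass of it) is something software could check. Below is a small Python sketch of such a check for ranges. The data structures are invented for this example, and myns:has_odd_name and myns:Chair exist only to show a failing case.

```python
def is_same_or_subclass(cls, ancestor, subclass_of):
    """True if `cls` is `ancestor` or reaches it through subclass links."""
    seen = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(subclass_of.get(current, ()))
    return False


# Illustrative data only; a real tool would read these from statements.
SUBCLASS_OF = {"myns:GlobalName": ["myns:Name"]}
RANGE = {
    "myns:has_name": "myns:Name",
    "myns:has_global_name": "myns:GlobalName",
    "myns:has_odd_name": "myns:Chair",  # hypothetical, deliberately wrong
}


def range_is_consistent(subproperty, superproperty):
    """Check that the subproperty's range fits inside the superproperty's."""
    return is_same_or_subclass(
        RANGE[subproperty], RANGE[superproperty], SUBCLASS_OF
    )


print(range_is_consistent("myns:has_global_name", "myns:has_name"))
print(range_is_consistent("myns:has_odd_name", "myns:has_name"))
```

The same function works for domains if it is given a table of domains instead of ranges.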

Usage

Using existing properties is easy: just put them in the predicate part of a statement. Let’s see how to define new ones.

First of all, every property is a member of the smaoin:Property class. So that’s the first thing to do. Then, if relevant, we can state the domain and the range of the property. Most of the properties you’ll define will probably have specific range and domain you’ll want to specify. Then, if relevant, we can state the superproperties of our new property, i.e. which properties it is a subproperty of.

Example:

<%>
	myns:has_name            "has_global_name"
	smaoin:is_a              smaoin:Property
	smaoin:has_domain        smaoin:Resource
	smaoin:has_range         myns:Name
	smaoin:is_subproperty_of myns:has_name

Just in case I didn’t make it clear, myns is just an imaginary namespace used in this tutorial in examples. When we learn about the localization system, we’ll see how to give human-friendly names to things. After all, all the nice names in the statements above must come from somewhere. You won’t see myns in “real” files, but I suppose it can be used like “foo” and “bar” are used in software related examples.

There are two important things to note regarding the use of domain and range.

First, in the example above the domain is Resource. Can a statement even have a subject that isn’t a resource? No, we already saw that subjects (and predicates) must be resources. Then why state that in the code? Is it useful? Does it mean anything?

That statement indeed doesn’t provide the computer any new information. It knows every subject is a resource, and will refuse to accept anything else. But it is useful to us, people, for several reasons:

  1. It makes sure we never forget to define the domain/range. In the future we may choose to change a property’s domain/range, and it may even affect other properties and create collisions and invalid definitions. By having domain and range lines we make them clearly visible, hard to miss, hard to forget to update if needed in the future.
  2. It tells us what the domain and range are. For example, if we look at the definition of a property which has several superproperties, but no domain and range statements, how can we tell what its domain and range are? We’d have to go over the definitions of all the superproperties and figure it out by ourselves. If we’re unlucky, they have superproperties too… if we explicitly write the domain and the range, we save all the trouble.
  3. We know they weren’t forgotten. If they’re missing from the file, maybe the author simply forgot to add them. How can we identify such mistakes? By always stating domain and range. When we don’t see them, it’s a sign we should either add them or contact the author. If there’s some good reason to omit them, add a comment to make it clear they are omitted intentionally.

The second thing to note is that the range, myns:Name, is the same as the range we gave the has_name property. Therefore, stating it doesn’t add new information. But the points above explain why it’s there anyway.

Property Classes

[[!template id=todo text="Talk about inverse properties, functional, transitive, etc."]]

Localization

Intro

It’s time to discover the system that makes Idan (and Smaoin in general) work with friendly [[!wikipedia “natural language”]] labels and names, learn how to write Idan files in non-English languages and translate from one language to another. We’ll also see how to write Idan code that supports and makes room for usage of these features.

What is localization?

Localization means to adapt the system’s presentation and behavior to the preferences and conventions specific to the user’s local circumstances: language, text direction, number writing convention, geographic location, measurement unit system, perhaps also hardware related standards and more.

Every computer language is in fact a form of localization: the computer allows you to express ideas in a human-friendly language, instead of bits and bytes (which are the only thing the computer understands). It adapts itself to what’s natural, familiar and easy to you. As we saw already, Idan does so as well. You can refer to resources using convenient names.

There is another level of localization though, which not all systems supply. While graphical user interfaces (GUIs) are usually translated into the user’s local language, programming languages and the terminal remain mostly English-only. There are technical reasons that can justify that, but they aren’t very relevant for Idan, because writing basic Idan files isn’t considered a complicated task like programming is, and not knowing English shouldn’t be a barrier. There are several other points related to Smaoin’s philosophy, but the bottom line is that Idan files can be written using any human language (that computers support), not just English.

We’ll now see what can be localized in Idan, and how.

Components

As we saw above, resources can be referred to using namespaces and labels. We met at least two namespaces: smaoin, and an imaginary namespace called myns I made up just for this tutorial. smaoin is a real namespace provided by the Smaoin framework, but it’s not the only one.

Smaoin provides 4 namespaces: smaoin, ns, lang and nli. The first one contains descriptions of all the core Smaoin elements, which we met earlier. The other 3 namespaces contain elements related to the localization system. None of these 3 is required for Smaoin to work, but without them it’s impractical for humans to work with Smaoin. Therefore they are provided and Idan has built-in support for them in its syntax (the x:y notation is part of this support).

Before we see what these namespaces contain, what their names mean and learn what a namespace is exactly, let’s go over the localization system itself. Its purpose is to allow people to conveniently write in their preferred human language, and it has 4 components:

  1. Label and reference system
  2. Language and translation mechanism
  3. Information language keyword localization
  4. Resource name and description strings (i.e. built-in documentation)

Together, these components allow you to localize everything you write in Idan. It’s even possible to write in right-to-left languages, since even Idan’s keywords (which we haven’t met yet) can be localized and Idan files are encoded in UTF-8, which means you can insert direction control characters where needed.

Reference System

Concepts

The reference system lets you organize definitions inside namespaces, give them labels, and then refer to them using these namespaces and labels.

A namespace is simply a collection of names, with each name being assigned a resource uid. It’s also possible for a namespace to contain another namespace, thus creating a hierarchy of namespaces. Usually, using sub-namespaces isn’t necessary and just adds complexity, therefore most of the time you’ll see and use the simple case of a namespace that contains names of resources.

A name that is a member of a namespace is called a label. When you define a resource, you can assign it a label and a namespace, and then immediately - even inside the same file - refer to it using this assignment. A reference may begin optionally with a colon (:) and then comes a sequence of zero or more namespace prefixes, separated by colons. After the sequence comes another colon, and then a label. The prefix of a namespace is a short string used to represent it in references.

Examples:

:res
myns:res
:myns:res
myns1:myns2:myns3:myns4:res
:myns1:myns2:myns3:myns4:res

In the last example, myns1 contains myns2, which contains myns3, which contains myns4, which contains the label res.

It’s possible to omit the first namespace in the sequence, in which case the reference begins with a colon. For instance, suppose the last example really starts with a namespace myns0 that was left out, i.e. it stands for myns0:myns1:myns2:myns3:myns4:res. We’ll see below when the first namespace can be omitted.
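If it helps to see the shape of a reference spelled out, here is a tiny parser sketch in Python. It is not part of Idan or of any existing tool. It only handles the colon-separated forms shown in the examples above, and it assumes that a bare word without any colon is not a reference, which this tutorial doesn’t actually state.

```python
from typing import NamedTuple


class Reference(NamedTuple):
    uses_default: bool  # True when the first namespace was omitted
    namespaces: tuple   # namespace names, outermost first
    label: str          # the final label


def parse_reference(text):
    """Split a reference such as ':myns:res' into its parts."""
    uses_default = text.startswith(":")
    parts = text[1:].split(":") if uses_default else text.split(":")
    if any(part == "" for part in parts):
        raise ValueError(f"malformed reference: {text!r}")
    if not uses_default and len(parts) < 2:
        # Assumption: a reference needs a namespace or a leading colon.
        raise ValueError(f"not a reference: {text!r}")
    return Reference(uses_default, tuple(parts[:-1]), parts[-1])


for example in (":res", "myns:res", ":myns1:myns2:myns3:myns4:res"):
    print(example, "->", parse_reference(example))
```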

Access to Namespaces

When an Idan file is submitted to a datastore or passes some sort of reference resolution (part of which may be processing the placeholders we met before), the definitions of the contents of the namespaces used in the file need to be available. Before that can happen, the software needs to know the uids of the namespaces, since the names used in references are just the prefixes of the namespaces.

A namespace is indeed - like most things in Smaoin - a resource like any other. When the processing software finds a reference in the file, it needs to match the prefix to the namespace resource uid to which it belongs. Note that namespace prefixes aren’t necessarily globally unique (they’re very short, so there’s some chance for collisions, just like with names of people). The match can generally come from 3 places: a datastore, the Idan file itself or other information files (written in Idan or some other language).

Where the namespaces you use come from depends on your specific use case. For example, when writing “standard” files which all users should have, it may be a good idea to write the namespace uids in the files themselves, to make sure the file is used correctly everywhere, even in the presence of prefix collisions (uids are unique, so they help prevent collisions). If you’re writing for local use, or prototyping, you can probably save time by relying on a datastore to get the uids.

How exactly the conversion from prefixes to their uids is done depends on the software you use, i.e. it should have command options to specify from where to get the uids. That is out of scope for this tutorial (check the software’s documentation).

One option is relevant to us here: specifying the uids in the Idan file itself. They can be specified at the top of the file, in the header area. That area can contain other things, which we will meet later. The interesting thing for us now about the header is that it can contain things of the following form:

@<name> <value_1> <value_2> ... <value_n>

These are basically options and settings for the file. The <name> part is the name of an option, and it is followed by 0 or more values. The name part can be localized, i.e. translated to any language and specified in your local language, not necessarily English. We’ll see later how; for now we’ll use the English names.

A namespace declaration may look like this:

@namespace "myns" <028d03ba-8d19-4895-a837-4df001ec185e>

namespace is the English name of the option that specifies a namespace uid. The first value for that option is the namespace prefix, inside double quotes, as usual for strings in Idan. The second value is the namespace uid, inside angle brackets, as usual for uids in Idan.

Note that the namespace prefix used on that line doesn’t have to be the one official prefix of the namespace. You can use any name there, and then use it in references throughout the file. This ability allows using namespaces whose “official” prefixes are identical, set long prefixes to shorter ones and perhaps make the names suit your personal preferences.

If you need more than one namespace, simply write several namespace lines at the top of the file. One line per namespace.
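To show what a processing program might do with these lines, here is a rough Python sketch that collects namespace entries into a prefix-to-uid table. It is an illustration only: it assumes the English option name, one entry per line, and it treats a repeated prefix as an error, which is a guess rather than documented Idan behavior.

```python
import re

# Matches lines like: @namespace "myns" <028d03ba-8d19-4895-a837-4df001ec185e>
NAMESPACE_LINE = re.compile(r'^@namespace\s+"([^"]+)"\s+<([0-9a-fA-F-]+)>$')


def read_namespace_header(lines):
    """Collect prefix -> uid pairs from the namespace lines of a header."""
    prefixes = {}
    for line in lines:
        match = NAMESPACE_LINE.match(line.strip())
        if match is None:
            continue  # some other header entry, or not a header line at all
        prefix, uid = match.groups()
        if prefix in prefixes:
            # Assumption: declaring the same prefix twice is a mistake.
            raise ValueError(f"prefix {prefix!r} declared twice")
        prefixes[prefix] = uid
    return prefixes


header = [
    '@namespace "smaoin" <b3742023-97ef-4fb0-9dd2-4582d946d6f1>',
    '@namespace "myns"   <028d03ba-8d19-4895-a837-4df001ec185e>',
]
table = read_namespace_header(header)
print(table["myns"])
```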

Namespace References

When specifying namespaces in the header, a connection is made between a uid and a short friendly label, much like anchors work. Using the colon notation x:y we can then refer to members of a namespace, but the namespace is itself a resource and has a uid - how do we refer to the namespace itself? For example, we may wish to state that some resource belongs to a certain namespace, and we need to refer to the namespace uid as the object part of our statement.

The notation is similar to anchor references - there’s a prefix character and then the prefix we assigned in the header entry. But the prefix character isn’t & like for anchors, but the at sign, i.e. @. So a reference to a namespace with declared prefix “myns” would be @myns.

For example, suppose we declare a namespace myns using a header entry:

@namespace "myns" <028d03ba-8d19-4895-a837-4df001ec185e>

Later in the file we can refer to myns itself:

<%> smaoin:is_a              smaoin:Class
    nli:belongs_to_namespace @myns

Defining Namespaces

A namespace is a resource like any other, and can be defined in Idan like any other resource. But when defining a namespace and using it in the same file, we’ll use special syntax for its uid: exactly what we just saw above. This is what the examples below will do, but in general you can as well use a uid or a placeholder there.

A namespace can therefore be a member of another namespace, like any resource can. That is what allows the ns1:ns2:ns3 syntax, and is called namespace nesting. In addition, it is a convention for all top-level namespaces (i.e. ones which aren’t members of others) to be members of the ns namespace, which is provided by Smaoin. It allows referring to them easily in certain cases, while maintaining just one namespace prefix-to-uid match: that of the ns namespace itself.

Earlier we met 4 namespaces provided by Smaoin, 3 of which contain localization related entities. One of those is the nli namespace - “nli” is its common prefix. The namespace’s name is NLI, which stands for Natural Language Interface. NLI is indeed the main namespace of the localization system, while the other two - lang and ns - are helpers with specific roles.

It’s time to meet new friends which happen to live in NLI. First, the property nli:belongs_to_namespace. When we define a namespace, we’ll use this property to make that namespace belong to ns. We’ll also use it for resources in general - more on that later. Second, the nli:Namespace class. This is the class of all namespaces. The property smaoin:is_a, which we met before, is used for declaring that a resource is a member of a class.

Assume our Idan file defines resources under the myns namespace, and also defines the myns namespace itself. Then we’ll start with the header:

@namespace "smaoin" <b3742023-97ef-4fb0-9dd2-4582d946d6f1>
@namespace "ns"     <0074b583-b1fb-449c-aedf-ecd97c01eb82>
@namespace "nli"    <5dba2ce2-bab6-49dd-8547-d6dc7b344a91>

@namespace "myns"   <028d03ba-8d19-4895-a837-4df001ec185e>

Note that while myns is a new namespace we define, and we can freely choose a new uid for it, smaoin and nli and ns are standard namespaces and they already have uids chosen for them, which everyone should use. Exactly the same uids. There are several ways to get standard uids - they will be covered in later chapters. For now, assume we got the uid of smaoin, nli and ns, and pasted them in our file.

Next, we can start by defining the namespace. We can refer to its uid using the syntax we saw above, and make some initial statements:

@myns
	nli:belongs_to_namespace @ns
	smaoin:is_a nli:Namespace

This is short and simple. When we submit the file to a datastore, it will have these statements and will “know” about this new namespace. But something is missing - assume we’d like to get the uids of the namespaces we use from a datastore, instead of using those header lines. For that to work, the datastore needs to also know about the prefix of the namespace. We did choose a prefix myns in the header, but that’s just a setting for us, and a datastore doesn’t generate a statement from it. Its only purpose is to allow us to use myns in this file we’re editing.

We’ll get to the full detail of setting the prefix later, but for now we’ll mention two important things.

First, a namespace has both a prefix and a label, and they technically don’t have to be identical - although usually they are. When using the reference myns:Person, myns is the prefix of the “myns” namespace. That prefix is matched with a uid, and then the uid of Person can be found (later we’ll see exactly how it works). Person is the label of a resource which is a member of myns. But in the reference myns1:myns2:Person, while myns1 is a namespace prefix, myns2 is actually the label of the “myns2” namespace. That’s because myns:Person and myns1:myns2 are resolved in the same way - it’s a pair of a prefix and a label separated by a colon.

Second, NLI provides properties and classes that allow us to set a label for a resource, and a prefix for a namespace. A namespace needs both a prefix and a label. Since prefixes are specific to namespaces, we’ll examine them now briefly, and leave labels for later. We’ll go back to prefixes later and finish dealing with them, when we learn about the language and localization mechanism.

So, the short version of this is as follows. NLI provides a class nli:Text that is basically a localized version of the String type. You define a resource that is a member of this class, and set two things for it: The text, which is simply a String value, and the human language in which it is written. Then you can use the property nli:is_local_prefix_of to state that your new Text resource is a prefix for the namespace you just defined, in the specific language you chose. Global prefixes (i.e. for use in all languages) are possible too. For example, let’s assign the prefix “myns” as the English prefix of the myns namespace we are defining:

<%>
	nli:has_content "myns"
	nli:has_language lang:en
	smaoin:is_a nli:Text
	nli:is_local_prefix_of #^

This can simply be added before the previous statement block. Note that we also use the lang namespace here, and we should add a @namespace item for it to the top of the file.

Default Prefix

It’s possible to have references with the first namespace prefix omitted, e.g. :x and :x:y:z. This is possible when we choose a default prefix using a header entry, and this prefix is assumed when omitted. For example, if we choose myns as the default prefix, then :mylabel is the same as myns:mylabel.

The default prefix header entry’s English name is “default”. If the prefix is declared using a namespace entry, then the default prefix entry should appear somewhere after the namespace entry, not before it. The syntax, assuming English localization, is:

@default <prefix>

For example, the header may contain:

@namespace "myns" <028d03ba-8d19-4895-a837-4df001ec185e>
@default "myns"

And now we can have statements like this:

<%> :has_name "John Doe"

where we omit the myns: part, which becomes implicit. The processing software simply prepends the default prefix. Note that if an Idan file has no default prefix, omitting the namespace prefix from references is an error.
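Prepending the default prefix is about the simplest transformation a processing program does. A minimal Python sketch, with a made-up function name and operating on plain strings, might look like this:

```python
def expand_default(reference, default_prefix):
    """Turn ':label' into 'prefix:label'; leave other references alone."""
    if not reference.startswith(":"):
        return reference
    if default_prefix is None:
        # The file declared no default prefix, so this reference is an error.
        raise ValueError(f"no default prefix for {reference!r}")
    return default_prefix + reference


print(expand_default(":has_name", "myns"))
print(expand_default(":x:y:z", "myns"))
print(expand_default("smaoin:is_a", "myns"))
```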

Reference Language

Prefixes and labels can be marked as belonging to a specific human language. When no language is specified for a reference, which was the case so far, the default one is used - either the one specified in the file’s header (more on that later) or some default of the processing software, e.g. the operating system user’s language.

Sometimes, especially when translating an Idan file to a new human language, it is useful to specify explicitly a language that isn’t the default. This can be done by appending an at sign (@) and then a language code to the reference. Languages and their codes will be explained later in detail. For now we can assume the language code for English is “en” and for French it’s “fr”, and then we can have a file in English with some French references.

Normally the language code refers only to the (last) label in the reference. To make it apply to the whole reference (or even to several references separated by commas), enclose the reference(s) in parentheses and put the language code after the closing parenthesis. When specifying several literals inside parentheses, e.g. when having multiple objects for the same statement line, it’s possible to specify a language for all references between the parentheses.

TODO explain better the above, give reminder or link to the (a, b, c) form

<%>
    myns:name "John Doe"
    myns:age 34
    myns:taille@fr 170                                 -- height
    (mesnoms:favori_couleur)@fr                        -- favorite color
        (mesnoms:rouge, mesnoms:vert, mesnoms:bleu)@fr -- red, green, blue

[[!template id=todo text=“consider reducing indent convention from 4 to 2”]]

Defining Labels

Like namespaces, a label is a resource and can be defined in Idan like any other resource. Unlike with namespaces, no special syntax is required. We just define a label resource and link it to the resource it labels, as we will now see.

These labels have just this single purpose: to be used in references. A label is not the name of a resource, and doesn’t have any implied meaning: it’s just a technical tool for use with references. There are also (unrelated) ways to name and describe resources in human languages, which we’ll see later.

It is technically possible in an Idan file to define a label and then attach it to more than one resource. But this is a bad idea: Each resource should have its own label resource(s) and within each namespace, the labels should be unique. If myns:john doesn’t resolve unambiguously into a single uid (e.g. both John Smith and John Carpenter have this label), then it’s not very useful.

Let’s meet another friend from NLI: the property nli:is_local_label_of. In a similar way to how we used nli:Text resources as namespace prefixes, we can use them as resource labels. This single property isn’t the whole picture, but we can demonstrate the concept using it, and later see the small details.

Assume we have some statements describing a Person class:

<%>
	smaoin:is_a smaoin:Class
	smaoin:is_subclass_of myns:Being
	nli:belongs_to_namespace @myns

We’d like to allow English speaking users to refer to this class as myns:Person. We can do this by defining a label like this:

<%>
	nli:has_content "Person"
	nli:has_language lang:en
	smaoin:is_a nli:Text
	nli:is_local_label_of #^

How Resolution Works

To make things clear and unambiguous, let’s see how exactly processing software is supposed to resolve a reference into a resource uid. Once you understand how namespaces and labels work - which hopefully isn’t complicated - the resolution algorithm is simply a technical version of this understanding.

Assume the localization language of our Idan file is L.

Suppose we have a namespace uid z and a label with text “y”. We’d like to find a resource uid r which:

  1. Has a label in language L whose content is “y”.
  2. Belongs to namespace z.

We can go over the statements in our file, and possibly in the local datastore, and see if we can find such an r. If we found more than one, it’s an error. If we found none, resolution failed. If we found just one r, then this r is the reference resolution for z and “y”.

Now suppose we have a reference x:y (or :y with x being the default prefix). We should go over the namespaces declared in the header, and then possibly over the local datastore content, and see if we can find a namespace resource z which has a prefix x in language L. If we found more than one, error. If we found none, resolution failed. If we found exactly one z, use z and “y” as described in the previous paragraph to find r.

Now suppose we have a reference x0:x1: ... :xn (x0 can be omitted if it’s the default prefix). If we can reduce it to a smaller problem, we can apply the reduction recursively until we get the base case described above. Let’s do that. We begin by finding the namespace uid z for prefix x0. Next, we find r using z and label “x1”. Now we need to resolve the reference with namespace r and label x2: ... :xn. We again find a resource uid s which has label “x2” and is a member of r, and now we have a smaller problem: namespace uid s and label x3: ... :xn. We proceed until the label is just xn, which is the base case we solved already.
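The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not a reference implementation: prefixes come only from a header-style table (no datastore lookup), labels and namespace membership are given as plain tuples, the uids are short made-up strings, and the recursion is written as a loop that walks one namespace level at a time.

```python
class ResolutionError(Exception):
    """Raised when a reference cannot be resolved to exactly one uid."""


def find_member(namespace_uid, label, language, labels, members):
    """Base case: find the single resource in a namespace carrying a label.

    labels  -- iterable of (resource_uid, language, text) triples
    members -- set of (resource_uid, namespace_uid) pairs
    """
    matches = {
        uid
        for uid, lang, text in labels
        if lang == language and text == label and (uid, namespace_uid) in members
    }
    if not matches:
        raise ResolutionError(f"no resource labelled {label!r} in {namespace_uid}")
    if len(matches) > 1:
        raise ResolutionError(f"label {label!r} is ambiguous in {namespace_uid}")
    return matches.pop()


def resolve(reference, language, prefixes, labels, members, default_prefix=None):
    """Resolve x0:x1:...:xn by walking one namespace level at a time."""
    parts = reference.split(":")
    if parts[0] == "":  # leading colon: the first prefix was omitted
        if default_prefix is None:
            raise ResolutionError("no default prefix declared")
        parts[0] = default_prefix
    if len(parts) < 2:
        raise ResolutionError(f"not a reference: {reference!r}")
    prefix, *path = parts
    if prefix not in prefixes:
        raise ResolutionError(f"unknown namespace prefix {prefix!r}")
    current = prefixes[prefix]  # uid of the outermost namespace
    for label in path:  # each step resolves a label inside `current`
        current = find_member(current, label, language, labels, members)
    return current


# Made-up sample data; real uids look like the ones in the header examples.
PREFIXES = {"myns": "uid-myns"}
LABELS = [
    ("uid-inner", "en", "inner"),
    ("uid-person", "en", "Person"),
    ("uid-thing", "en", "Thing"),
]
MEMBERS = {
    ("uid-inner", "uid-myns"),
    ("uid-person", "uid-myns"),
    ("uid-thing", "uid-inner"),
}

print(resolve("myns:Person", "en", PREFIXES, LABELS, MEMBERS))
print(resolve(":inner:Thing", "en", PREFIXES, LABELS, MEMBERS, default_prefix="myns"))
```

Both failure cases from the text (no match, and more than one match) raise the same exception type here with different messages. A real tool may well want to distinguish them more carefully.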

[[!template id=todo text=“Rewrite more clearly and in algorithm form”]]

Translation System

Intro

The translation system extends the String concept into the Text concept, which is a String with a Language attached to it. It allows having several text items which say the same thing, in different human languages. These Text items can be used in any context, and they’re used whenever possible in Smaoin and Idan:

We haven’t covered all of these yet. What we’re going to do is go over the general concept, and then see how it works with prefixes and labels, which will complete the picture of the reference system. In the next sections, when we visit the remaining localization system components, we’ll see how the translation mechanism is applied there too.

NLI provides the classes nli:Text and nli:Language. We can define a text resource and assign it a content String with nli:has_content, and assign a language using nli:has_language. Then we can use another statement to attach the text to some other resource: as its name or label or description and so on.

Some of the properties used for names, labels, etc. also have global variants. A global variant attaches a String to a resource, rather than Text, and can be used with any localization language. For example, the resource representing the arithmetic addition operator may have a name “plus” local to English, but also a global name “+” which is relevant to all languages.
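The difference between local and global variants can be pictured with a small Python sketch. The data layout and the function are invented for this example, and the Spanish entry (with an assumed language code “es”) is there only to have a second language to compare with.

```python
def usable_names(resource, language, local_names, global_names):
    """Names of `resource` that apply when working in `language`.

    local_names  -- (resource_uid, language, text) triples, i.e. Text items
    global_names -- (resource_uid, text) pairs, i.e. plain Strings
    """
    local = [
        text
        for uid, lang, text in local_names
        if uid == resource and lang == language
    ]
    shared = [text for uid, text in global_names if uid == resource]
    return local + shared


# Illustrative data; "uid-plus" is a made-up uid, "es" an assumed code.
LOCAL_NAMES = [
    ("uid-plus", "en", "plus"),
    ("uid-plus", "es", "más"),
]
GLOBAL_NAMES = [("uid-plus", "+")]

print(usable_names("uid-plus", "en", LOCAL_NAMES, GLOBAL_NAMES))
print(usable_names("uid-plus", "fr", LOCAL_NAMES, GLOBAL_NAMES))
```

With no French name defined, a French user still gets the global name, since it is relevant to all languages.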

Note that some of the properties we’ll see now also have inverse properties - you can go to Smaoin’s Idan files and see them there, or perhaps to some generated reference documentation.

Prefixes and Labels

For labels, there’s the superproperty nli:has_label, which you generally won’t use directly. The specific label properties are subproperties of it. For classes and other things that aren’t properties, the common property is probably nli:has_local_label (or its inverse we saw earlier, nli:is_local_label_of). It assigns an nli:Text label to a resource. There is also nli:has_global_label, which assigns a language-neutral String. This may be useful for international standards such as physical units of measurement, where the same symbol in English letters is used widely.

For properties, nli:has_label has two subproperties, which you generally won’t use directly: nli:has_field_label and nli:has_predicate_label. We met the concepts of “field names” and “predicate names” earlier. The properties you’ll want to use directly (again, they also have inverses) are nli:has_local_field_label, nli:has_global_field_label, nli:has_local_predicate_label and nli:has_global_predicate_label.

For prefixes there’s the superproperty nli:has_prefix and subproperties nli:has_local_prefix (we used its inverse in previous sections) and nli:has_global_prefix.

Languages

Here’s an example that defines a local label of some resource:

<%>
    nli:has_content "an_amazing_resource"
    nli:has_language lang:en
    smaoin:is_a nli:Text
    nli:is_local_field_label_of #++

The thing here we haven’t talked about is the language, lang:en. Where does it come from? One of the framework namespaces we briefly mentioned, prefixed lang in English, contains resources of type (i.e. members of class) nli:Language. Each language has a language code assigned to it, and a global label generally identical to the code. It’s possible to have local labels too, of course. The global label for English is en, which means that lang:en refers to the resource representing the English language.

There are several dialects of English: British, American, Australian and so on. The same is true for other languages. NLI therefore provides a property nli:has_dialect which allows declaring that one language is a dialect of another. More precisely, it declares that words in the “base” language are also words in the “dialect” language.

This feature isn’t just for elegance: it actually allows reusing translations. For example, any localized string in English can be used in British English, in American English and so on, unless they provide their own strings, which override the base English string. In other words, if processing software looks for a name or a description or a label or a prefix etc. of some resource in Australian English but can’t find one, it will look for one in “base” English. This way all variants of English can share all the common strings, perhaps have “default” base translations for the ones that aren’t necessarily shared by all, and specific variants can provide their own strings for cases in which the variants differ.
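The fallback from a dialect to its base language can be sketched like this in Python. It’s a simplified illustration under a few assumptions: texts maps (resource uid, language code) to a string, base_of maps a dialect’s code to the code of its base language (the information nli:has_dialect provides), and each dialect has at most one base. The function name and codes like en-AU are made up for the example.

```python
def find_text(resource_uid, language, texts, base_of):
    """Find a string for a resource, falling back from a dialect to its base."""
    seen = set()  # guards against accidental cycles in the dialect declarations
    while language is not None and language not in seen:
        seen.add(language)
        text = texts.get((resource_uid, language))
        if text is not None:
            # The dialect's own string overrides anything in the base language.
            return text
        # Nothing found in this language, so try its base language (if any).
        language = base_of.get(language)
    return None
```

With texts holding “colour” for en-GB and “color” for en, a lookup in en-GB finds its own string, while a lookup in en-AU finds nothing of its own and falls back to the base English “color”.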

It’s technically possible, due to Smaoin’s flexibility, to define new languages independently. But for easy collaboration, the convention is for new languages to be brought to Smaoin’s Idan files, where the lang namespace is officially managed. This also means the lang namespace’s official files serve as a list of languages used by Idan file authors worldwide.

Keyword Translation

[[!template id=todo text=“plan this”, maybe have an idan namespace]]

Names and Descriptions

The strings which identify resources in Smaoin are meaningless in the semantic system. They are meaningless by design, and this is emphasized by using randomly generated strings for uids. While they fulfill their role in the system well, they’re not very useful as-is to us, humans. We need a way to understand in human language terms what the uids refer to.

For this purpose, resources can have names and descriptions. These can be written in any language using nli:Text in the same way seen above. To get an idea of what they look like, read below about conventions and/or read some Idan files.

The properties (which also have inverses) are nli:has_name and nli:has_description. There are no local/global pairs because names and descriptions are supposed to be words and sentences in specific human languages, i.e. there is no “global” variant.

Putting the Parts Together

TODO what should be here?

More on Value Literals

Characters

TODO explain the ''' and backtick forms

More Features, Tasks and Delicate Points

TODO move these to other chapters and remove this chapter?

Specifying Value Types

Translating a File to Another Language

Writing Whole Files

Intro

File Types and Names

Headers

Intro

An Idan file may contain, optionally, a header section at the beginning. The header may contain various options and settings which affect various aspects of the file: content, parsing, shortcuts and more. With time, as Idan evolves, new options and settings may become available.

Currently the header content is simply a sequence of entries. Each entry has a type and zero or more values. Each entry is written on its own line. The format of an entry line is:

@<type> <value> <value> <value> ... <value>

The type part is basically an alphanumeric name of the entry, and it can be localized to any human language. The values generally have the same syntax as Smaoin values used in statements, but other forms may be possible too if it becomes necessary in the future.

Language Chooser

There is one special type of entry, called the language chooser.

The language chooser must appear in any file that uses local labels or local names of header entries, because each resource may have translations to various languages, and the processing software needs to know by which translation to match the label to a resource uid. The same applies to header type names, which may be localized to human languages.

In practice, it means that every human-readable Idan file needs the chooser. Only files that are purely uids and values don’t, but those aren’t meant to be read by humans anyway, thus it’s no surprise that localization isn’t relevant.

Since the chooser affects the parsing of the “regular” header entries, it must come first. If there is a non-empty header, it must begin with the language chooser. The chooser is a single line, like other entries, and the syntax is as follows:

@@ <language-code>

There may optionally be any whitespace between the @@ and the language code. That code is the same code given to nli:Language resources through the nli:has_code property. It is specified between double quotes. For example:

@@ "en"

is for English.
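As a rough illustration, a chooser line could be recognized with a regular expression like the one below. The names CHOOSER and parse_chooser are invented for this sketch, and allowing trailing whitespace after the closing quote is my own assumption.

```python
import re

# @@, optional whitespace, then the language code between double quotes.
CHOOSER = re.compile(r'^@@\s*"([^"]+)"\s*$')


def parse_chooser(line):
    """Return the language code of a chooser line, or None if it isn't one."""
    match = CHOOSER.match(line)
    return match.group(1) if match else None
```

A bare @@ with no code gives None, which is useful because that form has a different meaning, as we’ll see next.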

Structure

In addition to the entry lines, the header has two more parts. One of them is the language chooser, and the other is the delimiter. The delimiter marks the end of the header, after which the rest of the file - i.e. statements and statement blocks - may appear.

If the delimiter appears and is the first thing in the file (excluding comments and blank/whitespace lines), then the file is considered to have an empty header. Having an empty header is the same as having no header at all, but it still may make sense to write the delimiter to avoid forgetting it later when entries are placed in the header.

The delimiter is a single line that looks like this:

@@

We now know enough to write a complete header. The minimal header is:

@@

A header with content begins with a language chooser, and may then have entries. Having a header which contains just the chooser is fine:

@@ "en"
@@

Now some entries (I’m making some random ones for the example):

@@ "en"
@namespace "myns1" <96944e90-b52e-4434-b6f7-eba1d31af5e0>
@namespace "myns2" <8f5cd9e8-eb9d-41d8-92a5-fbdcff22b13b>
@namespace "myns3" <4281acb5-5278-4dbd-89a9-9b9635824c3f>
@@

Recognizing the Header

Knowing what we just learned, we may want to ask: How can software distinguish between a namespace reference and a header entry? Both of them begin with a @ followed by a name, and both can be at the beginning of a line.

Let’s instead answer a bigger question: How does software recognize the header part of an Idan file, and correctly interpret things as what we meant them to be? The following description isn’t a full accurate algorithm, but it suggests how the header can be recognized without ambiguity.

An Idan file may begin - ignoring comments and whitespace for a moment - with the header or with a statement (or a statement block). The way to tell is by examining whether the content starts with @@ or not. If it does, then it is either the language chooser (i.e. beginning of header) or the delimiter (i.e. end of empty header). The content of that line determines which one it is. Then, every @ line before the delimiter is an entry, and every similarly looking @ after the delimiter is part of a statement. No ambiguity.

If the content doesn’t start with @@, then there is no header at all, and a @ form is a namespace reference, not a header entry.
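Here is how that recognition could look in Python. Like the description above, it isn’t a full accurate algorithm: it skips blank lines but knows nothing about comments, and the function name and the shape of its result are just choices made for this sketch.

```python
import re

CHOOSER = re.compile(r'^@@\s*"([^"]+)"\s*$')


def split_header(text):
    """Split a file into (language code, header entry lines, body lines)."""
    lines = text.splitlines()
    start = 0
    # Skip leading blank lines (comments are not handled in this sketch).
    while start < len(lines) and not lines[start].strip():
        start += 1
    first = lines[start].strip() if start < len(lines) else ""
    if not first.startswith("@@"):
        # No header at all: any @ form in the file is a namespace reference.
        return None, [], lines[start:]
    if first == "@@":
        # The delimiter comes first, so the header is empty.
        return None, [], lines[start + 1:]
    match = CHOOSER.match(first)
    if match is None:
        raise ValueError("expected a language chooser or the delimiter")
    entries = []
    for index in range(start + 1, len(lines)):
        line = lines[index].strip()
        if line == "@@":
            # The delimiter closes the header, and the rest is statements.
            return match.group(1), entries, lines[index + 1:]
        if line.startswith("@"):
            entries.append(line)
        elif line:
            raise ValueError("unexpected content inside the header")
    raise ValueError("the header is missing its closing @@ delimiter")
```

Every @ line collected into entries is a header entry, while any @ form left in the body lines is part of a statement.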

Note that you’ll likely never write a file without a header containing some namespace entries, so in practice even software that doesn’t handle recognition perfectly (although it’s trivial to handle) won’t get confused. There is a @@ that starts the header and a @@ that closes it.

Entries

Namespace Entry

The namespace entry introduces a namespace prefix for use in the file, and specifies the resource uid of the namespace. Any prefix is valid, not just the formal prefix chosen for that namespace, but in most cases it’s recommended to use the namespace’s formal prefix.

A namespace entry has the following form:

@<namespace> <prefix> <uid>

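where <namespace> is the type name of the entry in the chosen language (namespace in English), <prefix> is the prefix to introduce, written as a String between double quotes, and <uid> is the resource uid of the namespace.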

For example, these are the Smaoin framework namespaces, assuming the English localization for the type name:

@namespace "smaoin" <b3742023-97ef-4fb0-9dd2-4582d946d6f1>
@namespace "ns"     <0074b583-b1fb-449c-aedf-ecd97c01eb82>
@namespace "lang"   <6414df14-4073-4968-9470-900fdd21b580>
@namespace "nli"    <5dba2ce2-bab6-49dd-8547-d6dc7b344a91>
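Software reading such entries would typically end up with a table from prefix to namespace uid. Here is a minimal Python sketch of that step. The pattern covers only the exact form shown above, and the names ENTRY and parse_namespace_entries are made up for the example. The type_name parameter stands for the localized type name, which is namespace in English.

```python
import re

# @<type> "<prefix>" <uid>, where the uid is written inside angle brackets.
ENTRY = re.compile(r'^@(\S+)\s+"([^"]+)"\s+<([^<>\s]+)>\s*$')


def parse_namespace_entries(entry_lines, type_name="namespace"):
    """Build a prefix-to-uid table from the namespace entries of a header."""
    table = {}
    for line in entry_lines:
        match = ENTRY.match(line.strip())
        # Lines of other entry types, or of another form, are skipped.
        if match and match.group(1) == type_name:
            table[match.group(2)] = match.group(3)
    return table
```

Feeding it the four lines above gives a table with four prefixes, in which "nli" maps to 5dba2ce2-bab6-49dd-8547-d6dc7b344a91.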

Ontologies

Copying

[[!template id=todo text=“Update this to CC0!” more=""" I decided to move to CC0 because the law is a tool of power and violence. Using licenses, even libre ones, continues to create the problem of property and patents and copyright and all that stuff. And since law is violence, I’m absolutely not going to sue anyone. The only problem is that CC0 may - like BSD and MIT - look like “I’m neutral to the kind of usage, do as you like including proprietary use, I don’t care and have no ethics about this”. The solution will be to state clearly what the idea really is, with inspiration from http://copyheart.org. """]]

Rel4tion is part of the free software, free culture, free information and free knowledge movements. All the “free”s here refer to freedom, not to price. The license used by Smaoin’s files and the various ontologies made here is Creative Commons Attribution Share-Alike 4.0 (in short: cc by-sa 4.0).

In order to continue to promote, protect and enhance these freedoms, and in order to allow for unlimited copying and modification between the various ontologies made by people - inside and outside Rel4tion - it is suggested that all the ontologies use this license. When the freedoms mentioned above are more common, perhaps in the future Rel4tion will move to public domain instead, or more precisely CC0. But that will be a coordinated move, and for now it’s the cc by-sa 4.0 license.

If you have any objection to using that license, please share your concerns. Note that a copyleft license is used intentionally. A discussion about using non-copyleft licenses should be done in the context of a project-wide or a community-wide move from the current cc by-sa.

Licenses which don’t designate the work as a Free Culture Work, such as the Non-Commercial variants of the Creative Commons licenses, aren’t an option at all. Everything in the community of sharing and cooperation should be free culture work, no exceptions. If one person can get paid for some file, while another cannot, that is not a community - it’s selfish competition and ugly separation.

In particular, Rel4tion’s ontology repository stores and supports the development of free culture works only. In other words, proprietary work is considered “uploaded by mistake” and should be either relicensed (the preferred solution, achieved by nicely asking and discussing the issue) or removed (if the first option is causing arguments and may take time to settle, or the author isn’t available for discussion).

Prolog

Sections

Objects

Copying

Unlike ontologies, there are significant chunks of information that are private and are never released to the public. For example, personal information you may store on your desktop. Perhaps your full name, address, web browser history and bookmarks, family tree, instant messaging contacts, personal to-do list, e-mail inbox content and so on.

The private information isn’t released, and therefore licensing issues aren’t relevant. Rel4tion tries to make it easy and safe to manage both private and public information on the same desktop. Hopefully it succeeds. There may also be information you share only with friends, but not with the rest of the world. That is considered private too.

For public information which is shared on the Internet with the world, for example information about songs and artists and music styles, the convention is exactly the same as for ontologies. See above the section about conventions for ontologies.

Structure

Conventions

Intro

We’ll see in this chapter which conventions are generally used when writing Idan files. Some of them are arbitrary and may be a matter of style - but are still provided as a warm suggestion - while some are quite important for consistency and you should probably get used to them.

Some of these conventions were already used in the previous chapters of this tutorial, so they won’t surprise you.

Naming Files

There are two common kinds of files: Ontologies (which are basically vocabularies providing classes and properties) and data (which is basically objects described using ontologies). Having this distinction is just a convention for modularity and readability, i.e. it’s the same Idan with the same syntax everywhere.

Idan files can have any name, but the recommended file extension for them is idan.

The name part, before the extension, works as follows. When defining an ontology, it should usually be the name of the ontology or the main namespace it uses. For standard and established ontologies, the namespace prefix or the ontology name abbreviation (often these are the same thing) in lowercase is fine too. For example, nli.idan.

File Structure

Number of Columns

In Idan there isn’t much reason to need long lines. Limiting to 80 columns per line makes the files friendly to everyone, regardless of whether they use a terminal emulator with split screens or a fancy GUI editor.

Therefore, limit to 80 columns. Every serious text editor suitable for coding can mark the 80th column, and even automatically align text and adapt the display as needed. Use these features! For example, I use Vim for nearly all my text editing tasks, and using the column limit is not only easy, but also allows me to split the screen and see many open files at the same time.

Don’t worry about writing long Strings. If you plan to incorporate a whole plain text document into your Idan code, you should probably put it in a separate file instead, and refer to it from the Idan file. It’s much much better for modularity and readability.

Indentation

When indenting code for Indented Blocks or for other reasons, the preferred style is 4 spaces for each indentation level. This is suggested for any file that is to be shared with others, especially for ontologies. Several text editors support smart modes, in which indentation tabs are treated like tabs when navigating the text, but actually spaces are inserted into the file. Using these modes makes things nicer (Vim does that, I think Gedit does too with a plugin, and I imagine Emacs does as well).

Some Idan files on this website were written before this convention, and they may use other indentation styles, such as tabs. They should be fixed to use 4 spaces for indentation.
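If you’d like to find such files automatically, a tiny checker for the column and indentation conventions could look like the Python sketch below. It’s a hypothetical helper, not an existing Rel4tion tool. It counts characters rather than display columns, and it assumes leading whitespace should always be a multiple of 4 spaces, which may be stricter than you want.

```python
def check_style(text, max_columns=80, indent=4):
    """Return (line number, problem) pairs for lines that break the conventions."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_columns:
            problems.append((number, "line longer than %d columns" % max_columns))
        # The leading whitespace is whatever lstrip() would remove.
        leading = line[:len(line) - len(line.lstrip())]
        if "\t" in leading:
            problems.append((number, "tab used for indentation"))
        elif len(leading) % indent != 0:
            problems.append(
                (number, "indentation is not a multiple of %d spaces" % indent))
    return problems
```

An empty result means the file follows both conventions as far as this simple check can tell.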

Writing Localized Blocks

Naming Resources

Generating UIDs

Putting It All Together

Intro

Community

Website and Wiki

Community Server

IRC, Mailing Lists and Friends

For now, see the [[/contact]] page. In the future there will hopefully be more :-)

Getting Involved

Resources and Tools

More Material

Ontology Repository

Text Editor Support

Datastores

Queries

Management Applications

Visualization Tools

Applications that Use Idan

Making Software Work With Idan

What’s Next

[[!template id=todo text=“Consider supporting $^^^ and $^3 just like with # references.”]]
