Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

i18n.mdwn

At the moment I have two text-based systems to bridge between computer language and human language. One is the labeling system, which allows the user to assign namespaces and labels to resources. The other is the description system, which allows the user to explain and describe the information in human language.

I’d like both of them to be localized. Hopefully this will lower the entry barrier and allow ordinary users, people like me without professional knowledge of or experience with information system design, to participate in the work of data modeling. How do I provide and find translations?

I already added the smaoin:Text class and properties which give each Text its language and its content. Now the question is where and how translations are specified. Here’s an idea: Like .po files, place the translations in separate files. Use a reverse property is_local_label_of, and in each translation file define a Text resource for each resource in the original namespace/ontology.

Now one last question remains, I hope: Should these Text resources have their own namespaces? How does one refer to them after they’re inside the database? Hmmm… good question. Namespaces may be assigned like this: Choose some special namespace prefix and use it as a sub-namespace of each translated namespace. For example, we can have smaoin:transl and ns:transl and metro:transl and so on. Then, each such namespace can have a sub-namespace for each language, e.g. smaoin:transl:en and smaoin:transl:he.

While this solution is more or less reasonable, it adds a lot of clutter and mess. What is the benefit of having these namespaces? How often does one really need to refer to the specific translation resources? Question: What happens if we don’t assign these translations any labels at all? Answer: We can still easily find a translation by asking for a Text of a given language which serves as a label for some resource we’re interested in.

DECISION: Translations will have no namespace and no label.
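To make the answer above concrete, here is a minimal sketch of that lookup. It assumes statements are held in memory as (subject, predicate, object) tuples of strings and uses the property names from this page (smaoin:is_local_label_of, smaoin:has_language, smaoin:has_content). Languages are written as lang:en and so on for readability, where a real store would hold uids. The helper name and the uids BU, BULEN and BULFR are hypothetical. The Text resources carry no label of their own; they are found only through their relations.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

def find_local_label(triples: List[Triple], resource: str, lang: str) -> Optional[str]:
    """Return the content of the Text in `lang` that is a local label of `resource`."""
    facts = set(triples)
    # Every Text serving as a local label of the resource (no label needed on the Text).
    texts = sorted(s for (s, p, o) in triples
                   if p == "smaoin:is_local_label_of" and o == resource)
    for text in texts:
        # Keep only the Text written in the requested language.
        if (text, "smaoin:has_language", lang) in facts:
            for (s, p, o) in triples:
                if s == text and p == "smaoin:has_content":
                    return o
    return None

# Hypothetical statements: an English label and a French translation of BU.
triples = [
    ("BULEN", "smaoin:is_a", "smaoin:Text"),
    ("BULEN", "smaoin:has_language", "lang:en"),
    ("BULEN", "smaoin:has_content", "BasicUnit"),
    ("BULEN", "smaoin:is_local_label_of", "BU"),
    ("BULFR", "smaoin:is_a", "smaoin:Text"),
    ("BULFR", "smaoin:has_language", "lang:fr"),
    ("BULFR", "smaoin:has_content", "SomeTextHere"),
    ("BULFR", "smaoin:is_local_label_of", "BU"),
]

print(find_local_label(triples, "BU", "lang:fr"))  # SomeTextHere
```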

Now, if you think that’s easy… we’re not done here yet! In my ttl files, I use namespace-label pairs to refer from one resource to another. But if these pairs come from translations in external files, how am I supposed to do it? Ideas:

Use special syntax for inter-file references
Use embedded translations

The first idea creates duplication: References would be defined for the file and for the database separately, even though they would probably be identical. It makes sense to allow embedding at least a single translation in the .ttl file, so the file itself can be written using the references. Question: Should this initial labeling have its own properties, or should regular Text objects be defined?

Let’s examine the changes. This is a class definition:

	<BU>
		smaoin:has_label "BasicUnit" ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

This is a version which uses localized labels:

	<BU>
		smaoin:has_label <BULEN> ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .
	
	<BULEN>
		smaoin:is_a smaoin:Text ;
		smaoin:has_language lang:en ;
		smaoin:has_content "BasicUnit" .

I could use [ ] to embed the Text’s definition inside the BasicUnit definition without giving it an explicit uid, but this would cause a problem: Each database would give it a different arbitrary uid.

This is fine, except for three things:

<BULEN> repeats
The label of BasicUnit is not at the beginning of its definition, which makes the files much less readable
How was lang:en defined in the first place?

At this point using specialized properties is tempting, but I want to try doing it without them.

The BULEN duplication may be solved using anchors: It’s possible to mark a uid with an anchor and refer to it from somewhere else in the same file. It’s even reasonable and probably useful to allow anchors between files (definitely useful for Smaoin meta model definitions, not necessarily useful otherwise). For now assume I use the same character ‘$’ used for namespaces. Later I’ll decide how to handle this. Suggested syntax:

	<BU>
		smaoin:has_label $bulen ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .
	
	<BULEN> =bulen
		smaoin:is_a smaoin:Text ;
		smaoin:has_language lang:en ;
		smaoin:has_content "BasicUnit" .

The problem of “BasicUnit” not being near the beginning of the definition can be solved by any of these:

Switching places between BU and BULEN, and then making the label statement the first statement of BULEN
Using a comment just above BU (but this is duplication of the label, and not translated automatically)
Giving the anchor the name of the label (again duplication of the label…)

Assume I take the first approach. Reasonable. Now there’s one thing left: How was lang:en defined? I’d like to clarify the problem. These are the “fundamental” namespaces:

smaoin
ns
lang

The dependencies between them are the tightest possible: Each depends on the other two.

smaoin needs ns because smaoin belongs to namespace ‘ns’
smaoin needs lang because its labels are in lang:en
ns needs smaoin because it’s a smaoin:Namespace
ns needs lang because its labels are in lang:en
lang needs smaoin because it’s a smaoin:Namespace
lang needs ns because lang belongs to namespace ‘ns’

Thus, the problem is only initial: I need to decide how a database is “bootstrapped” from nothing, and then things will just work. Basically I can write a “dumb” program which just takes an N-Triples-like file and inserts it into a quadstore, but that means I need to prepare such a file manually. Possible of course - not nice but possible. Error-prone, but possible.
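A minimal sketch of such a “dumb” loader follows. It assumes a hypothetical N-Triples-like line format (<subject> <predicate> <object> ., where the object may also be a double-quoted string) and models the quadstore as a plain list of (subject, predicate, object, context) tuples. Nothing is resolved, so every uid has to be written out in full by hand, which is exactly the tedious, error-prone part.

```python
import re
from typing import List, Tuple

Quad = Tuple[str, str, str, str]

# Hypothetical line format: <s> <p> <o> .  where the object may instead be "a string".
# Escape sequences inside strings are not handled.
STATEMENT = re.compile(
    r'^<([^<>\s]+)>\s+<([^<>\s]+)>\s+(?:<([^<>\s]+)>|"([^"]*)")\s*\.$')

def load_dumb(text: str, context: str) -> List[Quad]:
    """Turn each statement line into a (subject, predicate, object, context) quad.

    No labels are resolved: every uid must already be written out in full.
    """
    quads: List[Quad] = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = STATEMENT.match(line)
        if m is None:
            raise ValueError(f"line {number}: not a statement: {line!r}")
        subj, pred, obj_uid, obj_text = m.groups()
        # Keep string literals quoted so they can't be mistaken for uids.
        obj = obj_uid if obj_uid is not None else f'"{obj_text}"'
        quads.append((subj, pred, obj, context))
    return quads

sample = """
# bootstrap statements, uids written out by hand
<u1> <u2> <u3> .
<u1> <u4> "BasicUnit" .
"""
for quad in load_dumb(sample, "bootstrap"):
    print(quad)
```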

Another option is to write the files using their own defined systems: I use a high-level language which supports the labeling system. This is actually the lowest level at which the text is human readable. Otherwise, it’s worse than reading Assembly code. The problem here is that I cannot use dumb software: I have to use something aware of labeling. It’s not an issue, because such software is needed anyway for parsing these Turtle-like files. It’s just that there’s a dependency between the ttl file (defining smaoin:Text, smaoin:has_label, etc.) and the actual uids of these resources, coded into the software.

Given the interdependency between the 3 namespaces, they must be submitted to the software together, so that their labels can be detected and resolved. This makes things easier: Since the labeling system is implemented in software, lang:en is a regular resource found using regular techniques.

Before I decide, let’s handle the rest of the human interface system as well. We also have a name and a description for each resource. I’d also like to add another field, comment, for any other text. It can be used to mark any misc information for humans, including temporary plain-text tagging as a workaround until new properties are introduced. Who knows. Let’s just put it there. Hmmm… no, maybe it’s a bad idea. It could cause confusion and ambiguity. It’s better to add it later when I actually need it for something. DECISION: No comment for now, just name and description.

Alright. Now things are a bit different: Using the anchors like I did means 3 anchors per resource (one each for the label, name and description Texts), which is not nice. Instead I can use reverse properties, and need only one anchor per resource, a third of the anchors:

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_label_of $class-basic-unit .

	<BU> =class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BUNAME>
		smaoin:has_content "Basic Unit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_name_of $class-basic-unit .

	<BUDESC>
		smaoin:has_content "A basic unit of measurement" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_description_of $class-basic-unit .

Actually I may be able to keep the order as before, because with so much text per resource, it may make sense to surround it with separator comments and maybe mention the name. Or just the separators, and keep the order like now.

One more thing remains: International identifiers. Some universally used concepts have globally accepted symbols, and it makes sense to make them independent of localization, i.e. always available to the user. For example:

Language codes: en, he, fr and so on
Physical measurement units and quantities: m (meter), s (second), F (force), a (acceleration), mu (friction coeff)

Therefore two separate label properties will be used: local label and global label.

Here’s a summary of the properties I will need:

	smaoin:has_local_label
	smaoin:is_local_label_of
	smaoin:has_global_label
	smaoin:is_global_label_of
	smaoin:has_name
	smaoin:is_name_of
	smaoin:has_description
	smaoin:is_description_of
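Every property in this list comes paired with its reverse, so a loader could keep only one direction in the store and turn reverse statements around. This is a minimal sketch that assumes statements are (subject, predicate, object) tuples of strings. The pairs come from the list above. Normalizing toward the has_ direction (rather than the is_…_of direction) and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Reverse property -> forward property, paired as in the list above.
FORWARD_OF = {
    "smaoin:is_local_label_of": "smaoin:has_local_label",
    "smaoin:is_global_label_of": "smaoin:has_global_label",
    "smaoin:is_name_of": "smaoin:has_name",
    "smaoin:is_description_of": "smaoin:has_description",
}

def normalize(triples: List[Triple]) -> List[Triple]:
    """Rewrite 'T is_x_of R' as 'R has_x T'; other statements pass through."""
    out: List[Triple] = []
    for s, p, o in triples:
        if p in FORWARD_OF:
            out.append((o, FORWARD_OF[p], s))  # swap subject and object
        else:
            out.append((s, p, o))
    return out

statements = [
    ("BULEN", "smaoin:is_local_label_of", "BU"),
    ("BUNAME", "smaoin:is_name_of", "BU"),
    ("BU", "smaoin:is_a", "smaoin:Class"),
]
for triple in normalize(statements):
    print(triple)
```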

DECISION: I’m taking the no-default-language approach. No base language. All translations have equal status.
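Under these rules, resolving a label in a file might look like the following sketch. It assumes two hypothetical in-memory indexes: one for global labels, and one for local labels keyed by language. Namespaces and languages are written as plain strings for readability. Global labels such as m match in any language, and local labels match only in the document’s own language. Since there is no base language, a miss is reported instead of being retried in, say, English. Trying global labels before local ones is an assumption, not a decision made on this page.

```python
from typing import Dict, Optional, Tuple

# Hypothetical indexes; namespaces, labels and languages written as plain strings.
GlobalIndex = Dict[Tuple[str, str], str]       # (namespace, label) -> uid
LocalIndex = Dict[Tuple[str, str, str], str]   # (namespace, label, language) -> uid

def resolve_label(ns: str, label: str, doc_lang: str,
                  global_idx: GlobalIndex, local_idx: LocalIndex) -> Optional[str]:
    # Global labels don't depend on the language (checked first: an assumption).
    uid = global_idx.get((ns, label))
    if uid is not None:
        return uid
    # Local labels count only in the document's language; there is deliberately
    # no fallback to a default language.
    return local_idx.get((ns, label, doc_lang))

global_idx: GlobalIndex = {("metro", "m"): "U-METER"}
local_idx: LocalIndex = {
    ("metro", "Meter", "en"): "U-METER",
    ("metro", "BasicUnit", "en"): "U-BU",
}

print(resolve_label("metro", "m", "he", global_idx, local_idx))          # U-METER
print(resolve_label("metro", "BasicUnit", "fr", global_idx, local_idx))  # None
```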

Great. I’d like to choose the final syntax for anchors now. Also, enabling cross-file anchors would be great.

First of all, anchors and namespace substitutions are entirely different. The character ‘$’ is not necessarily good for both. Let’s examine the uses of special characters:

@ - namespace declarations and other per-file settings for the parser
$ - substitution directive
= - anchor marker

Okay, so ‘$’ is as good for each use as it is for the other. But if I use it like this, namespaces and anchors won’t be allowed to have the same name. For example, maybe I’m using namespace ‘metro’ and my file happens to have an anchor named metro. Whenever I add an anchor, I need to make sure no namespace has the same name… Whenever I add a namespace, I need to make sure no anchor has the same name… Not good. Is there no better way to handle this?

Yes, there is. Use a different syntax. I have one top-category character left, which is not in use yet: ‘%’. I can use it as “reference to anchor”. I can also use * or & like in the C language. Example of all options:

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %class-basic-unit .

	<BU> =class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BUNAME>
		smaoin:has_content "Basic Unit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_name_of *class-basic-unit .

	<BUDESC>
		smaoin:has_content "A basic unit of measurement" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_description_of &class-basic-unit .

DECISION: I’m taking the ‘%’ character.

What about the anchor marker, ‘=’? I can replace it too. Examples:

	<BU> =class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BU> ~class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BU> !class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BU> ^class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BU> &class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BU> *class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BU> +class-basic-unit
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

Since this language allows writing a whole triple on a single line of its own, the chosen syntax must still work well in that case. For example:

	<bbbbbbbbuuuuuuuu>   smaoin:belongs_to_namespace   $metro .

Adding an anchor with ‘=’ could look like this:

	<bbbbbbbbuuuuuuuu> = class-basic-unit      smaoin:belongs_to_namespace      $metro .

Reasonable? I’d say it is. The ‘=’ character does good work here, because it implies the anchor is like a variable: a symbol which is assigned the uid and serves as a reference to it. Examples with alternative characters:

	<bbbbbbbbuuuuuuuu> = class-basic-unit      smaoin:belongs_to_namespace      $metro .

	<bbbbbbbbuuuuuuuu> ~ class-basic-unit      smaoin:belongs_to_namespace      $metro .

	<bbbbbbbbuuuuuuuu> ! class-basic-unit      smaoin:belongs_to_namespace      $metro .

	<bbbbbbbbuuuuuuuu> ^ class-basic-unit      smaoin:belongs_to_namespace      $metro .

	<bbbbbbbbuuuuuuuu> & class-basic-unit      smaoin:belongs_to_namespace      $metro .

	<bbbbbbbbuuuuuuuu> * class-basic-unit      smaoin:belongs_to_namespace      $metro .

	<bbbbbbbbuuuuuuuu> + class-basic-unit      smaoin:belongs_to_namespace      $metro .

DECISION: I’m taking ‘=’.
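With both characters chosen, resolving named anchors within a single file could be sketched as follows. Here '=name' after a subject uid defines an anchor, and '%name' is replaced by that uid. The anchors get their own table, separate from the ‘$’ namespace substitutions, so an anchor and a namespace may share a name such as metro. This is a minimal regex-based sketch, not an Idan tokenizer. It assumes anchor names use only letters, digits, ‘-’ and ‘_’, and it would also match inside string literals.

```python
import re
from typing import Dict

# "<uid> =name", with optional spaces, so one-line triples work too.
ANCHOR_DEF = re.compile(r"<([^<>\s]*)>\s*=\s*([A-Za-z0-9_-]+)")
# "%name"; requiring a letter or digit first leaves relative forms like %+ alone.
ANCHOR_REF = re.compile(r"%([A-Za-z0-9][A-Za-z0-9_-]*)")

def resolve_named_anchors(text: str) -> str:
    """Replace each %name with the <uid> anchored as =name, dropping the =name markers."""
    anchors: Dict[str, str] = {}  # kept apart from any $namespace table
    for uid, name in ANCHOR_DEF.findall(text):
        if name in anchors:
            raise ValueError(f"anchor defined twice: {name}")
        anchors[name] = uid
    text = ANCHOR_DEF.sub(lambda m: f"<{m.group(1)}>", text)

    def lookup(m: "re.Match[str]") -> str:
        name = m.group(1)
        if name not in anchors:
            raise KeyError(f"unknown anchor: {name}")
        return f"<{anchors[name]}>"

    return ANCHOR_REF.sub(lookup, text)

# The anchor "metro" and the namespace $metro coexist without clashing.
source = ("<BULEN> smaoin:is_local_label_of %metro .\n"
          "<BU> = metro   smaoin:belongs_to_namespace   $metro .\n")
print(resolve_named_anchors(source))
```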

Good, characters chosen. But I need to be able to refer to anchors from other files! Something like %anchor is the regular syntax… how do I add a filename/reference? Here’s an example without it:

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %class-basic-unit .

And here are some suggested syntax examples:

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %other.ttl::class-basic-unit .

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %other.ttl/class-basic-unit .

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %class-basic-unit:other.ttl .

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %class-basic-unit@other.ttl .

DECISION: I’m taking the ‘/’ syntax. The prefix may be a whole path, but on hierarchical file systems the last part must of course be a file, and the other parts folders. Like any file path.
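Under this decision, a reader of the file needs to split such a reference into a file path and an anchor name. This is a minimal sketch that assumes anchor names never contain ‘/’, so the last ‘/’ is the separator. It also assumes that a reference with no ‘/’ at all points into the current file. The function name is hypothetical.

```python
from typing import Optional, Tuple

def split_anchor_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split '%path/to/file.ttl/anchor' into ('path/to/file.ttl', 'anchor').

    A plain '%anchor' yields (None, 'anchor'), meaning "this file".
    """
    if not ref.startswith("%"):
        raise ValueError(f"not an anchor reference: {ref!r}")
    path, sep, anchor = ref[1:].rpartition("/")
    if not anchor or (sep and not path):
        raise ValueError(f"malformed anchor reference: {ref!r}")
    # No '/' at all means the anchor lives in the current file.
    return (path if sep else None), anchor

print(split_anchor_ref("%other.ttl/class-basic-unit"))
print(split_anchor_ref("%defs/metro/other.ttl/class-basic-unit"))
print(split_anchor_ref("%class-basic-unit"))
```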

Hmmm… more trouble.

I just realized the name and description don’t need to use the anchor! They can use the label. I can now move the anchor to the label Text. Or, instead… I can add new automatic anchors to avoid duplication. Instead of giving anchor names and labels separately (duplication), I’ll add relative anchors, e.g. %+1 means “the next mentioned subject” and %-1 means “the previously mentioned subject”. I can also use %- and %+ to avoid typing the ‘1’ all the time and make it more readable. Also, for small numbers (and large too, if you wish) repeat ‘-’ or ‘+’ to indicate the relative position count: For example, %+++ is the same as %+3.

Note that without the labels, named anchors do make sense. It’s just that I don’t want to cause so much duplication: Every resource will have both a named anchor (e.g. class-basic-unit) and a label (e.g. “BasicUnit”).

DECISION: I’m accepting this for now, because it avoids the naming redundancy.
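Relative anchors can be resolved from nothing but the order of subjects in the file. This is a minimal sketch that assumes the file has already been split into an ordered list of subject uids. It also assumes a reference counts from the subject block it appears in, which matches how %+, %- and %-- are used in this page’s examples. The function names are hypothetical, and rejecting %+0 is an assumption.

```python
import re
from typing import List

# %+ / %- with the sign repeated (%+++ is +3), or a sign with an explicit count (%+3).
RELATIVE = re.compile(r"^%(?:(\++|-+)|([+-])([1-9][0-9]*))$")

def relative_offset(ref: str) -> int:
    m = RELATIVE.match(ref)
    if m is None:
        raise ValueError(f"not a relative anchor: {ref!r}")
    repeated, sign, count = m.groups()
    if repeated is not None:
        n, sign = len(repeated), repeated[0]
    else:
        n = int(count)
    return n if sign == "+" else -n

def resolve_relative(subjects: List[str], current: int, ref: str) -> str:
    """Return the subject `ref` points to, counting from subject number `current`."""
    target = current + relative_offset(ref)
    if not 0 <= target < len(subjects):
        raise IndexError(f"{ref} from subject #{current} points outside the file")
    return subjects[target]

# Subjects of the BasicUnit example, in file order.
subjects = ["BULEN", "BU", "BUNAME", "BUDESC"]
print(resolve_relative(subjects, 0, "%+"))   # BU
print(resolve_relative(subjects, 3, "%--"))  # BU
```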

We’re not done yet.

IDEA: Adopt the [ ] syntax of Turtle, but allow specifying the uid in there! This may result in cleaner anchor-less code! Let’s go to the Turtle spec and learn about it… hmmm… there is no syntax for explicitly specifying the uid. Let’s see an example without uids first. This is what we already have so far:

	<BULEN>
		smaoin:has_content "BasicUnit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of %+ .

	<BU>
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

	<BUNAME>
		smaoin:has_content "Basic Unit" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_name_of %- .

	<BUDESC>
		smaoin:has_content "A basic unit of measurement" ;
		smaoin:has_language lang:en ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_description_of %-- .

And this is an example of a translation in another file:

	<BULFR>
		smaoin:has_content "SomeTextHere" ;
		smaoin:has_language lang:fr ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_local_label_of metro:BasicUnit .

	<BUNAMETL>
		smaoin:has_content "Some text here" ;
		smaoin:has_language lang:fr ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_name_of metro:BasicUnit .

	<BUDESCTL>
		smaoin:has_content "Some text here" ;
		smaoin:has_language lang:fr ;
		smaoin:is_a smaoin:Text ;
		smaoin:is_description_of metro:BasicUnit .

Now, this is the [ ] syntax without uids:

	<BU>
		smaoin:has_local_label
		[
			smaoin:has_content "BasicUnit" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text
		] ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:has_name
		[
			smaoin:has_content "Basic Unit" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text
		] ;
		smaoin:has_description
		[
			smaoin:has_content "A basic unit of measurement" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text
		] ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

And the ‘fr’ translation:

	metro:BasicUnit
		smaoin:has_local_label
		[
			smaoin:has_content "SomeTextHere" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text
		] ;
		smaoin:has_name
		[
			smaoin:has_content "Some text here" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text
		] ;
		smaoin:has_description
		[
			smaoin:has_content "Some text here" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text
		] .

This looks nicer than the anchors, doesn’t it? I mean, the anchors are fine, but each resource becomes 4 separate blocks, which makes the file a bit less readable. This syntax brings us back to a single block. However, don’t hurry to be impressed! We still have to add the subject uids to the inner blocks. Here’s a suggestion. I don’t want these uids to be generated again and again. Even if they are omitted, it will be just to let the parser fill them in, or to let the database generate them and use the values as the canonical uids. Either way, there should probably be no statement components that stay blank forever. So why don’t we add some sign to fill the empty spot of the subject?

Let’s do it. The subject would normally be there as usual, inside < and >, but if it’s not there… simple. So simple. I can just let there be a blank uid!

Another thing: Turtle does not allow a ‘.’ before the closing ‘]’ of a [ ] block, so the last statement inside it looks like a sentence without the ‘.’ at the end. I’d like to require that the ‘.’ is put there anyway for consistency, even though it isn’t needed for parsing.

Updated text without uids:

	<BU>
		smaoin:has_local_label
		[
		<>
			smaoin:has_content "BasicUnit" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:has_name
		[
		<>
			smaoin:has_content "Basic Unit" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:has_description
		[
		<>
			smaoin:has_content "A basic unit of measurement" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

And the translation:

	metro:BasicUnit
		smaoin:has_local_label
		[
		<>
			smaoin:has_content "SomeTextHere" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:has_name
		[
		<>
			smaoin:has_content "Some text here" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:has_description
		[
		<>
			smaoin:has_content "Some text here" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text .
		] .

Now let’s add the uids:

	<BU>
		smaoin:has_local_label
		[
		<BULEN>
			smaoin:has_content "BasicUnit" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:has_name
		[
		<BUNAME>
			smaoin:has_content "Basic Unit" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:has_description
		[
		<BUDESC>
			smaoin:has_content "A basic unit of measurement" ;
			smaoin:has_language lang:en ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:is_a smaoin:Class ;
		smaoin:is_subclass_of :Term ;
		smaoin:is_subclass_of :Unit .

And the translation:

	metro:BasicUnit
		smaoin:has_local_label
		[
		<BULFR>
			smaoin:has_content "SomeTextHere" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:has_name
		[
		<BUNAMETL>
			smaoin:has_content "Some text here" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text .
		] ;
		smaoin:has_description
		[
		<BUDESCTL>
			smaoin:has_content "Some text here" ;
			smaoin:has_language lang:fr ;
			smaoin:is_a smaoin:Text .
		] .

Great. We’re done with the [ ] syntax.

I’d like to discuss another issue. All of this material should definitely become a serious document later, but for now I’m just talking and planning. The issue is the use of different opening and closing characters. For example, take the ones used for uids: the angle brackets. Here’s an example:

<uid>

The opening bracket ‘<’ is ASCII character 0x3C, which is the “less than” symbol. The closing bracket ‘>’ is ASCII character 0x3E, which is the “greater than” symbol.

Now, assume I’m writing this file in a Right-to-Left language, e.g. in Hebrew. My uid may now look like this:

<מזהה>

And this does exactly what we want: 0x3C is still the opening bracket and 0x3E is the closing bracket. In other words, no special support is needed in this case for RTL languages. However, I do need to pay attention to this, because only some characters are displayed mirrored in RTL text: those with the Unicode Bidi_Mirrored property, which include < > [ ] and ( ). A delimiter pair built from other characters would not be flipped.
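The Unicode character database records which characters get this treatment, and Python’s standard unicodedata module exposes it. mirrored() reports the Bidi_Mirrored property, and bidirectional() reports the bidi class. The small sketch below checks the special characters discussed on this page. It only inspects character properties and says nothing about how a particular editor or font renders the file.

```python
import unicodedata
from typing import Dict

def mirrored_in_rtl(chars: str) -> Dict[str, bool]:
    """Map each character to whether it has the Unicode Bidi_Mirrored property."""
    return {ch: bool(unicodedata.mirrored(ch)) for ch in chars}

# Delimiters and special characters discussed on this page.
for ch, mirrored in mirrored_in_rtl('<>[]"@$%=').items():
    print(f"U+{ord(ch):04X} {ch}  mirrored={mirrored}  "
          f"bidi class={unicodedata.bidirectional(ch)}")
```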

I started translating smaoin.ttl into the new syntax, with the nested blocks, and the most visible thing in the updated file is its ugliness. Even though I did use indentation, I can’t ignore the big difference in readability. Before I proceed to translate more text, I’d like to examine other forms of indentation and decide whether I want to keep using nested blocks or go back to relative anchors.

Here’s a class definition which uses nested blocks:

	<TP>
		:has_local_label
		[
		<TPLEN>
			:has_content "TransitiveProperty" ;
			:has_language lang:en ;
			:is_a :Text .
		] ;
		:belongs_to_namespace $smaoin ;
		:has_name
		[
		<TPNEN>
			:has_content "Transitive Property" ;
			:has_language lang:en ;
			:is_a :Text .
		] ;
		:has_description
		[
		<TPDEN>
			:has_content "TODO" ;
			:has_language lang:en ;
			:is_a :Text .
		] ;
		:is_a :Class ;
		:is_subclass_of :Property .

For comparison, here’s an original-style version without i18n:

	<TP>
		:has_label "TransitiveProperty" ;
		:belongs_to_namespace $smaoin ;
		:has_name "Transitive Property" ;
		:has_description "TODO" ;
		:is_a :Class ;
		:is_subclass_of :Property .

This is a version which uses relative anchors:

	<TPLEN>
		:has_content "TransitiveProperty" ;
		:has_language lang:en ;
		:is_a :Text ;
		:is_local_label_of %+ .
	
	<TP>
		:belongs_to_namespace $smaoin ;
		:is_a :Class ;
		:is_subclass_of :Property .
	
	<TPNEN>
		:has_content "Transitive Property" ;
		:has_language lang:en ;
		:is_a :Text ;
		:is_name_of %- .
	
	<TPDEN>
		:has_content "TODO" ;
		:has_language lang:en ;
		:is_a :Text ;
		:is_description_of %-- .

Now I’m going to try several indentation styles and see how it goes. Maybe one of them will look better than my current indentation style. If not, I’ll consider going back to relative anchors.

_____________ B _____________

_____________ 1 _____________

	<TP>
		:has_local_label
			[
			<TPLEN>
			:has_content "TransitiveProperty" ;
			:has_language lang:en ;
			:is_a :Text .
			] ;
		:belongs_to_namespace $smaoin ;
		:has_name
			[
			<TPNEN>
			:has_content "Transitive Property" ;
			:has_language lang:en ;
			:is_a :Text .
			] ;
		:has_description
			[
			<TPDEN>
			:has_content "TODO" ;
			:has_language lang:en ;
			:is_a :Text .
			] ;
		:is_a :Class ;
		:is_subclass_of :Property .

_____________ 2 _____________

	<TP>
		:has_local_label
			[
			<TPLEN>
				:has_content "TransitiveProperty" ;
				:has_language lang:en ;
				:is_a :Text .
			] ;
		:belongs_to_namespace $smaoin ;
		:has_name
			[
			<TPNEN>
				:has_content "Transitive Property" ;
				:has_language lang:en ;
				:is_a :Text .
			] ;
		:has_description
			[
			<TPDEN>
				:has_content "TODO" ;
				:has_language lang:en ;
				:is_a :Text .
			] ;
		:is_a :Class ;
		:is_subclass_of :Property .

_____________ 3 _____________

	<TP>
		:has_local_label
			[
				<TPLEN>
				:has_content "TransitiveProperty" ;
				:has_language lang:en ;
				:is_a :Text .
			] ;
		:belongs_to_namespace $smaoin ;
		:has_name
			[
				<TPNEN>
				:has_content "Transitive Property" ;
				:has_language lang:en ;
				:is_a :Text .
			] ;
		:has_description
			[
				<TPDEN>
				:has_content "TODO" ;
				:has_language lang:en ;
				:is_a :Text .
			] ;
		:is_a :Class ;
		:is_subclass_of :Property .

_____________ 4 _____________

	<TP>
		:has_local_label [
			<TPLEN>
			:has_content "TransitiveProperty" ;
			:has_language lang:en ;
			:is_a :Text .
		] ;
		:belongs_to_namespace $smaoin ;
		:has_name [
			<TPNEN>
			:has_content "Transitive Property" ;
			:has_language lang:en ;
			:is_a :Text .
		] ;
		:has_description [
			<TPDEN>
			:has_content "TODO" ;
			:has_language lang:en ;
			:is_a :Text .
		] ;
		:is_a :Class ;
		:is_subclass_of :Property .

_____________ 5 _____________

	<TP>
		:has_local_label [
			<TPLEN>
				:has_content "TransitiveProperty" ;
				:has_language lang:en ;
				:is_a :Text .
		] ;
		:belongs_to_namespace $smaoin ;
		:has_name [
			<TPNEN>
				:has_content "Transitive Property" ;
				:has_language lang:en ;
				:is_a :Text .
		] ;
		:has_description [
			<TPDEN>
				:has_content "TODO" ;
				:has_language lang:en ;
				:is_a :Text .
		] ;
		:is_a :Class ;
		:is_subclass_of :Property .

_____________ E _____________

After some thinking, examination and frustration, I came up with the two best suggestions. One is the anchors, and the other is this:

	<TP>
		:has_local_label
			[
			<TPLEN>
			:has_content "TransitiveProperty" ;
			:has_language lang:en ;
			:is_a :Text .
			] ;
		
		:belongs_to_namespace $smaoin ;
		
		:has_name
			[
			<TPNEN>
			:has_content "Transitive Property" ;
			:has_language lang:en ;
			:is_a :Text .
			] ;
		
		:has_description
			[
			<TPDEN>
			:has_content "TODO" ;
			:has_language lang:en ;
			:is_a :Text .
			] ;
		
		:is_a :Class ;
		
		:is_subclass_of :Property .

Now I need to decide which one I want to use in my own files. Let’s see…

DECISION: I’m keeping the [ ] syntax in the language, but I’ll use relative anchors.

Another syntax addition: It makes sense to write text and then have the computer fill in the uids for you. However, what if you want to fill in just some of them? For example, maybe you want some of them auto-filled in the file, and others deduced later by other software? Solution: Mark uids-to-be-filled with a special mark. For example, put ‘$’ or ‘%’ there, or something like that.

Which character should I put there? Hmmm… the problem with a single character is that it looks much less like a uid. I can use something like <$> there. DECISION: Since it’s easy to change later, I’ll take <$> for now: It looks like a uid, and $ already means macro replacement, so it makes sense.
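The fill-in step itself is small. This minimal sketch assumes new uids can be any fresh string. It uses random UUID hex as a stand-in, since the real Smaoin uid format isn’t specified here. Only <$> is replaced; a blank <> is left for other software, which is the distinction the mark exists to express. Because it is regex-based, it would also rewrite <$> inside a string literal.

```python
import itertools
import re
import uuid
from typing import Callable

def fill_placeholders(text: str,
                      new_uid: Callable[[], str] = lambda: uuid.uuid4().hex) -> str:
    """Replace each <$> with a freshly generated <uid>; leave a blank <> alone."""
    return re.sub(r"<\$>", lambda _m: f"<{new_uid()}>", text)

# A deterministic generator for the example; real uids would come from elsewhere.
counter = itertools.count(1)
doc = "<$> smaoin:is_a smaoin:Text .\n<> smaoin:is_a smaoin:Text .\n"
print(fill_placeholders(doc, new_uid=lambda: f"u{next(counter)}"))
```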

Another two aspects of i18n which I’d like to discuss:

Translation of Idan keywords
Introduction of the language-dialect relation (and its inverse)

First, keyword translation. When an Idan file is passed to a triplestore or to any other software which needs to parse it, the software has to know the language of the file. For resolving namespace-label pairs all it needs is the @lang directive, but it has no such information for the directives themselves, whose keywords may be localized too (including @lang itself). Obviously if we tell the software about the language externally, the @lang directive becomes somewhat useless.

IDEA: Make the @lang directive not use any localized symbols! Then it can be parsed and understood by software regardless of the language specified by it. This would remove the circular dependency between the localization of the directive and the language specified.

The directive syntax at the moment is @namespace CONTENT.... Let us therefore introduce new syntax for specifying the language of the document. It applies to both the directives and the labels. Of course directives may also have global names, which can be used with any language. Syntax suggestion:

@@en

The label used for the language should preferably be global. Otherwise smart software may try to examine all localizations of all language labels, but this is just a “plan B”: After a language is detected, the user should update the directive to specify the global label.

This directive should be the first thing in the file. If it’s not there, the file is invalid: No default language is assumed. However, it’s possible that a language is passed to the software externally. In any case, standard files which want to be supported by any software should specify the directive.
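A sketch of how a parser could read this directive before touching anything localized follows. It assumes the @@ line must be the first non-blank line (allowing leading blank lines is itself an assumption). It also assumes language labels use only letters, digits and underscores. A language supplied externally is used only when the directive is missing. The function name and the external_lang parameter are hypothetical.

```python
import re
from typing import Optional, Tuple

# "@@" followed by a language label: letters, digits and underscores (e.g. en, en_US).
LANG_DIRECTIVE = re.compile(r"@@([A-Za-z0-9_]+)")

def read_language(text: str, external_lang: Optional[str] = None) -> Tuple[str, str]:
    """Return (language label, rest of the file) before anything else is parsed."""
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines) and not lines[i].strip():
        i += 1  # tolerate leading blank lines (an assumption)
    if i < len(lines):
        m = LANG_DIRECTIVE.fullmatch(lines[i].strip())
        if m:
            return m.group(1), "".join(lines[i + 1:])
    if external_lang is not None:
        return external_lang, text  # no directive, but the caller supplied a language
    raise ValueError("no @@ language directive and no language given externally")

print(read_language("@@en\n...rest of the file...\n"))
```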

Updating the core files… done.

Now, the second thing. Dialects are important because on one hand they can function as separate languages, but they can also function as partial translations: You translate everything that is not identical to the base language, and keep intact the base-language translations which apply to the dialect too. Unlike software string localization, which also involves currency and number formatting and so on, our case here is only about the languages themselves, not the broader concept of locales. That’s because NLI is meant for translation of things like labels and namespaces, which should not have auto-generation features like the % directives of printf. At least for now, it’s just languages.

Due to this difference, the scheme gettext uses for naming locales is not optimal here. The country doesn’t matter, and the encoding is always UTF-8. What matters is just the fact that one language is a dialect of another, and labels should never be providers of information like country. For example, the label en_US should never be used to deduce the country where this dialect is used. What I need is a simple convention - note, it’s just a convention - for naming dialects. Here’s an idea for now, since languages use nli:labels like any other resource and can’t use much more than letters, numbers and underscores:

Each dialect of a base language labeled L has a label of the form L_D where D implies where the dialect is used.

D is not necessarily a country code - it may refer to a region or to anything else used to refer to the dialect.

Great, now let’s add things to NLI. We can basically add a Dialect class, but it doesn’t seem necessary to me: A dialect is simply any language which is a dialect of another. All I need is a transitive property to express the language-dialect relation. Adding… done.
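One way the relation could serve partial translations is a fallback lookup: use the Text in the requested language if there is one, otherwise climb to the language it is a dialect of. This minimal sketch models the property as a hypothetical is_dialect_of mapping and writes languages as labels for readability. The en_GB_north label is hypothetical too. Walking the chain step by step gives the transitive behavior, so a dialect of a dialect still reaches its base language. The base language comes from the relation, never from splitting a label such as en_US.

```python
from typing import Dict, Optional, Set

def find_translation(texts: Dict[str, str], dialect_of: Dict[str, str],
                     lang: str) -> Optional[str]:
    """Return the Text content for `lang`, falling back along the dialect chain.

    texts:      language -> content of the Text in that language
    dialect_of: language -> the language it is a dialect of (a stand-in for the
                hypothetical transitive is_dialect_of property)
    """
    seen: Set[str] = set()
    current: Optional[str] = lang
    while current is not None and current not in seen:
        if current in texts:
            return texts[current]
        seen.add(current)  # guard against an accidental cycle in the data
        current = dialect_of.get(current)
    return None  # no default language: give up rather than guess

texts = {"en": "Color", "en_GB": "Colour"}
dialect_of = {"en_US": "en", "en_GB": "en", "en_GB_north": "en_GB"}
print(find_translation(texts, dialect_of, "en_US"))  # Color
```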
