Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

label-syntax.mdwn

In my Turtle-based files, I have a problem with the namespace-label system. This is the beginning of metro.ttl defining the Metro ontology (uids intentionally removed):

	@namespace smaoin <S>
	@namespace metro <M>
	@default metro

	<M>
		smaoin:has_label "metro" ;
		smaoin:has_prefix "metro" ;
		smaoin:belongs_to_namespace :metro ;
		smaoin:is_a smaoin:Namespace .

	<T>
		smaoin:has_label "Term" ;
		smaoin:belongs_to_namespace :metro ;
		smaoin:is_a smaoin:Class .

The first problem visible in this text is that references point to resources defined in the same file. Sometimes they are even circular, e.g. a class definition uses some property, and the property’s domain or range is that class. As a result, if the file is simply processed sequentially, many reference resolutions will fail, since the resources they refer to are defined later in the file.

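At least two straightforward solutions exist:

1. First define the label and namespace of every resource, and only then state all the other information
2. Define the process of parsing the file to be non-sequential, so references are resolved regardless of where the referred resources are defined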

Clearly the second one is better for the user. But we could also decide that the syntax allows defining namespaces and labels anywhere, and that the parser reorders the statements before resolving references. In that case, both options become practically the same. Since reordering is just one specific way to implement the more general concept of non-sequential parsing, I take option 2.
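
One way to implement non-sequential parsing is a fixed-point loop. It keeps resolving namespace memberships until no new one can be found, and only then resolves everything else. Below is a minimal sketch in Python, not the actual Rel4tion parser. It assumes a hypothetical statement format of plain `(subject, property, object)` strings and one label per resource. Properties are kept as written. References into other namespaces (here `smaoin:...`) are looked up in a `known` table that stands in for the local database, and its uids `<SC>` and `<SN>` are placeholders. The namespace’s own membership is written with its uid `<M>`, as in the fixed version of the example further down. The statements are deliberately ordered so that `<T>` comes before `<M>`.

```python
# Sketch of order-independent (non-sequential) reference resolution.
# Hypothetical format: uids "<...>", literals '"..."', references
# "prefix:label" or ":label" (default namespace); properties kept as written.
LABEL = "smaoin:has_label"
MEMBER = "smaoin:belongs_to_namespace"


def is_uid(term):
    return term.startswith("<") and term.endswith(">")


def is_literal(term):
    return term.startswith('"')


def resolve_file(statements, declarations, default, known=None):
    # (namespace uid, label) -> resource uid, seeded from the local database
    index = dict(known or {})
    # Labels are literals, never references, so one pass collects them all
    labels = {s: o.strip('"') for s, p, o in statements if p == LABEL}
    members = {}

    def lookup(term):
        if is_uid(term):
            return term
        prefix, sep, label = term.partition(":")
        if is_literal(term) or not sep:
            return None
        return index.get((declarations.get(prefix or default), label))

    # A membership may depend on a label or membership stated later in the
    # file, so repeat until a full pass resolves nothing new.
    changed = True
    while changed:
        changed = False
        for s, p, o in statements:
            if p == MEMBER and s not in members:
                ns_uid = lookup(o)
                if ns_uid is not None:
                    members[s] = ns_uid
                    if s in labels:
                        index[(ns_uid, labels[s])] = s
                    changed = True

    resolved = []
    for s, p, o in statements:
        subj = lookup(s)
        obj = o if is_literal(o) else lookup(o)
        if subj is None or obj is None:
            raise ValueError(f"unresolved reference in: {s} {p} {o}")
        resolved.append((subj, p, obj))
    return resolved


# The metro example, with <T> deliberately stated before <M>
statements = [
    ("<T>", LABEL, '"Term"'),
    ("<T>", MEMBER, ":metro"),
    ("<T>", "smaoin:is_a", "smaoin:Class"),
    ("<M>", LABEL, '"metro"'),
    ("<M>", "smaoin:has_prefix", '"metro"'),
    ("<M>", MEMBER, "<M>"),
    ("<M>", "smaoin:is_a", "smaoin:Namespace"),
]
declarations = {"smaoin": "<S>", "metro": "<M>"}
known = {("<S>", "Class"): "<SC>", ("<S>", "Namespace"): "<SN>"}
resolved = resolve_file(statements, declarations, "metro", known)
```

The first pass cannot resolve `:metro` for `<T>`, but the second pass can, once `<M>` has been registered as a member of itself. If the namespace’s own membership were written as `:metro`, it would never resolve, and the sketch raises `ValueError`.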

Great. But we’re not done yet. There is another problem, one I hadn’t thought about before. At the moment, the namespace declarations at the top of the file serve a single purpose: allowing namespace:label pairs to use a namespace prefix instead of <uid>:label, which is not friendly. You declare which namespaces you’re going to use, and then use their prefixes. The problem is that no mechanism is provided for referring to the namespace itself.

Each namespace is a unique concept of its own, and is modeled as an object of class smaoin:Namespace. Namespace membership is modeled using the smaoin:belongs_to_namespace property. The thing is, the range of this property is smaoin:Namespace, as one would expect. But how do you specify which namespace? The uid is always possible but not friendly. Normally you would use a namespace:label pair, but… it’s the namespace itself! It doesn’t belong to any namespace!

That’s not all. I did try to solve this problem by making it a member of itself. But it caused a new problem: When you state the namespace is a member of itself, how do you refer to it? You can’t use namespace:label because the namespace is not a member of itself yet, even if it does have a label already. You end up having to use the uid here, which results in the namespace uid being specified three times in the file - unnecessary redundancy and a potential source of errors.

In the example above, the namespace’s own smaoin:belongs_to_namespace statement uses :metro before that label is valid, which is an error. Here’s the fixed text:

	@namespace smaoin <S>
	@namespace metro <M>
	@default metro

	<M>
		smaoin:has_label "metro" ;
		smaoin:has_prefix "metro" ;
		smaoin:belongs_to_namespace <M> ;
		smaoin:is_a smaoin:Namespace .

	<T>
		smaoin:has_label "Term" ;
		smaoin:belongs_to_namespace :metro ;
		smaoin:is_a smaoin:Class .

There must be some way to refer to namespaces, even after they have been written into a graph. Here are some ideas:

- Allow using the namespace declaration even outside of namespace-label pairs, i.e. as a pointer to its uid
- Define a special namespace in Smaoin, e.g. call it ‘ns’, to which all namespaces belong
- Make namespaces not have any namespace-label reference, and use another way in files to refer to them

Now let’s think. Why are the namespace declarations there in the first place? Answer: It’s easily possible to have two namespaces written by different people which happen to use the exact same prefix. Since the file may be distributed and end up in the hands of these people, the uids behind the namespaces used in the file must be stated explicitly. Generating them can be made very easy using a program which takes a set of namespace prefixes, fetches their uids from the local database and creates an initial empty Turtle-like file which contains the declarations.
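
The helper program mentioned above could look roughly like this. It is a sketch that assumes a hypothetical local database, represented here as a dict mapping each prefix to the uids of all namespaces known locally under that prefix. Two authors may pick the same prefix, so an ambiguous prefix is reported instead of guessed. The uids `S` and `M` are the document’s placeholders. `myns` with two uids is a made-up collision.

```python
def make_header(prefixes, local_db, default=None):
    """Return the @namespace/@default lines for a new, empty Turtle-like file.

    local_db is hypothetical: prefix -> list of uids of the namespaces
    known locally under that prefix.
    """
    lines = []
    for prefix in prefixes:
        uids = local_db.get(prefix, [])
        if len(uids) != 1:
            # 0 matches: unknown prefix; 2+ matches: a prefix collision the
            # user must settle by choosing the namespace uid explicitly
            raise ValueError(f"prefix {prefix!r} matches {len(uids)} namespaces")
        lines.append(f"@namespace {prefix} <{uids[0]}>")
    if default is not None:
        if default not in prefixes:
            raise ValueError(f"default prefix {default!r} is not declared")
        lines.append(f"@default {default}")
    return "\n".join(lines) + "\n"


local_db = {"smaoin": ["S"], "metro": ["M"], "myns": ["X1", "X2"]}
print(make_header(["smaoin", "metro"], local_db, default="metro"), end="")
```

For the metro file this prints the same three declaration lines used in the example at the top of the page. Calling it with `["myns"]` raises `ValueError`, because the user has to decide which `myns` is meant.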

Considering this observation, it seems legitimate to use the namespace prefixes declared in the file also as references to the namespace resources themselves. After all, that is exactly what they’re used for in namespace-label resolution: convert the prefix to a namespace uid, then use a query to find the referred resource. A possible syntax is to use the prefix as is, or with a special character, e.g. @smaoin, $smaoin or %smaoin.
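
Here is a sketch of how a parser might handle both forms, assuming the `$` variant. It also assumes a hypothetical in-memory graph of `(subject, property, object)` triples, with placeholder uids written without angle brackets and properties written by bare name (`has_label`, `belongs_to_namespace`). A `$prefix` reference needs only the declarations. A `prefix:label` reference needs a query against the graph.

```python
def resolve_ref(term, declarations, default, graph):
    """Resolve $prefix, prefix:label or :label to a uid (hypothetical syntax)."""
    if term.startswith("$"):
        # The namespace itself: its uid comes straight from @namespace
        return declarations[term[1:]]
    if ":" not in term:
        raise ValueError(f"not a reference: {term!r}")
    prefix, _, label = term.partition(":")
    ns_uid = declarations[prefix or default]
    # Query: the resource with this label that belongs to this namespace
    labeled = {s for s, p, o in graph if p == "has_label" and o == label}
    in_ns = {s for s, p, o in graph if p == "belongs_to_namespace" and o == ns_uid}
    matches = labeled & in_ns
    if len(matches) != 1:
        raise LookupError(f"{term!r} matches {len(matches)} resources")
    return matches.pop()


declarations = {"smaoin": "S", "ns": "N", "metro": "M"}
graph = {
    ("M", "has_label", "metro"),
    ("M", "has_prefix", "metro"),
    ("M", "belongs_to_namespace", "N"),
    ("T", "has_label", "Term"),
    ("T", "belongs_to_namespace", "M"),
}
print(resolve_ref("$metro", declarations, "metro", graph))    # M
print(resolve_ref(":Term", declarations, "metro", graph))     # T
print(resolve_ref("ns:metro", declarations, "metro", graph))  # M
```

In this graph `:metro` raises `LookupError`, because the metro namespace belongs to `ns` rather than to itself. `$metro` does not depend on membership at all.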

This approach solves one problem: each namespace uid ever used in the file needs to be specified at most once, so there is no redundancy anymore. But the other problem remains: there is no way to refer to the namespace later using a namespace-label pair. For example, assume we inserted the file contents into the database, and now we want to run a query involving that namespace, such as:

	Give me the labels of all resources in that namespace.

Since the namespace has no namespace-label pair, and assuming its common prefix is “myns”, we would write:

	Give me all ?label
	Where exist ?resource ?namespace
	Such that	?namespace smaoin:has_prefix           "myns" ;
				?resource  smaoin:has_label            ?label ;
				?resource  smaoin:belongs_to_namespace ?namespace .
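
Read as an operation over triples, the query above is a simple join. Below is a sketch in Python over a hypothetical in-memory set of `(subject, property, object)` triples, with bare property names and placeholder uids. It is not a Smaoin query engine. If two namespaces shared the prefix “myns”, their members would simply be merged.

```python
def labels_in_namespace(graph, prefix):
    """?label such that ?namespace has_prefix prefix, ?resource has_label
    ?label and ?resource belongs_to_namespace ?namespace."""
    namespaces = {s for s, p, o in graph if p == "has_prefix" and o == prefix}
    resources = {s for s, p, o in graph
                 if p == "belongs_to_namespace" and o in namespaces}
    return {o for s, p, o in graph if p == "has_label" and s in resources}


graph = {
    ("X", "has_prefix", "myns"), ("X", "has_label", "myns"),
    ("A", "has_label", "alpha"), ("A", "belongs_to_namespace", "X"),
    ("B", "has_label", "beta"), ("B", "belongs_to_namespace", "X"),
    ("Y", "has_prefix", "other"),
    ("C", "has_label", "gamma"), ("C", "belongs_to_namespace", "Y"),
}
print(sorted(labels_in_namespace(graph, "myns")))  # ['alpha', 'beta']
```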

Name-collision-wise this is no worse than the alternatives, because the same ambiguity would exist if we used the ‘ns’ prefix or a namespace-label pair where the namespace is a member of itself. But having an easier way to refer to a namespace would definitely be nice. And there is actually one case where the method we choose does matter! Look at the Turtle-like text again. In the definition of <T> we have the following line:

	smaoin:belongs_to_namespace :metro ;

According to the other suggested solutions, the following lines are alternatives to it:

	smaoin:belongs_to_namespace ns:metro ;
	smaoin:belongs_to_namespace $metro ;

What is the difference? In the first case, :metro is first resolved into metro:metro, because metro is the default namespace of the file. Then, since the metro namespace is defined in the file itself, the resource is found: a resource labeled “metro” which belongs to the namespace declared with the prefix “metro”. Both happen to be the same resource, but that doesn’t matter for our purpose. In the second case, the namespace ‘ns’ must also be declared, and the metro namespace must be defined as a member of it; the resolution then works the same way as in the first case. In the third case, the resolution is easier: the namespace uid is used directly. This is actually more powerful, because it allows specifying namespace membership (a common, fundamental property) even if the namespace is deliberately given no namespace-label pair (which may be against some conventions, but should be possible).

Assuming the $metro method is allowed, then, where should namespaces belong: to themselves, or to the ‘ns’ namespace? Hmm… let’s try to find a difference. Assume there are two namespaces with the same prefix. Then ‘ns’ will have two members with the same label, which is a problem. But if the other method is used, there will also be two ways to resolve myns:myns, which is essentially the same problem. It looks like it doesn’t make much difference. Actually, both can be used! If I allow resources to belong to more than one namespace, both can work together.

So, if it doesn’t make a difference, which do I want to adopt as a convention? myns:myns is a bit confusing, while ns:myns sounds a bit centralized. Is it really centralized? No, I don’t think so. All classes are also instances of smaoin:Class; does that make smaoin:Class more centralized than it should be? No, because the whole point of linking information is to use the same tools. Using the ‘ns’ namespace is no more centralized than using the smaoin:Namespace class. It also allows work-in-progress namespaces to avoid registering themselves into ‘ns’, thus hiding them from queries. DECISION: Use the $smaoin method and the ‘ns’ namespace.

Here is the same text I gave as an example, now fixed according to the new rules:

	@namespace smaoin <S>
	@namespace ns <N>
	@namespace metro <M>
	@default metro

	$metro
		smaoin:has_label "metro" ;
		smaoin:has_prefix "metro" ;
		smaoin:belongs_to_namespace $ns ;
		smaoin:is_a smaoin:Namespace .

	<T>
		smaoin:has_label "Term" ;
		smaoin:belongs_to_namespace $metro ;
		smaoin:is_a smaoin:Class .

I’d like to discuss one last thing: the namespace hierarchy. Since namespaces can reside in other namespaces, it’s possible to define a convenient syntax for chaining pairs into longer references. For example, we can split one large namespace into many smaller ones. This can even be done in a decentralized manner, because it is the uids, not the prefixes, that eventually determine namespace identity. Let’s see a possible practical example.

Assume we define an ontology for defining programs in Smaoin. It’s not necessarily meant for direct writing, but for cross-language modeling, compatibility and conversion. Let’s call this namespace ‘swc’ for “Software Code”. Now assume we want to define the unique components of each programming language separately, based on the generic ‘swc’ tools. We thus define a new parent namespace ‘plc’ for “Programming Language Components”, and then define the actual components in sub-namespaces:

plc:c plc:perl plc:python plc:ruby plc:bash plc:cpp plc:asm plc:fortran

And so on. We can even define temporary namespaces like plc:c:c11 for components defined by drafts which aren’t yet stable, final or official. Later these components can be moved into plc:c without changing any of their details; only the objects of their belongs_to_namespace statements change.
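
Membership is just another statement, so such a move amounts to rewriting the objects of the matching belongs_to_namespace triples. The sketch below assumes the same kind of hypothetical triple set, with bare property names and placeholder uids. The component labels `newfeature` and `oldfeature` are made up. It adds one check of my own: the move is refused if a label would then appear twice in the target namespace.

```python
def move_members(graph, old_ns, new_ns):
    """Move every direct member of old_ns into new_ns.

    Only the objects of belongs_to_namespace statements change; labels and
    all other statements about the moved resources stay untouched.
    """
    labels = {s: o for s, p, o in graph if p == "has_label"}

    def member_labels(ns):
        return {labels.get(s) for s, p, o in graph
                if p == "belongs_to_namespace" and o == ns}

    clash = (member_labels(old_ns) & member_labels(new_ns)) - {None}
    if clash:
        raise ValueError(f"labels already used in target namespace: {clash}")
    return {
        (s, p, new_ns) if p == "belongs_to_namespace" and o == old_ns else (s, p, o)
        for s, p, o in graph
    }


graph = {
    ("C", "has_label", "c"),                 # plc:c
    ("C11", "has_label", "c11"),             # plc:c:c11, the draft namespace
    ("C11", "belongs_to_namespace", "C"),
    ("K1", "has_label", "newfeature"),       # hypothetical draft component
    ("K1", "belongs_to_namespace", "C11"),
    ("K2", "has_label", "oldfeature"),       # hypothetical stable component
    ("K2", "belongs_to_namespace", "C"),
}
moved = move_members(graph, "C11", "C")
```

After the move, `K1` belongs to `C` and keeps its label. The draft namespace `C11` stays where it was, still a member of `C`.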

Defining namespace hierarchies in the ISPO model is easy, but if the Turtle-like syntax doesn’t support them, using them may be cumbersome: I’d have to declare all sub-namespaces at the top of the file. To make them usable, some new syntax must be added: allow references of the form ns:ns:ns: … :label and define their resolution as follows. First convert ns0 into a uid using the declarations. Then resolve ns(x):ns(x+1), starting with ns0:ns1 and advancing forward recursively. If ns(x+1) is the last element, it’s actually the label: using the uid of ns(x) and the label, find the resource uid and return it. Otherwise, find the uid of ns(x+1) in the same way and continue recursively with ns(x+1):ns(x+2).
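
The resolution process above can be sketched as a loop. This is equivalent to the recursion described, because each step depends only on the uid found in the previous step. As before, the graph is a hypothetical set of triples with bare property names and placeholder uids, and `declarations` maps the first prefix to its uid. Each step must match exactly one resource, so a prefix collision at any level is reported rather than guessed.

```python
def find_member(graph, ns_uid, label):
    """Return the uid of the resource with this label inside namespace ns_uid."""
    matches = {s for s, p, o in graph if p == "has_label" and o == label}
    matches &= {s for s, p, o in graph
                if p == "belongs_to_namespace" and o == ns_uid}
    if len(matches) != 1:
        raise LookupError(f"{label!r} in {ns_uid!r}: {len(matches)} matches")
    return next(iter(matches))


def resolve_chain(ref, declarations, graph):
    """Resolve ns0:ns1:...:nsN:label into a uid."""
    names = ref.split(":")
    if len(names) < 2:
        raise ValueError(f"expected at least ns:label, got {ref!r}")
    uid = declarations[names[0]]  # ns0 comes from the @namespace lines
    # Each remaining name is looked up inside the namespace found so far;
    # the last one is the label of the resource being referred to.
    for name in names[1:]:
        uid = find_member(graph, uid, name)
    return uid


declarations = {"plc": "P"}
graph = {
    ("C", "has_label", "c"), ("C", "belongs_to_namespace", "P"),
    ("C11", "has_label", "c11"), ("C11", "belongs_to_namespace", "C"),
    ("K", "has_label", "newfeature"), ("K", "belongs_to_namespace", "C11"),
}
print(resolve_chain("plc:c:c11:newfeature", declarations, graph))  # K
```

The work is one lookup per element, which matches the observation that hierarchies are usually shallow.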

It’s a simple linear process. Usually the hierarchies have small depth anyway, so it’s a trivial change.
