Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

Branches

master

namespaces.mdwn

Purpose

Discuss and plan the role of namespaces in the expression model.

Content

Namespaces in the expression model have a slightly different role than in XML-based languages.

In XML-based languages, namespaces allow using one base URI for all elements inside the namespace, and each element’s URI is the namespace URI with the element’s name appended. This allows referring to something as namespace:element instead of http://www.example.org/namespace/element. However, it means the identifiers are planned ahead of time and resources are grouped into namespaces. In other words, the identifier itself carries information that binds it to a specific ontology.
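To make the contrast concrete, here is a small Python sketch of the XML convention just described: a prefix maps to a base URI, and an element's full URI is that base URI with the element name appended. The prefix table is a hypothetical example, not a real vocabulary. The point is that the identifier is derived from the namespace, which is exactly the coupling the expression model avoids.

```python
# Hypothetical prefix table: one base URI per XML-style namespace prefix.
PREFIXES: dict[str, str] = {"namespace": "http://www.example.org/namespace/"}


def expand(qname: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:element' to the element's full URI.

    The URI is the namespace's base URI with the element name appended,
    so the identifier itself encodes which namespace the element belongs to.
    """
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


print(expand("namespace:element"))
# → http://www.example.org/namespace/element
```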

In our expression model, uids are meant to be meaningless and random, not related to each other in any way. Therefore, a different mechanism for referring to resources is needed.
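A minimal sketch of what "meaningless and random" can look like in practice. It assumes uids are 128 random bits written as 32 lowercase hex characters, which matches the shape of the example uids on this page but is not something the page specifies; `new_uid` is a hypothetical helper.

```python
import secrets


def new_uid() -> str:
    """Return a new random uid with no relation to any other uid.

    Assumption: uids are 128 random bits written as 32 lowercase hex
    characters, matching the shape of the example uids on this page.
    """
    return secrets.token_hex(16)


first, second = new_uid(), new_uid()
print(first)
print(second)  # unrelated to the first; neither encodes an ontology or namespace
```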

Let’s try an example: Assume we want to query a database for all the people who knit. This is recorded in the database using a property worksInField, whose domain is the class Person and whose range is the class FieldOfEndeavor. Therefore, what we want is a list of all resources X which satisfy “X worksInField knitting”. Now assume the following uids:

d63429eda6ddf4d53d8b885f52acadb3 - worksInField
1d7b230d238f32c1b82d966b52acadc3 - knitting
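The simple query can be sketched in Python, assuming (for illustration only; the page does not fix a storage format) that the database is a set of (subject, property, object) triples. The worksInField and knitting uids come from the list above; the two person uids and the second field uid are invented.

```python
# uids for worksInField and knitting, taken from the text above.
WORKS_IN_FIELD = "d63429eda6ddf4d53d8b885f52acadb3"
KNITTING = "1d7b230d238f32c1b82d966b52acadc3"

# Hypothetical people and a hypothetical second field, invented for the example.
PERSON_A = "0f1e2d3c4b5a69788796a5b4c3d2e1f0"
PERSON_B = "aa11bb22cc33dd44ee55ff6600778899"
OTHER_FIELD = "9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b"

# The database as a set of (subject, property, object) triples.
TRIPLES = {
    (PERSON_A, WORKS_IN_FIELD, KNITTING),
    (PERSON_B, WORKS_IN_FIELD, OTHER_FIELD),
}


def subjects_matching(triples, prop, obj):
    """Return every X such that the triple (X, prop, obj) is in the database."""
    return {s for (s, p, o) in triples if p == prop and o == obj}


print(subjects_matching(TRIPLES, WORKS_IN_FIELD, KNITTING))
# → {'0f1e2d3c4b5a69788796a5b4c3d2e1f0'}
```

The answer is correct but, as the next paragraph points out, a bare uid tells a human reader nothing.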

However, this kind of simple query is not enough. Its result would be a list of uids, which is of little use to us humans: we’d like to be able to recognize the people listed. So let’s instead get a list of the names of all people who knit. We can phrase this in two ways:

Give me all X such that there exists Y where “Y hasName X” and “Y worksInField knitting”
Give me all name(X) such that “X worksInField knitting”

The second version is much simpler and easier to come up with. When we get to planning the query language, we will see that it is the preferred and recommended form: it is both easier for humans to use and faster for machines to execute.

So we are going to need another resource: The property hasName.

748b6499e5cbfb5b20e6ab7552acadd9 - hasName
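With the hasName uid in hand, the two phrasings of the names query can be compared in a small Python sketch. It assumes the same triple-set model as a hypothetical stand-in for the database, invents the people and their names, and assumes each person has exactly one name. It only shows that both phrasings return the same answer; it does not demonstrate the performance claim above.

```python
# uids from the text.
WORKS_IN_FIELD = "d63429eda6ddf4d53d8b885f52acadb3"
KNITTING = "1d7b230d238f32c1b82d966b52acadc3"
HAS_NAME = "748b6499e5cbfb5b20e6ab7552acadd9"

# Hypothetical people and field; names are stored as plain strings here.
PERSON_A = "0f1e2d3c4b5a69788796a5b4c3d2e1f0"
PERSON_B = "aa11bb22cc33dd44ee55ff6600778899"
OTHER_FIELD = "9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b"

TRIPLES = {
    (PERSON_A, WORKS_IN_FIELD, KNITTING),
    (PERSON_A, HAS_NAME, "Anne"),
    (PERSON_B, WORKS_IN_FIELD, OTHER_FIELD),
    (PERSON_B, HAS_NAME, "Bob"),
}

# Phrasing 1: all X such that some Y has "Y hasName X" and "Y worksInField knitting".
phrasing_1 = {
    x
    for (y, p, x) in TRIPLES
    if p == HAS_NAME and (y, WORKS_IN_FIELD, KNITTING) in TRIPLES
}


# Phrasing 2: all name(X) such that "X worksInField knitting".
# name() is the hasName property used as a function (one name per person assumed).
def name(x):
    return next(o for (s, p, o) in TRIPLES if s == x and p == HAS_NAME)


phrasing_2 = {
    name(x) for (x, p, o) in TRIPLES if p == WORKS_IN_FIELD and o == KNITTING
}

print(phrasing_1, phrasing_2)  # → {'Anne'} {'Anne'}
```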

Assume each property can have two names: A predicate name and a function name. In our example, the predicate name is “hasName”, as in “She has the name Anne”, and the function name is “name”, as in name(X). Thus our query, in a human-writable and human-readable form, may look like this:

Give me all: *name (?)*
Which satisfy: *? worksInField knitting*

This query refers to three resources by name. What we actually need to refer to are their uids, but obviously a query like this is not practical for humans to write and read:

Give me all: 748b6499e5cbfb5b20e6ab7552acadd9 (?)
Which satisfy: ? d63429eda6ddf4d53d8b885f52acadb3 1d7b230d238f32c1b82d966b52acadc3
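The two names a property can carry, and the gap between the readable and the uid forms of the query, can be sketched with a small record type. The `Property` class and the `render` helper are hypothetical; since the page gives worksInField no function name, the sketch leaves that field empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    uid: str
    predicate_name: str                   # used in patterns: "? worksInField knitting"
    function_name: Optional[str] = None   # used as a function: name(?)


HAS_NAME = Property("748b6499e5cbfb5b20e6ab7552acadd9", "hasName", "name")
WORKS_IN_FIELD = Property("d63429eda6ddf4d53d8b885f52acadb3", "worksInField")
KNITTING = ("knitting", "1d7b230d238f32c1b82d966b52acadc3")  # (label, uid)


def render(readable: bool) -> str:
    """Render the example query using names (readable=True) or raw uids."""
    fn = HAS_NAME.function_name if readable else HAS_NAME.uid
    pred = WORKS_IN_FIELD.predicate_name if readable else WORKS_IN_FIELD.uid
    obj = KNITTING[0] if readable else KNITTING[1]
    return f"Give me all: {fn} (?)\nWhich satisfy: ? {pred} {obj}"


print(render(readable=True))
print(render(readable=False))
```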

The natural solution that comes to mind is “just use the labels of the resources instead of their uids”. But labels are not guaranteed to be unique: A database may contain a huge number of objects, many of which share exactly the same label. Also, sometimes a name has more than one meaning, and then the query may be confusing for other people to understand. The solution: Namespaces.

A namespace is a set of resources whose labels are unique within the namespace. In other words, no label is used by more than one resource in the namespace. This way, inside the namespace, the label can be used to refer to the resource uniquely. It is suggested that namespaces group related resources, e.g. the resources of an ontology, but there is no restriction on how namespaces are defined.
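The defining property of a namespace, unique labels, is easy to enforce at insertion time. This Python sketch uses a hypothetical `Namespace` class; the namespace labels and uids are invented. It also shows the ambiguous-name case mentioned earlier: the label "bank" can name two different resources as long as they live in different namespaces.

```python
class DuplicateLabelError(ValueError):
    """Raised when a label is already used by another resource in the namespace."""


class Namespace:
    """A set of resources whose labels are unique within the namespace."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._uid_by_label: dict[str, str] = {}

    def add(self, label: str, uid: str) -> None:
        # Enforce the defining property: no label refers to two resources.
        if label in self._uid_by_label:
            raise DuplicateLabelError(f"{label!r} is already used in {self.label!r}")
        self._uid_by_label[label] = uid

    def lookup(self, label: str) -> str:
        """Return the uid of the resource with this label (KeyError if absent)."""
        return self._uid_by_label[label]


# Hypothetical namespaces and uids: one label, two meanings, no collision.
geography = Namespace("geography")
finance = Namespace("finance")
geography.add("bank", "3b9f0c2a7e1d4c5b8a6f9e0d1c2b3a4f")
finance.add("bank", "c4d5e6f708192a3b4c5d6e7f80910a1b")

try:
    finance.add("bank", "0a1b2c3d4e5f60718293a4b5c6d7e8f9")
except DuplicateLabelError as err:
    print(err)  # → 'bank' is already used in 'finance'
```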

When a resource used in a query is referred to by a namespace-label pair, several things need to happen before the query can be processed:

Using the name of the namespace, find its identifier
Using the identifier, search the namespace’s list of labels for the resource label
Retrieve the uid for that label
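The three steps can be sketched directly in Python. The tables are hypothetical: the namespace name "example" and its identifier are invented, while the worksInField and knitting uids come from the text. Step 2 is a plain linear search, mirroring the wording of the step.

```python
# Hypothetical tables; only the worksInField and knitting uids come from the text.
NAMESPACE_IDS = {"example": "5f2b9c1e0a7d4e3f8b6a2c9d1e0f3a4b"}  # name -> identifier
LABEL_LISTS = {
    "5f2b9c1e0a7d4e3f8b6a2c9d1e0f3a4b": [  # identifier -> [(label, uid), ...]
        ("worksInField", "d63429eda6ddf4d53d8b885f52acadb3"),
        ("knitting", "1d7b230d238f32c1b82d966b52acadc3"),
    ],
}


def resolve(namespace_name: str, label: str) -> str:
    # Step 1: using the name of the namespace, find its identifier.
    ns_id = NAMESPACE_IDS[namespace_name]
    # Step 2: using the identifier, search the namespace's list of labels.
    for candidate, uid in LABEL_LISTS[ns_id]:
        if candidate == label:
            # Step 3: retrieve the uid for that label.
            return uid
    raise KeyError(f"{label!r} not found in namespace {namespace_name!r}")


print(resolve("example", "knitting"))  # → 1d7b230d238f32c1b82d966b52acadc3
```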

Therefore, not only must labels be unique within a namespace, but namespace names must also be unique within the context where they are used. It is also possible to refer to a namespace by its identifier, but, as with resource uids, that is not a suitable way for humans to communicate.

The following are two possible ways to carry out the steps above:

From the database itself: Namespaces are resources. They have a uid and a unique label. Then each resource can belong to a namespace through a dedicated property belongsToNamespace. Given a namespace with label N and a resource with label r, the following query can retrieve the uid:

Give me: ?u
Such that: ?u belongsToNamespace ?n; ?n hasLabel N; ?u hasLabel r
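The query above is a join of three triple patterns sharing the variables ?u and ?n. Here it is evaluated over a hypothetical triple set in Python. For readability the sketch uses plain strings ("belongsToNamespace", "ns-geo", "res-1") as stand-ins for what, in the model, would be random uids.

```python
# Stand-ins for the property uids: in the model these would be random uids,
# but readable strings keep the sketch short.
BELONGS_TO_NAMESPACE = "belongsToNamespace"
HAS_LABEL = "hasLabel"

# Hypothetical data: two namespaces that both contain a resource labeled "bank".
TRIPLES = {
    ("ns-geo", HAS_LABEL, "geography"),
    ("ns-fin", HAS_LABEL, "finance"),
    ("res-1", HAS_LABEL, "bank"),
    ("res-1", BELONGS_TO_NAMESPACE, "ns-geo"),
    ("res-2", HAS_LABEL, "bank"),
    ("res-2", BELONGS_TO_NAMESPACE, "ns-fin"),
}


def find_uid(triples, N: str, r: str) -> set:
    """Give me ?u such that ?u belongsToNamespace ?n; ?n hasLabel N; ?u hasLabel r."""
    return {
        u
        for (u, p, n) in triples
        if p == BELONGS_TO_NAMESPACE
        and (n, HAS_LABEL, N) in triples
        and (u, HAS_LABEL, r) in triples
    }


print(find_uid(TRIPLES, "finance", "bank"))  # → {'res-2'}
```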

Another, perhaps friendlier, way to look at this is to first define how the namespace identifier is retrieved, and then use it to find the resource:

define function nid(*N*):
	takes a namespace label *N* and returns the namespace identifier *I*
	Give me ?I
	Such that: ?I isA Namespace; ?I hasLabel *N*
	If found more than one, ERROR
	If found none, ERROR
	If found one, return it

Now we can retrieve the uid of the resource as follows:

define function rid(*r*, *N*):
	takes a resource label *r* and a namespace label *N* and returns the resource identifier *u*
	Give me ?u
	Such that: ?u belongsToNamespace nid(*N*); ?u hasLabel *r*
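The nid and rid functions translate into Python roughly as follows. As before, readable strings stand in for uids and the data is invented. The "exactly one match" checks follow the nid pseudocode; applying the same checks in rid is an assumption, since the pseudocode leaves rid's error cases unstated.

```python
IS_A, HAS_LABEL, BELONGS = "isA", "hasLabel", "belongsToNamespace"  # stand-in uids
NAMESPACE = "Namespace"

# Hypothetical data: two namespaces, each containing a resource labeled "bank".
TRIPLES = {
    ("ns-geo", IS_A, NAMESPACE), ("ns-geo", HAS_LABEL, "geography"),
    ("ns-fin", IS_A, NAMESPACE), ("ns-fin", HAS_LABEL, "finance"),
    ("res-1", BELONGS, "ns-geo"), ("res-1", HAS_LABEL, "bank"),
    ("res-2", BELONGS, "ns-fin"), ("res-2", HAS_LABEL, "bank"),
}


class ResolutionError(Exception):
    """Raised when a label does not identify exactly one resource."""


def exactly_one(matches: set, what: str) -> str:
    # Mirrors the pseudocode: more than one is an error, none is an error.
    if len(matches) > 1:
        raise ResolutionError(f"more than one match for {what}")
    if not matches:
        raise ResolutionError(f"no match for {what}")
    return next(iter(matches))


def nid(N: str) -> str:
    """Take a namespace label N and return the namespace identifier I."""
    found = {
        i for (i, p, o) in TRIPLES
        if p == IS_A and o == NAMESPACE and (i, HAS_LABEL, N) in TRIPLES
    }
    return exactly_one(found, f"namespace {N!r}")


def rid(r: str, N: str) -> str:
    """Take a resource label r and a namespace label N; return the resource uid u."""
    ns = nid(N)
    found = {
        u for (u, p, o) in TRIPLES
        if p == BELONGS and o == ns and (u, HAS_LABEL, r) in TRIPLES
    }
    return exactly_one(found, f"{N}:{r}")


print(rid("bank", "geography"))  # → res-1
```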

Using tables: Instead of running queries, namespace tables can be stored in a form optimized for fast retrieval. The data can still live inside the database, in which case the tables are just auto-generated optimized structures derived from the data in the database.

The same techniques used to optimize label lookup can be applied to the rest of the database too; they are probably common techniques in relational database implementations.

The idea is to use hash maps to find the uid of a given namespace-label pair in O(1) average time. This can be done either with a single hash map keyed by the namespace-label pair, or with a set of maps:

A map for each namespace, which matches labels to identifiers
A map which matches namespace labels to maps of the first kind
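Both layouts can be built in a few lines of Python from rows derived from the database. The rows (namespace label, resource label, uid) are invented for illustration. Python's `dict` is a hash map, so each lookup is O(1) on average in either layout.

```python
# Hypothetical rows derived from the database: (namespace label, resource label, uid).
ROWS = [
    ("geography", "bank", "3b9f0c2a7e1d4c5b8a6f9e0d1c2b3a4f"),
    ("finance", "bank", "c4d5e6f708192a3b4c5d6e7f80910a1b"),
    ("finance", "loan", "0a1b2c3d4e5f60718293a4b5c6d7e8f9"),
]

# Option 1: a single hash map keyed by the (namespace label, resource label) pair.
flat = {(ns, label): uid for ns, label, uid in ROWS}

# Option 2: a map from namespace label to a per-namespace map of label -> uid.
nested: dict[str, dict[str, str]] = {}
for ns, label, uid in ROWS:
    nested.setdefault(ns, {})[label] = uid

# Both answer a namespace-label lookup with hash lookups.
assert flat[("finance", "bank")] == nested["finance"]["bank"]
print(nested["geography"]["bank"])  # → 3b9f0c2a7e1d4c5b8a6f9e0d1c2b3a4f
```

The flat map needs one lookup per pair. The nested layout needs two, but keeps each namespace's table as a separate object that could, for example, be rebuilt on its own.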

In any case, the uid lookup can be done once, after which the optimized query can be executed many times, so the lookup overhead is paid only once. The tables (or the database) need to be updated as resources are edited, but this shouldn’t add significant delay.
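A minimal sketch of "look up once, execute many times", plus a table update when a label is edited. The table, the person uid, and the helpers `prepare`, `run` and `relabel` are hypothetical. Because the prepared query holds uids rather than labels, renaming a resource leaves it valid.

```python
# Hypothetical namespace table: (namespace label, resource label) -> uid.
TABLE = {
    ("example", "worksInField"): "d63429eda6ddf4d53d8b885f52acadb3",
    ("example", "knitting"): "1d7b230d238f32c1b82d966b52acadc3",
}
PERSON = "0f1e2d3c4b5a69788796a5b4c3d2e1f0"  # hypothetical person uid


def prepare(pred_name, obj_name):
    """Look up the uids once, turning names into a pattern '? pred obj'."""
    return TABLE[pred_name], TABLE[obj_name]


def run(prepared, triples):
    """Execute a prepared pattern; no name lookups happen here."""
    pred, obj = prepared
    return {s for (s, p, o) in triples if p == pred and o == obj}


def relabel(namespace, old, new):
    """Update the table when a resource's label is edited (a delete plus an insert)."""
    TABLE[(namespace, new)] = TABLE.pop((namespace, old))


query = prepare(("example", "worksInField"), ("example", "knitting"))
triples = {(PERSON, *query)}  # one matching fact built from the resolved uids
for _ in range(3):  # the prepared query can be executed repeatedly
    assert run(query, triples) == {PERSON}

relabel("example", "worksInField", "fieldOfWork")
# The prepared query holds uids, not labels, so it is unaffected by the rename.
print(run(query, triples) == {PERSON}, ("example", "fieldOfWork") in TABLE)  # → True True
```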

A query has four main forms:

User text: The query exactly as entered by a human user
Prepared: User text query with human-writable names mapped to machine-readable names
Compiled: Prepared query with text parsed into an in-memory query tree
Optimized: Compiled query in a form optimized for efficient execution
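The four forms can be illustrated as a tiny pipeline. Everything here is an assumption made for illustration, since the page does not define any of it: the `ns:label` token syntax, the one-pattern grammar `? P O`, the `Pattern` tree node, and the (predicate, object) index used by the optimized form.

```python
from dataclasses import dataclass

# Hypothetical namespace table used to prepare queries: "ns:label" -> uid.
NAMES = {
    "example:worksInField": "d63429eda6ddf4d53d8b885f52acadb3",
    "example:knitting": "1d7b230d238f32c1b82d966b52acadc3",
}


@dataclass(frozen=True)
class Pattern:
    """Compiled form: an in-memory tree node for the pattern '? P O'."""
    predicate: str
    obj: str


def prepare(user_text: str) -> str:
    """User text -> prepared: replace human-writable names with uids."""
    return " ".join(NAMES.get(tok, tok) for tok in user_text.split())


def compile_query(prepared: str) -> Pattern:
    """Prepared -> compiled: parse the text into a query tree."""
    var, predicate, obj = prepared.split()
    if var != "?":
        raise ValueError("this sketch only supports patterns of the form '? P O'")
    return Pattern(predicate, obj)


def optimize(pattern: Pattern):
    """Compiled -> optimized: reduce the pattern to one key into an index."""
    key = (pattern.predicate, pattern.obj)
    return lambda index: index.get(key, set())


def build_index(triples):
    """Hypothetical auxiliary index: (predicate, object) -> set of subjects."""
    index = {}
    for s, p, o in triples:
        index.setdefault((p, o), set()).add(s)
    return index


PERSON = "0f1e2d3c4b5a69788796a5b4c3d2e1f0"  # hypothetical person uid
user_text = "? example:worksInField example:knitting"
prepared = prepare(user_text)
compiled = compile_query(prepared)
optimized = optimize(compiled)
index = build_index({(PERSON, NAMES["example:worksInField"], NAMES["example:knitting"])})
print(optimized(index))  # → {'0f1e2d3c4b5a69788796a5b4c3d2e1f0'}
```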

As mentioned above, namespaces are not limited to ontologies. The freedom to group resources into namespaces in any desired way makes it possible to define local namespaces and create drafts and experimental ontologies without worrying about name collisions. It allows the user to make mistakes and fix them, and not be afraid to try new things. Namespaces also allow the user to refer to everything by name, which is much more convenient than using uids.
