Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

Branches

master

references.mdwn

Purpose

Plan and design the usage of human-writable names for referring to resources, hiding uid ugliness from the user.

Content

The following is a typical query written in SPARQL, a query language for RDF.

[[!format n3 """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
  ?person a foaf:Person.
  ?person foaf:name ?name.
  ?person foaf:mbox ?email.
}
"""]]

As you can see, classes and properties (i.e. existing resources) are referred to using the human-readable namespace-label pair discussed previously. But what if we wanted to refer to a resource that is neither a class nor a property, i.e. one that doesn’t belong to an ontology? Assume there is a Person object whose name is John Smith and whose uid is ddefbdc341b2815be15ee60b52bdb5f7.

[[!format n3 """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?email
WHERE {
  ddefbdc341b2815be15ee60b52bdb5f7 foaf:mbox ?email.
}
"""]]

A typical response would be: Wait… whaaaat? You’re absolutely right, it’s nowhere close to practical for humans to read this query and understand what it does. We need to refer to John Smith somehow. Is the namespace-label scheme a good idea? Probably not, otherwise I wouldn’t be writing this page :-)

One of the significant differences between ontologies and the data models defined using them is that ontologies are usually small enough for people to understand and work with, while the data based on them can be huge, reaching gigabytes or even terabytes. As a result, if namespace-label names are used for everything, what you eventually end up with is a semantically meaningful naming scheme.

Each class would have a label and each ontology concept would have a label, and names would be based on these labels. For example:

Person:Iceland:Reykjavík:JohnDoe

This is not a simple label, but a whole new naming scheme: It contains the class and then a series of property values in a predefined order. In this case it’s the Person class, and the series is:

Country label
City label
Name

If this is not enough to identify a person uniquely in the database, more values would be added. Either way, the simple namespace-label approach is too simple for a terabyte-sized collection of resources.
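As a rough illustration of such a naming scheme, the sketch below joins a class label with an ordered series of property values. The `KEY_ORDER` table, the property keys and the function name `build_reference` are assumptions made up for this example; only the Person case and the resulting phrase come from the text above.

```python
# Hypothetical table: for each class, the property values that make up
# its name, in their predefined order. Only "Person" appears in the text.
KEY_ORDER = {"Person": ["country", "city", "name"]}


def build_reference(cls, props):
    """Join the class label and the ordered property values with ':'."""
    parts = [cls]
    for key in KEY_ORDER[cls]:
        # Drop spaces so that "John Doe" becomes "JohnDoe", as in the example.
        parts.append(props[key].replace(" ", ""))
    return ":".join(parts)


john = {"country": "Iceland", "city": "Reykjavík", "name": "John Doe"}
print(build_reference("Person", john))  # Person:Iceland:Reykjavík:JohnDoe
```

If three values were not enough to make the name unique, this sketch would simply get a longer list under `KEY_ORDER["Person"]`.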

As you may have noticed, the phrase “Person:Iceland:Reykjavík:JohnDoe” can be expressed in a query:

[[!format n3 """
Give me:
  ?person
Where:
  ?person isA Person
  ?person livesInCountry [Iceland]
  ?person livesInCity [Reykjavík]
  ?person hasName “John Doe”
"""]]
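The correspondence between the phrase and the query can be sketched in the other direction too. The snippet below expands a phrase into a list of query patterns. The `PATTERNS` table and the function name `expand_reference` are hypothetical, and the sketch does not try to turn the compacted label `JohnDoe` back into `John Doe`.

```python
# Hypothetical table: which property each position in the name stands for.
# Only the Person case is taken from the text.
PATTERNS = {"Person": ["livesInCountry", "livesInCity", "hasName"]}


def expand_reference(ref):
    """Turn 'Class:v1:v2:...' into (subject, property, value) patterns."""
    cls, *values = ref.split(":")
    props = PATTERNS[cls]
    if len(values) != len(props):
        raise ValueError(
            f"expected {len(props)} values for {cls}, got {len(values)}"
        )
    var = "?" + cls.lower()  # e.g. "?person"
    patterns = [(var, "isA", cls)]
    patterns += [(var, prop, value) for prop, value in zip(props, values)]
    return patterns


for pattern in expand_reference("Person:Iceland:Reykjavík:JohnDoe"):
    print(*pattern)
```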

So the phrase serving as a reference is just a shorter way to express the query. What if it’s not enough? What if the query is going to be used in other databases, where there may be two John Does? Ideas:

Find the specific John Doe, and use his uid in the query
Express the intention in the query, and let the query compiler find the uid
Refer to John Doe by location of the data, e.g. by URL

The first two ideas simply use queries and therefore don’t introduce anything new. But the third idea is new: referring to things by their location, similar to how URLs are used as unique identifiers in XML and RDF. How could we uniquely refer to our John Doe, so that the reference is always valid? So far only uids can do that, because any property value can be changed (e.g. John changes his name to Robert) and then a reference-by-query becomes invalid.
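The fragility of a reference-by-query can be shown with a toy in-memory store. Everything here is an assumption for the sake of the example: the uid is made up, and `find_by_query` is a hypothetical helper, not part of any real query engine.

```python
# Toy store: uid -> properties. The uid below is invented for this example.
store = {
    "0f3a9c5e7b1d42688a4c2e6f90b7d315": {"class": "Person", "name": "John Doe"},
}


def find_by_query(store, **wanted):
    """Return the uids of all objects whose properties match `wanted`."""
    return [
        uid
        for uid, props in store.items()
        if all(props.get(key) == value for key, value in wanted.items())
    ]


# Resolve the reference-by-query once and remember the uid.
uid = find_by_query(store, name="John Doe")[0]

# John changes his name to Robert.
store[uid]["name"] = "Robert Doe"

print(find_by_query(store, name="John Doe"))  # the old query matches nothing
print(store[uid]["name"])                     # the uid still reaches him
```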

In the next layer, the network layer, we will examine how data can be distributed by having servers communicate and form a distributed mesh network. Assume they already do. In the communication protocol, each node (computer) has an ID and each graph has an ID. These IDs serve not just for referring to physical locations, but also for referring to shared data, much like a torrent file refers to a file existing in thousands of copies on many computers.

We can try to use the unique IDs used by the protocol to refer uniquely to objects. The reference would allow any network node to retrieve the object’s uid, which is a permanent reference, and always be able to get its information. So, what information do we need in order to refer to a network object uniquely?

To be honest, as I must be throughout this wiki, it depends on how the network protocol works, and on how IDs are issued and distributed. But here is an example that demonstrates a possible scheme. Actually, two examples: the first one uses the HTTP URI scheme and the second one uses an imaginary scheme.

https://www.fr33dom.org/people/john-doe
fr33:org/fr33dom/people/r4Gc

Another way to refer to resources is to use the first few characters of their uid. For example, the git version control system allows the user to refer to a specific commit by its hash, and it is enough to type as many leading characters as are needed to identify the commit uniquely in the repository’s commit history. In the example above, r4Gc is John Doe’s identification code, which is both unique in its domain and easy to remember.

If the beginning of a hash is used, say the first 6 characters, ddefbdc341b2815be15ee60b52bdb5f7 becomes ddefbd, which is much easier to remember. And if “ddefbd is a person” is added to the query, then ddefbd has a good chance of identifying John Doe uniquely.
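A minimal sketch of this kind of lookup, assuming a simple in-memory store keyed by uid. The function name `resolve_prefix`, the `class` property and the second, colliding uid are all made up for illustration; only the first uid comes from this page.

```python
def resolve_prefix(store, prefix, cls=None):
    """Return the single uid starting with `prefix`.

    If `cls` is given, only objects of that class are considered.
    Raises LookupError when zero or several uids match.
    """
    matches = [
        uid
        for uid, props in store.items()
        if uid.startswith(prefix) and (cls is None or props["class"] == cls)
    ]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


store = {
    "ddefbdc341b2815be15ee60b52bdb5f7": {"class": "Person"},  # uid from the text
    "ddefbd00000000000000000000000000": {"class": "City"},    # made-up collision
}

# On its own the prefix is ambiguous here, but adding
# "ddefbd is a person" narrows it down to one object.
print(resolve_prefix(store, "ddefbd", cls="Person"))
```

Calling `resolve_prefix(store, "ddefbd")` without the class raises `LookupError` in this toy store, which is the case where a longer prefix or an extra condition would be needed.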
