Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>


CONSTRUCT queries

Basically, these queries differ from the ones modeled so far in two ways, unless I’m missing something:

  1. They can produce more than one row per match
  2. They produce only valid RDF triples

Here’s an example from the official SPARQL documentation:

[[!format n3 """
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
CONSTRUCT { <http://example.org/person#Alice> vcard:FN ?name }
WHERE     { ?x foaf:name ?name }
"""]]

Since it produces just a single triple per match, the query model we have so far can do the same thing with 3 parameters in P. But what if we wanted more than that? For example, assume we want a list of name-email pairs of all the people known to the system. That’s a single pair per match. What if, instead, we want two triples per match:

This requires changing the structure of P! Instead of a list, it should be a set of lists. In addition, from now on P can include Entities, i.e. constant parts inserted into rows. Examples:
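Here is a minimal Python sketch of P as a set of lists. It uses my own assumptions, not Naya notation. Strings starting with `$` are parameters. Anything else is a constant Entity, copied into the row unchanged. The hypothetical `instantiate` helper produces one row per template for every match, so two templates yield two triples per match. The `ex:` names are made up.

```python
from typing import Dict, List, Tuple

Match = Dict[str, str]       # parameter name -> matched value
Template = Tuple[str, ...]   # one list in P: parameters and constant Entities


def is_param(item: str) -> bool:
    # Assumption for this sketch: "$name" marks a parameter; anything else is a constant Entity.
    return item.startswith("$")


def instantiate(templates: List[Template], matches: List[Match]) -> List[Tuple[str, ...]]:
    """Produce one row per (match, template) pair, substituting matched parameter values."""
    rows = []
    for match in matches:
        for template in templates:
            rows.append(tuple(match[item[1:]] if is_param(item) else item for item in template))
    return rows


# Two triples per match: the predicates are constant Entities, the rest are parameters.
P = [("$x", "vcard:FN", "$name"), ("$x", "foaf:mbox", "$mbox")]
matches = [{"x": "ex:alice", "name": "Alice", "mbox": "mailto:alice@example.org"}]
rows = instantiate(P, matches)
# rows == [("ex:alice", "vcard:FN", "Alice"), ("ex:alice", "foaf:mbox", "mailto:alice@example.org")]
```

A Python list stands in for the set here only so that the row order is predictable.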

This update should solve 2.6 in [[examples]].

Filters

So far, there are two ways to include computational conditions in the query:

  1. Relation operators
  2. Property/operator between computed values, using value operators

This is somewhat limited: what if I have a relation operator that is not binary? The only way to implement it right now is with a value operator that returns a boolean, comparing the result of its application to ‘true’. Not very elegant. Let’s introduce an extension to relation operators.

In general, operators take a list of Entities and return an Entity. But operators returning a boolean are special: They can be used as conditions in a query. This includes:

Now, here’s the notation for them. Some rules exist already, some are new:

This is not final syntax. I could also use something like f: x y z, but it’s too early for that. For now I just need simple formal notation for planning the model; I’ll think about practical syntax later. Another idea for later: write high-arity relations like sentences, e.g. “$Person uses $tool at $time with $object for $purpose”. It’s basically like naming each component of the tuple, which makes things clearer than just listing arguments as in functional notation. It’s also like named arguments in programming languages.
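The point of the extension is that any boolean-returning operator, whatever its arity, can act as a query condition directly. Here is a small Python sketch of that idea, under my own assumptions. A condition is an (operator, argument list) pair. Arguments starting with `$` name a parameter of the row, and other arguments are constants. The names `between` and `apply_filters` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Condition = Tuple[Callable[..., bool], Sequence[Any]]  # (boolean operator, argument list)


def resolve(arg: Any, row: Row) -> Any:
    # Assumption: strings starting with "$" name a parameter; anything else is a constant.
    if isinstance(arg, str) and arg.startswith("$"):
        return row[arg[1:]]
    return arg


def satisfies(row: Row, condition: Condition) -> bool:
    op, args = condition
    return bool(op(*(resolve(a, row) for a in args)))


def apply_filters(rows: List[Row], conditions: List[Condition]) -> List[Row]:
    """Keep only the rows for which every boolean operator holds."""
    return [r for r in rows if all(satisfies(r, c) for c in conditions)]


def between(x: int, low: int, high: int) -> bool:
    # A ternary relation operator: used as a condition as-is, no comparison to 'true'.
    return low <= x <= high


rows = [{"p": "alice", "age": 30}, {"p": "bob", "age": 17}, {"p": "carol", "age": 70}]
working_age = apply_filters(rows, [(between, ["$age", 18, 65])])
```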

This update helps solve 3.1 in [[examples]].

Optional Parameters

When P was just a list of labels, keeping the optional parameters in a separate list A worked well. But since then (see above) P has changed and is now a set of such lists. Since each list may contain its own optional parameters, a single separate A is no longer sufficient.

However, since the notations P and A were introduced for convenient modeling in the first place, I can keep them for the same purpose. Instead of removing or changing anything, I will add something. P goes back to being a list. The set-of-lists structure remains, but it moves into a whole new section of the query: the result pattern, denoted R. The result pattern is exactly what P just was, except that it can contain only parameters from P and from A.
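Here is a rough Python sketch of the resulting query shape. The section names P, A and R come from the text. Everything else is my own assumption for illustration:

- `$` marks a parameter inside R.
- `None` stands for a blank cell when an optional parameter isn't matched.
- The `validate` and `rows` methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Query:
    P: List[str]                                             # required parameters, a plain list again
    A: List[str] = field(default_factory=list)               # optional parameters
    R: List[Tuple[str, ...]] = field(default_factory=list)   # result pattern: a set of row templates

    def validate(self) -> None:
        """R may only refer to parameters declared in P or A."""
        known = set(self.P) | set(self.A)
        for template in self.R:
            for item in template:
                if item.startswith("$") and item[1:] not in known:
                    raise ValueError(f"result pattern uses undeclared parameter {item}")

    def rows(self, match: Dict[str, str]) -> List[Tuple[Optional[str], ...]]:
        """One row per template; an unmatched optional parameter becomes a blank (None) cell."""
        self.validate()
        return [
            tuple(match.get(item[1:]) if item.startswith("$") else item for item in template)
            for template in self.R
        ]


q = Query(P=["x", "name"], A=["mbox"],
          R=[("$x", "foaf:name", "$name"), ("$x", "foaf:mbox", "$mbox")])
bob = q.rows({"x": "ex:bob", "name": "Bob"})  # Bob has no mbox, so that cell is blank
```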

This update helps solve 6.1 in [[examples]].

Optional Parameters in Unions

The problem arises when translating the following query to Naya:

[[!format n3 """
PREFIX dc10: <http://purl.org/dc/elements/1.0/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>

SELECT ?x ?y
WHERE { { ?book dc10:title ?x } UNION { ?book dc11:title ?y } }
"""]]

It means each row has a value for the side that matched, and an empty cell for the other. The SPARQL spec doesn’t explain there what happens if both sides match (does the book get a row specifying both x and y?), but let’s just do our own modeling here.
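For comparison, the SPARQL 1.1 algebra defines UNION as a multiset union. Each branch is evaluated separately and the solutions are concatenated. A book carrying both a dc10:title and a dc11:title therefore yields two rows: one with only ?x bound and one with only ?y bound.

Below is a small Python sketch of that behavior. The triples, the `ex:` names and the helper functions are made up for illustration, and `None` stands for an unbound variable.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]


def match_pattern(triples: List[Triple], predicate: str, var: str) -> List[Solution]:
    # Evaluate the pattern { ?book <predicate> ?var } over the triples.
    return [{"book": s, var: o} for (s, p, o) in triples if p == predicate]


def union(left: List[Solution], right: List[Solution]) -> List[Solution]:
    # SPARQL UNION: the solutions of both branches, each kept as a separate solution.
    return left + right


def project(solutions: List[Solution], variables: List[str]) -> List[Tuple[Optional[str], ...]]:
    # Unbound variables become None, i.e. an empty cell.
    return [tuple(sol.get(v) for v in variables) for sol in solutions]


data = [
    ("ex:a", "dc10:title", "Query Tutorial"),
    ("ex:b", "dc11:title", "Protocol Tutorial"),
    ("ex:c", "dc10:title", "SPARQL"),
    ("ex:c", "dc11:title", "SPARQL (updated)"),
]
rows = project(union(match_pattern(data, "dc10:title", "x"),
                     match_pattern(data, "dc11:title", "y")), ["x", "y"])
```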

I think I had a rule somewhere stating that, for each OR section, all non-optional parameters must be used at least once. Otherwise there’s no condition on the argument, i.e. it’s like defining a function f(x) that doesn’t even use x, which doesn’t make sense for queries. Let’s look at the following formula:

f(x) = Exists z such that (xRz or xSz)

We already defined OR as a union of the results of the queries for the individual sections. This works perfectly for our f here, but it doesn’t work for the SPARQL query because not all parameters are used in all OR sections. The query basically means: If you don’t match a parameter, just leave it blank. Let’s try writing a formula:
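To make the “OR as a union of section results” definition concrete, here is a Python sketch that evaluates f over a made-up toy graph. The relations R and S are plain predicate strings. It also shows a hypothetical checker for the rule that every non-optional parameter must be used in every OR section. The SPARQL union query fails that check, because its sections use {book, x} and {book, y} while both x and y are requested.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def section(triples: Set[Triple], relation: str) -> Set[str]:
    # All x for which some z exists with x <relation> z; z stays existentially bound.
    return {s for (s, p, _z) in triples if p == relation}


def f(triples: Set[Triple]) -> Set[str]:
    # f(x) = Exists z such that (xRz or xSz): OR is the union of the per-section results.
    return section(triples, "R") | section(triples, "S")


def or_sections_valid(required: Set[str], sections: List[Set[str]]) -> bool:
    # The rule: every non-optional parameter must be used in every OR section.
    return all(required <= used for used in sections)


graph = {("a", "R", "1"), ("b", "S", "2"), ("a", "S", "3"), ("c", "T", "4")}
matches = f(graph)  # "a" matches through both sections but appears only once
```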

g(x, y) = Exists b such that (bRx or bSy)

Assume we found an x for which a b exists such that bRx. Now, which y values would satisfy g? Answer: any value we take. It means the query solution should include the x we found paired with every possible value of y, i.e. every value and every resource. But there’s an infinite amount of them (the natural numbers alone are infinite in quantity), which makes this impossible and useless in practice. What we really want here is an indication of which side was matched, which can be achieved by adding boolean computed parameters, Tests, or something like that.
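The blow-up is easy to see with a finite stand-in for “every value”. The Python sketch below brute-forces g over a made-up domain. Once bRx holds for some x, that x is paired with every y in the domain, so the row count grows with the domain size. The alternative reports which side matched instead, using two boolean columns. Those columns are my own guess at what the boolean computed parameters could look like, not a settled design.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def g_rows(triples: Set[Triple], domain: List[str]) -> Set[Tuple[str, str]]:
    """Brute-force g(x, y) = Exists b such that (bRx or bSy) over a finite domain."""
    rows = set()
    for x, y in product(domain, repeat=2):
        if any((b, "R", x) in triples or (b, "S", y) in triples for b in domain):
            rows.add((x, y))
    return rows


def g_with_indicator(triples: Set[Triple]) -> List[Tuple[Optional[str], Optional[str], bool, bool]]:
    """Alternative: one row per matching statement, with flags saying which side matched."""
    rows = []
    for (_b, p, o) in sorted(triples):
        if p == "R":
            rows.append((o, None, True, False))
        elif p == "S":
            rows.append((None, o, False, True))
    return rows


graph = {("b1", "R", "x1")}
for n in (0, 10, 50):
    domain = ["b1", "x1"] + [f"v{i}" for i in range(n)]
    # x1 is paired with every y in the domain, so the row count equals the domain size.
    print(n, len(g_rows(graph, domain)))  # prints 0 2, then 10 12, then 50 52
```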

Before we do that, let’s try to accomplish the result the query requests: both columns are optional. Hmm… no, that can still easily be done with two separate queries. Maybe… let’s try queries where all parameters are optional and see where that leads us.

Here both parts are required: find name-mbox pairs and make each one a row in the result table. If one of them were optional, the other would be matched first, and then either matches are found for the optional parameter, or none are found and the row is kept with an empty cell. Now, what if both are optional? In the example above, what if we want the name and mbox of all people, listing for each person whatever is available: mbox, name, or both?

Makes sense, but there’s still an open question: what if a person has neither? If we wanted to list all the people, even those with no info, we would make ?x a parameter and state that it is a Person. Since we didn’t, we want just the details of the people who have a name, an mbox or both. In other words: take only rows where at least one optional parameter is matched. It would look like this:
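Here is a Python sketch of the “at least one optional parameter matched” rule, using the name/mbox example. It makes some simplifying assumptions:

- The candidate ?x values are supplied as a plain list, standing in for whatever resources the store holds.
- Each person has at most one name and one mbox.
- All names and data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def details(candidates: List[str], names: Dict[str, str],
            mboxes: Dict[str, str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Rows of (name, mbox), both optional; keep a row only if at least one of them matched."""
    rows = []
    for x in candidates:
        row = (names.get(x), mboxes.get(x))
        if any(cell is not None for cell in row):
            rows.append(row)
    return rows


people = ["ex:alice", "ex:bob", "ex:carol", "ex:dave"]
names = {"ex:alice": "Alice", "ex:bob": "Bob"}
mboxes = {"ex:alice": "mailto:alice@example.org", "ex:carol": "mailto:carol@example.org"}
rows = details(people, names, mboxes)
# ex:dave has neither a name nor an mbox, so no row is produced for him.
```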

Does the AND mean anything in this case? What if it were OR instead? Before I answer that, let’s write a Naya version of the original SPARQL query. I’ll try using a pair of optional parameters.

[[!format n3 """
PREFIX dc10: <http://purl.org/dc/elements/1.0/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>

SELECT ?x ?y
WHERE { { ?book dc10:title ?x } UNION { ?book dc11:title ?y } }
"""]]

And finally I can give a reasonable definition of how AND and OR work in such cases. AND: if one side matches and the other doesn’t, leave a blank cell; if both match, put them on the same row. OR: every row has a blank cell, because only one parameter can be matched at a time. But these rules may not hold for other uses of statement identifiers.

Why does it work like this? Look at the Naya query above. The same i can’t match both cases, because it’s a statement identifier. So exactly one section matches each time.
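Here is a Python sketch contrasting the two rules on made-up data. Statements are modeled as quads (statement id, subject, predicate, object).

- **OR:** each statement identifier yields one row. A statement has a single predicate, so only one side can match and the other cell stays blank.
- **AND:** the titles of the same ?book share a row, and a side that didn't match leaves a blank.

Grouping the AND case by ?book, with separate identifiers per statement, is my own reading and not a settled part of the model.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Quad = Tuple[str, str, str, str]           # (statement id, subject, predicate, object)
Row = Tuple[Optional[str], Optional[str]]  # (%x, %y)


def or_rows(quads: List[Quad]) -> List[Row]:
    """OR: one row per statement identifier; only one side can match, the other cell is blank."""
    rows: List[Row] = []
    for (_i, _book, pred, title) in quads:
        if pred == "dc10:title":
            rows.append((title, None))
        elif pred == "dc11:title":
            rows.append((None, title))
    return rows


def and_rows(quads: List[Quad]) -> List[Row]:
    """AND: titles of the same book share a row; a side that didn't match leaves a blank cell."""
    xs: Dict[str, List[str]] = defaultdict(list)
    ys: Dict[str, List[str]] = defaultdict(list)
    for (_i, book, pred, title) in quads:
        if pred == "dc10:title":
            xs[book].append(title)
        elif pred == "dc11:title":
            ys[book].append(title)
    rows: List[Row] = []
    for book in sorted(set(xs) | set(ys)):
        for x in xs.get(book) or [None]:
            for y in ys.get(book) or [None]:
                rows.append((x, y))
    return rows


data = [
    ("s1", "ex:a", "dc10:title", "Tutorial"),
    ("s2", "ex:b", "dc11:title", "Protocol"),
    ("s3", "ex:c", "dc10:title", "SPARQL"),
    ("s4", "ex:c", "dc11:title", "SPARQL (updated)"),
]
```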

Let’s play a bit… the same query again, but now with OR, where both sides can be matched:

Each statement says something about the other. Strange but possible. Oh, wait… the rule says every non-optional parameter has to appear in each OR section. But what about optional ones? Does it make sense for an optional parameter to be missing from an OR section? If yes, what happens? Example:

It means: find all x such that either xPw (and optionally a y such that xQy), or xPz. I think the choice of how to treat missing optional parameters is arbitrary: we can either forbid them, or decide they just cause “no match”, i.e. an empty cell. I’ll decide later. Let’s go back to the previous query. If a missing optional parameter means no match, the OR now has its own meaning: exactly the one I chose. Each row has just %x or just %y, with the other cell blank.
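Both options for a missing optional parameter are easy to express side by side. The Python sketch below pads the rows of one OR section with the output parameters that section never mentions. The existing rule still rejects a missing non-optional parameter. The hypothetical `policy` argument selects between the two undecided options: "forbid" rejects the section, and "blank" leaves an empty cell (None). The function name and data are illustrative only.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[str]]


def pad_section(section_rows: List[Dict[str, str]], section_params: Set[str],
                output_params: List[str], optional: Set[str], policy: str = "blank") -> List[Row]:
    """Pad the rows of one OR section with the output parameters that section never mentions."""
    if policy not in ("blank", "forbid"):
        raise ValueError(f"unknown policy: {policy}")
    missing = [p for p in output_params if p not in section_params]
    required_missing = [p for p in missing if p not in optional]
    if required_missing:
        # Existing rule: every non-optional parameter must appear in each OR section.
        raise ValueError(f"non-optional parameters missing from section: {required_missing}")
    if missing and policy == "forbid":
        # Option 1: forbid missing optional parameters altogether.
        raise ValueError(f"optional parameters missing from section: {missing}")
    # Option 2: a missing optional parameter just means "no match", i.e. an empty cell.
    return [{p: row.get(p) for p in output_params} for row in section_rows]


# Second section of "xPw (optionally xQy) or xPz": it never mentions the optional y.
padded = pad_section([{"x": "a", "z": "1"}], {"x", "z"}, ["x", "y"], {"y"})
```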

This update helps solve 7 in [[examples]].

Negated Existence

I already decided that SPARQL’s EXISTS clauses are unnecessary here because variables cover their scope. But wasn’t I too quick to decide? Let’s take a query to work with.

[[!format n3 """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person
WHERE {
    ?person rdf:type foaf:Person .
    FILTER EXISTS { ?person foaf:name ?name }
}
"""]]

Direct translation to formula:

f(x) = xRy and exists z such that xSz

Moving the existence to top-level:

f(x) = Exists z such that (xRy and xSz)
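This move is the usual prenex rule. Since z doesn't occur in xRy, “xRy and exists z such that xSz” is equivalent to “exists z such that (xRy and xSz)”.

The Python sketch below checks this by brute force over every graph on a two-element domain. This is a made-up toy setting, with y treated as a given value, as in the formula. It is a sanity check for that domain, not a proof.

```python
from itertools import product
from typing import Set, Tuple

Triple = Tuple[str, str, str]
DOMAIN = ["a", "b"]
ALL_TRIPLES = [(s, p, o) for s in DOMAIN for p in ("R", "S") for o in DOMAIN]


def f_direct(g: Set[Triple], x: str, y: str) -> bool:
    # xRy and (exists z such that xSz)
    return (x, "R", y) in g and any((x, "S", z) in g for z in DOMAIN)


def f_prenex(g: Set[Triple], x: str, y: str) -> bool:
    # exists z such that (xRy and xSz)
    return any((x, "R", y) in g and (x, "S", z) in g for z in DOMAIN)


def equivalent_on_all_graphs() -> bool:
    """Compare both forms for every subset of ALL_TRIPLES and every choice of x and y."""
    for bits in product([False, True], repeat=len(ALL_TRIPLES)):
        g = {t for t, keep in zip(ALL_TRIPLES, bits) if keep}
        for x, y in product(DOMAIN, repeat=2):
            if f_direct(g, x, y) != f_prenex(g, x, y):
                return False
    return True
```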

Suggested query:

Now comes the twist… negation!

[[!format n3 """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person
WHERE {
    ?person rdf:type foaf:Person .
    FILTER NOT EXISTS { ?person foaf:name ?name }
}
"""]]
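Here is why the earlier trick of moving the existence to the top level breaks under negation. “Not exists z such that xSz” is not the same as “exists z such that not xSz”. The second form holds for almost anything, because some value is always not a name.

The Python sketch below evaluates both forms on a made-up toy graph. The set of subjects and objects stands in for the domain of discourse.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

graph: Set[Triple] = {
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
}
# Stand-in for the domain of discourse: every subject and object in the toy graph.
DOMAIN = sorted({s for (s, _, _) in graph} | {o for (_, _, o) in graph})


def persons(g: Set[Triple]) -> List[str]:
    return sorted(s for (s, p, o) in g if p == "rdf:type" and o == "foaf:Person")


def no_name(g: Set[Triple], x: str) -> bool:
    # FILTER NOT EXISTS { ?person foaf:name ?name }: there is no z at all with xSz.
    return not any((x, "foaf:name", z) in g for z in DOMAIN)


def some_non_name(g: Set[Triple], x: str) -> bool:
    # Pulling the existential outward instead: "exists z such that not xSz".
    return any((x, "foaf:name", z) not in g for z in DOMAIN)


print([p for p in persons(graph) if no_name(graph, p)])        # ['ex:bob']
print([p for p in persons(graph) if some_non_name(graph, p)])  # ['ex:alice', 'ex:bob']
```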

The problem is that every query has a top-level “Exists”, and there’s no way to negate it. Possible solutions:

I have a feeling the second one is better, but let me prove it first. Assume you have a query with a top-level OR, where each OR section is an AND. One AND contains 2 statements, while the other contains 3. That means one of them needs 3 variables for statement identifiers, while the other needs just 2! What do we do with the extra variable? It’s clutter!

Yes, there’s a good point here… since at the moment there is no “for all”, it should be possible to place “exists” and “not exists” before any statement or block. But that’s more or less how SPARQL works: it places variables where it has to, and binds occurrences by name, e.g. if two blocks use the same name, the “exists” theoretically moves up to the surrounding block.

For now, I think a new component is good enough. Is it enough to have top-level non-existence variables? Let’s try adding a section W for these variables, prefixed with ^.

Hmm, there’s another problem. Look at the second statement pattern. There are two possible interpretations:

  1. Exists ?j such that not exists any ^name for which ?j-$person-foaf:name-^name
  2. Not exists any ^name for which exists ?j such that ?j-$person-foaf:name-^name
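The two readings really do give different answers. The Python sketch below evaluates both on a made-up store of quads (?j statement id, subject, predicate, object). Under reading 1, ex:alice passes even though she has a name: ?j can simply pick a statement that isn't a name statement. Under reading 2, only ex:bob passes. The value set standing in for ^name is just the objects present in the toy data.

```python
from typing import Set, Tuple

Quad = Tuple[str, str, str, str]  # (?j statement id, subject, predicate, object)

quads: Set[Quad] = {
    ("s1", "ex:alice", "rdf:type", "foaf:Person"),
    ("s2", "ex:alice", "foaf:name", "Alice"),
    ("s3", "ex:bob", "rdf:type", "foaf:Person"),
}
IDS = sorted({q[0] for q in quads})
VALUES = sorted({q[3] for q in quads})  # stand-in domain for ^name


def interpretation_1(person: str) -> bool:
    # Exists ?j such that not exists any ^name for which ?j-$person-foaf:name-^name
    return any(not any((j, person, "foaf:name", n) in quads for n in VALUES) for j in IDS)


def interpretation_2(person: str) -> bool:
    # Not exists any ^name for which exists ?j such that ?j-$person-foaf:name-^name
    return not any(any((j, person, "foaf:name", n) in quads for j in IDS) for n in VALUES)
```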

What we meant is the second one. But what if we wanted the first? How would we express it? SOLUTION: put the variables and the negated variables inside T, rather than in top-level V and W sections. IDEA: maybe just read them as “there exists X and not exists Y such that…”; then there is no ordering, and top-level variables might still work.

Query Compilation

IDEA: move some load from the server to the client by sending the server a compiled query in the form of a tree, instead of making the server do all the work.
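Here is one possible shape for this, sketched in Python. The client builds the query tree and serializes it as JSON, and the server only rebuilds the nodes instead of parsing query text. The node types (`Statement`, `And`, `Or`), the `kind` field and the JSON layout are all hypothetical choices for illustration, not a defined Naya wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Union


@dataclass
class Statement:
    subject: str
    predicate: str
    obj: str
    kind: str = "statement"


@dataclass
class And:
    children: List["Node"]
    kind: str = "and"


@dataclass
class Or:
    children: List["Node"]
    kind: str = "or"


Node = Union[Statement, And, Or]


def compile_query(tree: Node) -> str:
    """Client side: serialize the already-built query tree (asdict recurses into children)."""
    return json.dumps(asdict(tree), sort_keys=True)


def load_query(data: Dict[str, Any]) -> Node:
    """Server side: rebuild the tree from its JSON form without parsing query text."""
    kind = data["kind"]
    if kind == "statement":
        return Statement(data["subject"], data["predicate"], data["obj"])
    children = [load_query(child) for child in data["children"]]
    if kind == "and":
        return And(children)
    if kind == "or":
        return Or(children)
    raise ValueError(f"unknown node kind: {kind}")


# f(x) = Exists z such that (xRz or xSz), as a tree.
query = Or([Statement("$x", "R", "?z"), Statement("$x", "S", "?z")])
wire = compile_query(query)
```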
