Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

[[ 🗃 ^yEzqv rel4tion-wiki ]] :: [📥 Inbox] [📤 Outbox] [🐤 Followers] [🤝 Collaborators] [🛠 Commits]

Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

Branches

master

design4.mdwn

In this page I’m going to explore and suggest a new improved query model, based on [[design2]], [[design3]] and lessons leart while writing the [[examples]].

What is a Query

A query basically consisnts of two parts:

Query pattern: Definition of the output you want to get
Result pattern: How you want to arrange the output components in the output

Query resolution works as follows: The resolver finds values and entities in the database which match the query pattern, and each match it finds is arranged according to the result pattern and appended to the output.

A query pattern may be represented by a logical formula f(p1…pn). The parameters f takes are called query parameters. A single match the resolver finds is a substitution which maps each parameter pi to an Entity. The match must satisfy the formula, i.e. evaluation of the formula for the match must return True. Therefore, we can treat the resolver as a function which takes a formula, and returns a set of matches:

E: The set of all possible entities
F: The set of all possible formulas
s: the solver function

s: F –> 2^(En)

The query results are arranged in a table. All rows must have the same length, but some of the cells are allowed to be empty. The result pattern specifies how table rows are generated from the parameters. The simplest arrangement creates a single new row for each match, and writes values in the cells according to some ordering. We can therefore treat the result arranger as a function which takes a match and returns a set of rows:

E: The set of all possible entities
a: the arranger function

a: E^n –> 2^(Em)

Now it is clearer how the query resolver works: It takes a formula f which represents the query pattern, and applies the solver function s to generate s(f), a set of matches. For each such match m it now applies a to generate a set of rows S, and the union of all the resulting row sets is the table it sends back to the query initiator.

Parameters

The query pattern and the result pattern both use the list of parameters in their definition. Let’s treat it as a set of named parameters P, in which each member is a name string. For clarity, all parameter names begin with ‘$’. For example, $person.

Right now our query q contains just P as a single component, until we define more. The query pattern and the result pattern should be accurately defined before use. We’ll represent it as a tuple like this:

q = (P)

Or in a list like this:

P: a set of parameter names

As we proceed, we’ll add components.

Result Pattern

The result pattern specifies how to construct a set of rows from a given match. It will therefore be a set of tuples, where each tuple represents a row. Cells which contain parameters will be substituted by values from the match, which other (constant) values will appear in the resuling row(s) as is. Explicit empty cells are allowed too. Each member of P must appear at least once in R. All rows must have the same length.

Notations: The result pattern will be denoted with R. Match values are denoted by the parameters they match. Empty cells will be denoted by ‘-’. Values will be themselves, e.g. numbers (48) and strings (“hello”). For readability, uids will be short strings enclosed by ‘<&’ and ‘>’, for example <&Person>.

For example, let’s take P = {$a, $b, $c, $d}. Possible values of R may be:

R = {($a, $b, $c, $d)}
R = {($d, a), (b, $c)}
R = {($b, $d, "hello world"), ($a, -, $c)}

So far we have:

q = (P, R)

Query Pattern

A simple query pattern is a set of conditions. For a candidate substitution to become a match, it must satisfy all the conditions in the pattern, i.e. they are logically connected by AND. There are several types of conditions and complications around them, but we’ll start with the most fundamental and common one: statement patterns. Therefore, for now a query pattern is a set of statement patterns.

A statement pattern is a quadruple. Its elements represent the statement identifier (I), subject (S), predicate (P) and object (O) of a statement. When the elements are substituted by the candidate substitution, the resulting statement determines whether the candidates matches, depending on whether the statement is true in the database.

Each component of the statement pattern is either a parameter, or an entity from E. However, in Smaoin only Resources can be statement identifiers, subjects and predicates, which means using values other than as objects doesn’t make sense. As stated earlier, each member of P must appear in the query pattern, which means in this case that each member of P must be used in at least one statement pattern.

We’ll denote the statement pattern set with S.

So far we have:

q = (P, R, S)

Let’s see an example. From now on, braces are optional for single-element sets and commas are optional as separators (just a space is enough). The query we’ll write means “for each resource which has a name, give me the resource uid, the name and the statement identifier in which the naming is stated”.

P = $i $s $name
R = ($i $s $name)
S = ($i $s <&has-name> $name)

Negation

When statement patterns are used like we just did, each statement pattern represents a truth condition. The pattern matches if the statement is true in the database. But what if want something to not be true? For example, we may want to get a list of all the people who European. We want a candidate to match if “candicate is European” is not true.

Smaoin works under the open world assumption, i.e. something that is not known is not considered false. This is the opposite of how SQL databases usually work, where “not stated” usually means “not true”. Therefore two types of negation are provided:

“Unknown” or “weak negation”: Matches if the attached statement pattern doesn’t match, i.e. the statement is not stated in the database and is not known by inference (if inference is applied)
“Not” or “strong negation”: Matches if the attached statement is false, i.e. other statements contradict it and thus make it false by inrefence

A database which supports inference will usually know more things, since many facts not stated can be deduced by inference. Many facts may also be clearly false, because other facts contradict them. For example, if each person has exactly one name, and <&john> has the name “John Smith”, then the statement “<&john> has-name ‘Jack’” is false. However, databases which don’t do any form of inference may be unable to determine any contradiction, and statement patterns attached to “Not” will never match because nothing is known to be false.

From now on, each statement pattern is part of a pair. The first element of the pair is the matching mode, and the second is the statement pattern. Such a pair is called a statement condition.

What we took as obvious without asking, the “truth” condition, will now have its own notation too. There are three matching modes:

TRUE: tru, +
UNKNOWN: ukn, ~
NOT: not, -

Instead of an explicit pair notation we’ll use the matching mode just before the statement pattern, prepended to the parentheses without whitespace. For example, ($a $b $c d)becomes + (a $b $c d)ortru(a $b $c $d).

Sometimes we want to know whether two specific modes match. For example, we’d like a candidate to match if the statement is not true: either false, or unknown. Instead of having to create two separate conditions with the same statement pattern, I’m introducing two sugar notations:

modes can be combined with |, e.g. -|~ means “false or unknown”
modes can be negated with !, e.g. !+ means “not true”

Of course there is more than one way now to represent the same condition, e.g. !+ is the same as -|~. But it’s useful to express the thinking behind the pattern, when written by human users.

This is not all: We still need to take care of occurence rules. When there was only TRUE, each parameter had to appear at least once, in any of the statement patterns. But what if we have a parameter which appears only in UNKNOWN conditions? From where are the match values taken?

Example: “Give me all the resources whose name is unknown”. The condition could look more or less like this:

~($i $resource <&has-name> $name)

Where is $resource taken from, if it’s never matched in any statement pattern? There is an infinite amount of resources whose name is unknown in the database. Therefore, rule: each parameter must appear at least once in a TRUE or NOT statement pattern. So appearances in UNKNOWN statements don’t count.

Functions

Statement patterns are enough most of the time, but they are limited: They can cause very high memory usage because the information is explicitly is stated in the database, and they only work for finite information (unless inference is used, and then some infinite information is possible). Sometimes, it’s much easier and faster to use a computed function to determine the truth value of a condition, instead of analyzing the database content and searching values and matching patterns.

For example, we may wish to get a list of all the people whose age is at least 20. One condition is “$p is a person". Another condition is "$p has age $a". And finally "$a is greater-or-equal than 20”. How do we express the last condition? The relation ‘>=’ is defined over numbers - and there’s an infinite amount of them. Statements cannot state infinite information. At the same time, comparing two numbers on a computer is a simple fast operation.

Note that we could maybe use inference by defining the relation ‘>=’ recursively, but that only works easily enough with simple functions. When a function becomes complicated, so does the inference. I didn’t actually check and see what happens if a function doesn’t even return a boolean value, or is more complicated than number comparison, but undoubtedly in many cases computation functions are very simple, fast and don’t require inference support.

A function in Naya is a programmed procedure which takes Smaoin entities as arguments, and returns a Smaoin entity. Boolean entities - true and false - play a double role. They serve not only as Smaoin entities, but also as condition matchers, i.e. the boolean value may be used to determine whether a candidate matches the pattern.

Function can be used in two ways:

Compute pattern components (in place of directly specified ones)
Express conditions in the query pattern

In the ‘>=’ example, the function implementing this relation over numbers is used to express a condition. If instead of “age >= 20” we used “age+4 >= 24” we would also have a function computing a component: age+4 would be computed by a function implementing the ‘+’ operator.

[[Design3]] suggests notations for functions. For now, we’ll just use the functional notation f(x, y, z) and sometimes operators like x + y and x = y when it is more readable.

There may be some built-in functions, such as equality of entities, but we’ll see later.

[[TODO|TODO/OPEN]] can functions be used in all 3 matching modes! Does using them in UNKNOWN and NOT make sense? Maybe it requires defining boolean-complementing functions etc.? Maybe function patterns can be used as alternatives to the mode-quad pairs. Also, think about the issue both for function statement and when using function applications in place of entities. First, define what a function statement is.

A function pattern is a new kind of pattern. It expresses a condition just like a statement pattern does. However, at least at the moment function patterns can be used only after all parameters have been bound, in order to supply another condition. They cannot be used for matching them. This may change in the future.

What Did We Have So Far + Pseudo Algorithm For This Subset

[[TODO|TODO/OPEN]]

Existence (i.e. Variables)