Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>


I did some early exploration and design. I put the SPARQL specification’s table of contents on a page, and collected many examples demonstrating all (or most) of its features.

Now I’d like to carefully build the model, adding features step by step. I will be using some minimal notation in the examples, and this notation may later help design a query language syntax.

Hello World

The query model is based on predicate logic. A query is therefore a formula with 0 or more variables, expressing a request for the storage system to fulfill. A minimal “hello world” query would be a… minimal formula.

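The existence of variables changes the kind of request:

  1. Without variables, the query asks whether the formula is true
  2. With variables, the query asks for the values of the variables which make the formula true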

We will start with a two-section query structure:

  1. Variables
  2. Condition

A minimal query without variables is a truth constant, an atomic formula which evaluates by definition to true or to false. Therefore these are the possible minimal queries without variables:

Query

Variables: -
Condition: true

Query

Variables: -
Condition: false

The minimal queries involving variables are different. Imagine a query with the same condition, but with one or more variables - what does it mean? For example, the following formula:

f(x) = true

Doesn’t make any sense, does it? This formula is actually equivalent to f = true, because the right-hand side never mentions x: no value of x can affect the result, so the value doesn’t depend on the variable.

We could decide the database should just fill in every possible x it knows, but that sounds like a useless workaround, and it adds no expressive power beyond conventional formulas. Therefore, here is a rule for formulas with variables: each variable must have at least one binding in the condition.
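The binding rule is easy to check mechanically. What follows is a minimal sketch under assumptions that are not part of the design: a condition is either the constant true/false or a list of statements (tuples of terms), a term starting with "?" is a variable, and "has a binding" is taken to mean "appears in at least one statement".

```python
from typing import Sequence, Union

# Assumed representation (hypothetical): a condition is either True/False
# or a sequence of statements; a term starting with "?" is a variable.
Statement = tuple[str, ...]
Condition = Union[bool, Sequence[Statement]]


def variables_in(condition: Condition) -> set[str]:
    """Collect the variables that appear somewhere in the condition."""
    if isinstance(condition, bool):
        return set()  # true and false mention no variables at all
    return {term for stmt in condition for term in stmt if term.startswith("?")}


def unbound_variables(declared: Sequence[str], condition: Condition) -> list[str]:
    """Return the declared variables that have no binding in the condition."""
    bound = variables_in(condition)
    return [v for v in declared if v not in bound]


print(unbound_variables(["?x"], True))                # ['?x'] -> query rejected
print(unbound_variables(["?x"], [("?x", "R", "y")]))  # [] -> query accepted
```

A query would be rejected whenever the returned list is non-empty. This is exactly the `f(x) = true` case above.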

Requiring a binding means we need to introduce new kinds of conditions, since true and false aren’t enough anymore. The most fundamental kind of condition is the binary relation. For a relation R and variables x and y, the following formula asks whether they stand in the relation:

f(x, y) = R(x, y)

Therefore a minimal query may look like this:

Query

Variables: x
Condition: ?x R y

The query means: Give me a list of all x which satisfy R(x,y) for the specific given R and y.
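The two kinds of request can be sketched over a toy database. Assumptions: the database is modeled as a plain set of (subject, predicate, object) tuples, and the function names and data are hypothetical. A query without variables becomes a yes/no membership test. The query "?x R y" becomes a list of the subjects that fit.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object), assumed model


def ask(db: set[Triple], s: str, p: str, o: str) -> bool:
    """A query with no variables: is the statement there or not?"""
    return (s, p, o) in db


def select_subjects(db: set[Triple], p: str, o: str) -> list[str]:
    """All x such that R(x, o) holds, for the given predicate p = R and object o."""
    return sorted(s for (s, pred, obj) in db if pred == p and obj == o)


db = {("alice", "R", "y"), ("bob", "R", "y"), ("carol", "R", "z")}
print(ask(db, "alice", "R", "y"))     # True
print(select_subjects(db, "R", "y"))  # ['alice', 'bob']
```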

Statements

Let’s examine the scope of single-statement conditions. That line you see there, “?x R y”, is a query statement. It doesn’t really state anything; it just tells the database what we want to know. Each query statement has 4 parts:

  1. Identifier (I)
  2. Subject (S)
  3. Predicate (P)
  4. Object (O)

For our discussion, we will organize them in the following pattern:

(I) S P O

In the mathematical model, all 4 components must be present. If we don’t care about the value of one of them, we can say so and get just the information we want. Before we get there, here is a query example:

Query

Variables: i, x, y
Condition: (i) x R y
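One way to picture the pattern "(i) x R y" is as a quad with three variable positions, matched against every stored quad. This is a minimal sketch, assuming variables are marked with "?" (as in “?x” earlier) and that the database is a set of (I, S, P, O) tuples. The data and function names are hypothetical.

```python
from typing import Optional

Quad = tuple[str, str, str, str]  # (I, S, P, O)


def match(pattern: Quad, quad: Quad) -> Optional[dict[str, str]]:
    """Return the variable bindings if the quad fits the pattern, else None."""
    bindings: dict[str, str] = {}
    for term, value in zip(pattern, quad):
        if term.startswith("?"):
            # A repeated variable must take the same value every time.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:  # non-variable terms must match exactly
            return None
    return bindings


def query(db: set[Quad], pattern: Quad) -> list[dict[str, str]]:
    """All bindings of the pattern's variables, in a stable order."""
    results = []
    for quad in db:
        bindings = match(pattern, quad)
        if bindings is not None:
            results.append(bindings)
    return sorted(results, key=lambda b: sorted(b.items()))


db = {("s1", "alice", "R", "bob"), ("s2", "carol", "R", "dave"), ("s3", "alice", "Q", "bob")}
for row in query(db, ("?i", "?x", "R", "?y")):
    print(row)
```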

Relations

The introduction of statement identifiers contradicts what we said about relations: How can they be binary, if they now have three components?

I’d like to introduce a relation layer which solves the problem. Let’s define the truth relation, T. This relation is the set of all quadruples (I, S, P, O) stated in the database. In other words, it is exactly the content of the database.

Then we have the relations we already saw: A relation R corresponding to some predicate P is the set of all (x, y) such that (i, x, P, y) is in T for some i.

Now each quad (I, S, P, O) corresponds to a potential member of T. The query essentially asks whether it is there or not.

Notations we may use for a statement: (i)xRy, (i) x R y, (i)R(x, y), R(i, x, y)
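The relation layer can be sketched as a projection over T. This assumes T is held as a Python set of (I, S, P, O) tuples, and the sample data is made up. The example also shows why the derived relation is binary: two different statements of the same fact collapse into a single (x, y) pair.

```python
Quad = tuple[str, str, str, str]  # (I, S, P, O)


def relation(T: set[Quad], predicate: str) -> set[tuple[str, str]]:
    """{(x, y) | (i, x, P, y) is in T for some i}: the identifier is projected away."""
    return {(s, o) for (_i, s, p, o) in T if p == predicate}


def holds(T: set[Quad], quad: Quad) -> bool:
    """A fully specified quad is just a membership test against T."""
    return quad in T


T = {
    ("i1", "alice", "knows", "bob"),
    ("i2", "alice", "knows", "bob"),  # the same fact, stated twice
    ("i3", "bob", "knows", "carol"),
}
print(sorted(relation(T, "knows")))  # [('alice', 'bob'), ('bob', 'carol')]
```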

Parameters

The query above creates a list of (i, x, y) triples which satisfy the formula. If we need all three values of each triple, great. But what if we don’t care about the statement identifier and want just the subject and the object, i.e. just x and y? We need a way to say “there exists some i such that” inside the condition, without returning i in the output. For example, look at the following formula:

f(x, y) = ∃i((i)xRy)

It means “give me all pairs (x, y) such that xRy, regardless of the value of i”. So i just needs to exist; we don’t want its value in the output. But i isn’t known in advance the way R is, so we can’t treat it as a known entity either. It belongs to a new category.

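Let’s extend the query structure:

  1. Parameters: the values returned in the output
  2. Variables: values which must exist, but are not returned
  3. Known: given entities, such as R
  4. Condition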

A query example:

Query

Parameters: x, y
Variables: i
Known: R
Condition: (i) x R y
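The extended structure can be sketched as follows. Parameters are returned, Variables only have to exist (they are projected away), and any term that is neither is treated as a Known entity that must match literally. The `Query` dataclass, the single-pattern condition, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass

Quad = tuple[str, str, str, str]  # (I, S, P, O)


@dataclass
class Query:
    parameters: list[str]  # returned in the output
    variables: list[str]   # must exist, value not returned
    known: list[str]       # given entities such as R (documentation only here)
    condition: Quad        # a single statement pattern, e.g. ("i", "x", "R", "y")


def run(q: Query, T: set[Quad]) -> set[tuple[str, ...]]:
    free = set(q.parameters) | set(q.variables)
    rows: set[tuple[str, ...]] = set()
    for quad in T:
        bindings: dict[str, str] = {}
        ok = True
        for term, value in zip(q.condition, quad):
            if term in free:
                if bindings.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # a known entity must match exactly
                ok = False
                break
        if ok:
            # Keep only the parameters: this is the "there exists some i" part.
            rows.add(tuple(bindings[p] for p in q.parameters))
    return rows


T = {("i1", "alice", "R", "bob"), ("i2", "alice", "R", "bob"),
     ("i3", "carol", "R", "dave"), ("i4", "x1", "Q", "y1")}
q = Query(parameters=["x", "y"], variables=["i"], known=["R"], condition=("i", "x", "R", "y"))
print(sorted(run(q, T)))  # [('alice', 'bob'), ('carol', 'dave')]
```

Because the result is a set, the two statements i1 and i2 yield a single (alice, bob) row once i is dropped.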

Logical Connectives

One statement is sometimes enough, but non-trivial queries need more than that. They need to connect several statements and say things like “all of these are stated” or “at least one of these is stated”. Several logical connectives exist, and in general there is a binary connective for each of the 16 possible truth tables, but for simplicity we will use just three connectives here:

  1. Negation
  2. Disjunction
  3. Conjunction

I will use two notations here:

  1. Verbal notation: not, or, and
  2. Symbol notation: ~, |, &

These connectives allow combining statements into complex patterns. Example:

Query

Parameters: w, x, y, z
Variables: i, j, k, l
Known: A, P, Q
Condition: (i) A P w
       and (j) w y z
       
           or
           
       not (k) x Q y
       and (l) z P A
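A condition built from these connectives can be sketched as a small tree, checked against one complete assignment of values. Several things here are assumptions: the nested-tuple encoding, the reading of the example as “(first two statements) or (last two)” (taken from its layout, with "and" binding tighter than "or"), and the made-up data. Enumerating candidate assignments is left out.

```python
Quad = tuple[str, str, str, str]  # (I, S, P, O)


def substitute(pattern: Quad, env: dict[str, str]) -> tuple[str, ...]:
    """Replace each variable or parameter by its value; known entities stay as-is."""
    return tuple(env.get(term, term) for term in pattern)


def holds(cond: tuple, env: dict[str, str], T: set[Quad]) -> bool:
    """Evaluate ("stmt", quad), ("not", c), ("and", c...), ("or", c...)."""
    op = cond[0]
    if op == "stmt":
        return substitute(cond[1], env) in T
    if op == "not":
        return not holds(cond[1], env, T)
    if op == "and":
        return all(holds(c, env, T) for c in cond[1:])
    if op == "or":
        return any(holds(c, env, T) for c in cond[1:])
    raise ValueError(f"unknown connective: {op!r}")


cond = ("or",
        ("and", ("stmt", ("i", "A", "P", "w")), ("stmt", ("j", "w", "y", "z"))),
        ("and", ("not", ("stmt", ("k", "x", "Q", "y"))), ("stmt", ("l", "z", "P", "A"))))
T = {("s1", "A", "P", "w1"), ("s2", "w1", "y1", "z1")}
env = {"i": "s1", "j": "s2", "k": "s3", "l": "s4", "w": "w1", "x": "x1", "y": "y1", "z": "z1"}
print(holds(cond, env, T))  # True, via the first branch
```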

Translation

The reason I’ve been using only generic queries as examples is that uids are very ugly. In order to use friendly names, I’d like to introduce two processes.

  1. Digitization: Translation of a namespace-label pair into a uid
  2. Verbalization: Translation of a uid into a matching namespace-label pair

Both can be done on the client side or on the server side. This allows flexibility and makes it possible to optimize repeatedly executed queries.

Like Idan files, queries can have namespace declarations used for digitization. It may also be useful to rely just on the names, and let the database match namespace names to their uids. In any case, for simplicity I will specify just a list of the namespaces.

The new query structure is:

In fact, using namespaces and labels in selection queries is just a shortcut. I’ll give an example later.
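The two translation processes can be sketched as lookups in one direction and the other. This assumes each namespace is a plain mapping from labels to uids. The uids below are made-up placeholders, not real Rel4tion identifiers, and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical namespaces: label -> uid (placeholder uids).
NAMESPACES: dict[str, dict[str, str]] = {
    "people": {"alice": "uid-0001", "bob": "uid-0002"},
    "rel": {"knows": "uid-0100"},
}


def digitize(namespace: str, label: str) -> str:
    """Namespace-label pair -> uid (raises KeyError if unknown)."""
    return NAMESPACES[namespace][label]


def verbalize(uid: str) -> Optional[tuple[str, str]]:
    """uid -> some matching namespace-label pair, or None if it has no name."""
    for ns, labels in NAMESPACES.items():
        for label, candidate in labels.items():
            if candidate == uid:
                return ns, label
    return None


# Digitize a friendly statement pattern once, before sending it on.
pattern = (("people", "alice"), ("rel", "knows"), ("people", "bob"))
print(tuple(digitize(ns, label) for ns, label in pattern))
# ('uid-0001', 'uid-0100', 'uid-0002')
```

Doing the digitization once and caching the result is one way a repeatedly executed query could avoid repeating the lookups.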

Functions

A function takes one or more Entity arguments, and optionally also the database content, i.e. the truth relation T. It does some computation on these arguments and returns a result, usually an Entity. Functions can be used in several ways and have various abstract syntax features related to them. We’ll see them below.

Operators

Functions which don’t take T as an argument are called operators.

Relation Operators

Just like properties are predicates whose content is stored in the database, relation operators are predicates based on value definitions and aspects, computed from the values at query time. They can be used in the same way properties are used.

Relation operators are operators which take two entities as arguments, and return a boolean value.

For example:

Query

Note that statement patterns which use relation operators don’t have any corresponding statements, and a statement identifier should not be specified when using them.
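A relation operator can be sketched as a plain two-argument boolean function that is evaluated on values and never looked up in T. That is why no statement identifier is involved. The operator table and the numeric sample values are hypothetical. The functions come from Python’s standard `operator` module.

```python
import operator
from typing import Callable

# Hypothetical operator table: name -> boolean function of two values.
RELATION_OPERATORS: dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "=": operator.eq,
    ">": operator.gt,
}


def filter_by(values: list[int], op: str, other: int) -> list[int]:
    """Keep each value v for which the pattern 'v op other' holds.
    This is computed from the values alone; no statement is consulted."""
    test = RELATION_OPERATORS[op]
    return [v for v in values if test(v, other)]


ages = [12, 30, 45, 17]
print(filter_by(ages, "<", 18))  # [12, 17]
```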

Value Operators

Value operators return a single Entity, for example through number arithmetic or string concatenation. In theory, they may also be used to construct values:

Query

Transformations

A transformation is the application of a value operator to produce an output value. The results go into the result table cells, just like regular Entities do. First, a simple query:

Query

This gives a list of names of all known named things in the database. Now what if we want to change text encoding or letter case? Let’s use a value operator to make all names uppercase:

Query

So we made “name” a variable, and instead of an explicit parameter we output the result of applying a value operator to it.
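A transformation can be sketched as a pass over result rows. The variable’s value is consumed, and only the operator’s result reaches the output column. This assumes rows arrive as one dict per row. The operator name "upper" and the column names are hypothetical.

```python
from typing import Callable

# Hypothetical value-operator table: name -> function on a single value.
VALUE_OPERATORS: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
}


def transform(rows: list[dict[str, str]], column: str, op: str, output: str) -> list[dict[str, str]]:
    """Return rows holding only op(row[column]) under the output column name."""
    f = VALUE_OPERATORS[op]
    return [{output: f(row[column])} for row in rows]


rows = [{"?name": "Alice"}, {"?name": "bob"}]
print(transform(rows, "?name", "upper", "upper(?name)"))
# [{'upper(?name)': 'ALICE'}, {'upper(?name)': 'BOB'}]
```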

Statement Functions

Statement functions take the T relation as an argument and use it. They are essentially wrappers around query features. Example: a function “field” which takes statement components and returns a tuple of the dependent components, with an empty string for components which don’t exist. This function can be used to choose optional table columns. Assume we want to list names, but also ages where known. The following query lists both, but returns only things which have both:

Query

Now let’s make ?age optional rather than required.

Actually, the function would have to take all the statements containing ?age as a parameter… IDEA: Just add a new query field for optional statements.
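A query field for optional statements would behave like a left outer join. Every thing matching the required statement yields a row, and the optional column is filled only when a matching statement exists. This is a minimal sketch. It assumes triples (statement identifiers omitted for brevity), hypothetical "name" and "age" predicates, made-up data, and at most one age per thing.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


def names_with_optional_age(db: set[Triple]) -> list[tuple[str, Optional[str]]]:
    # Optional part: assumes at most one "age" statement per thing.
    ages = {s: o for (s, p, o) in db if p == "age"}
    # Required part: every named thing yields a row; a missing age becomes None.
    rows = [(o, ages.get(s)) for (s, p, o) in db if p == "name"]
    return sorted(rows, key=lambda row: row[0])


db = {("e1", "name", "Alice"), ("e1", "age", "30"), ("e2", "name", "Bob")}
print(names_with_optional_age(db))  # [('Alice', '30'), ('Bob', None)]
```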

Tests

IDEA: allow adding optional statements, with a true/false value for each of them in the output. For optional parameters, a test is just like checking whether the field is empty or not, but tests can also do other things. For example, a test doesn’t even have to use any new variables:

Query

Actually, this case is just a convenience in a way… we could get the country and check whether it’s in Africa. But if one person could live in many countries, getting a whole list and searching it would be harder than just asking the database to check for us.
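A test can be sketched as a boolean output column. The database answers “does any matching statement exist?” instead of returning the matching countries. The predicates "name", "livesIn" and "partOf" and the sample data are hypothetical, and statement identifiers are omitted for brevity.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def lives_in_africa(db: set[Triple], person: str) -> bool:
    """True if some country the person lives in is part of Africa."""
    countries = {o for (s, p, o) in db if s == person and p == "livesIn"}
    return any((c, "partOf", "Africa") in db for c in countries)


def people_with_test(db: set[Triple]) -> list[tuple[str, bool]]:
    """One row per named person: (name, result of the test)."""
    rows = [(o, lives_in_africa(db, s)) for (s, p, o) in db if p == "name"]
    return sorted(rows)


db = {
    ("p1", "name", "Amara"), ("p1", "livesIn", "Kenya"), ("p1", "livesIn", "France"),
    ("p2", "name", "Ben"), ("p2", "livesIn", "France"),
    ("Kenya", "partOf", "Africa"), ("France", "partOf", "Europe"),
}
print(people_with_test(db))  # [('Amara', True), ('Ben', False)]
```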
