Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>
design.mdwn
I did some early exploration and design. I put the SPARQL specification’s table of contents on a page, and collected many examples demonstrating all (or most) of the features.
Now I’d like to carefully build the model, adding features step by step. I will be using some minimal notation in the examples, and this notation may later help design a query language syntax.
Hello World
The query model is based on predicate logic. A query is therefore a formula with 0 or more variables, expressing a request to the storage system to fulfill. A minimal “hello world” query would be a… minimal formula.
The existence of variables changes the kind of request:
- If there are no variables, return a boolean: Is the formula true for the target database or not.
- If there are variables X1…Xn, return a list of tuples (S1…Sn), each of which satisfies the formula when its values are substituted for the variables.
We will start with a two-section query structure:
- Variables
- Condition
A minimal no-variable query is an atomic formula, i.e. one that evaluates by definition to true or to false. Therefore these are the possible minimal no-variable queries:
Query
Variables: -
Condition: true
Query
Variables: -
Condition: false
The minimal queries involving variables are different. Imagine a query with the same condition, but with one or more variables - what does it mean? For example, the following formula:
f(x) = true
Doesn’t make any sense, does it? This formula is actually equivalent to f = true
because one is true if and only if the other is true. That’s because no x can affect the result: The value doesn’t depend on the variable.
We could decide the database should just fill in every possible x it knows, but that just sounds like a useless workaround, and doesn’t give any powers not given by conventional formulas. Therefore, here is a rule for formulas with variables: Each variable must have at least one binding in the condition.
A binding means we need to introduce new kinds of conditions, as true and false aren’t enough anymore. The most fundamental kind of condition construct is the binary relation. For a relation R and variables x and y, the following formula asks whether they stand in the relation:
f(x, y) = R(x, y)
Therefore a minimal query may look like this:
Query
Variables: x
Condition: ?x R y
The query means: Give me a list of all x which satisfy R(x,y) for the specific given R and y.
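To make the two kinds of request concrete, here is a minimal sketch in Python. The representation of R as a set of (x, y) pairs, the sample data, and the names `query_subjects` and `holds` are all assumptions made for illustration; the model itself does not prescribe any of them.

```python
# Hypothetical data: the relation R as a set of (x, y) pairs.
R = {("alice", "paris"), ("bob", "paris"), ("carol", "rome")}

def query_subjects(relation, y):
    """One variable x: return every x such that relation(x, y) holds."""
    return sorted(x for (x, obj) in relation if obj == y)

def holds(relation, x, y):
    """No variables: the answer is just a boolean."""
    return (x, y) in relation

matches = query_subjects(R, "paris")   # all x with R(x, "paris")
answer = holds(R, "carol", "paris")    # is R("carol", "paris") true?
```

The first function answers a query with one variable and returns a list; the second has no variables, so it can only answer true or false.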
Statements
Let’s examine the scope of single-statement conditions. That line you see there, “?x R y”, is a query statement. It doesn’t really state anything; it just tells the database what we want to know. Each query statement has 4 parts:
- Identifier (I)
- Subject (S)
- Predicate (P)
- Object (O)
For our discussion, we will organize them in the following pattern:
(I) S P O
Basically, in the mathematical model, all 4 components must be there. If we don’t care about the value of one of them, we can express that and get just the information we want. Before we get there, here is a query example:
Query
Variables: i, x, y
Condition: (i) x R y
Relations
The introduction of statement identifiers contradicts what we said about relations: How can they be binary, if they now have three components?
I’d like to introduce a relation layer which solves the problem. Let’s define the truth relation, T. This relation is the set of all quadruples (I, S, P, O) stated in the database. In other words, it is exactly the content of the database.
Then we have the relations we already saw: A relation R corresponding to some predicate P is the set of all (x, y) such that (i, x, P, y) is stated for some i.
Now each quad (I, S, P, O) corresponds to a potential member of T. The query essentially asks whether it is there or not.
Notations to use: (i)xRy, (i) x R y, (i)R(x, y), R(i, x, y)
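The two layers can be sketched in Python as follows. This is only an illustration under assumptions: the `Quad` type, the sample statements, and the function `relation_of` are hypothetical names, and I assume here that the same fact may be stated twice under different identifiers.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    i: str  # statement identifier
    s: str  # subject
    p: str  # predicate
    o: str  # object

# The truth relation T: exactly the content of the (hypothetical) database.
T = {
    Quad("s1", "alice", "knows", "bob"),
    Quad("s2", "alice", "knows", "bob"),   # same fact, different identifier
    Quad("s3", "bob", "knows", "carol"),
    Quad("s4", "alice", "likes", "tea"),
}

def relation_of(truth, predicate):
    """Binary relation for a predicate: all (x, y) stated under some i."""
    return {(q.s, q.o) for q in truth if q.p == predicate}

knows = relation_of(T, "knows")
```

Asking whether a quad is a member of T is a plain membership test, while the binary relation forgets the identifier, so the twice-stated fact appears in it only once.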
Parameters
The query above creates a list of (i, x, y) triples which satisfy the formula. If we need all three values of each triple, great. But what if we don’t care about the statement identifier, and want just the subject and the object, i.e. just x and y? We need a way to say “there exists some i such that” inside the condition, without making i a variable. For example, look at the following formula:
f(x, y) = ∃i: (i)xRy
It means “give me all pairs (x, y) such that xRy, regardless of the value of i”. So i just needs to exist - we don’t want its value in the output. But it is not a known value like R either, so we can’t use it like one. It belongs to a new category.
Let’s extend the query structure:
- Parameters: The values we want to match and see in the tuples
- Variables: They should exist but not appear in the tuples
- Known: Specific values we use directly in the query
- Condition: The statement pattern
A query example:
Query
Parameters: x, y
Variables: i
Known: R
Condition: (i) x R y
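Here is a rough Python sketch of how one statement pattern could be matched while keeping only the parameters in the output. Everything in it is an assumption for illustration: quads are plain 4-tuples, placeholders (parameters and variables alike) are written with a leading `?`, and `match` is a hypothetical helper, not a proposed API.

```python
# Hypothetical database content as (I, S, P, O) tuples.
T = {
    ("s1", "alice", "R", "bob"),
    ("s2", "alice", "R", "bob"),
    ("s3", "bob", "R", "carol"),
    ("s4", "bob", "Q", "dave"),
}

def match(truth, pattern, parameters):
    """Match one statement pattern and project onto the parameters.

    Terms starting with '?' are placeholders; anything else is a known value.
    Placeholders not listed in `parameters` only need to exist.
    """
    results = set()
    for quad in truth:
        binding = {}
        for term, value in zip(pattern, quad):
            if term.startswith("?"):
                # A placeholder must take the same value everywhere it occurs.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.add(tuple(binding["?" + name] for name in parameters))
    return results

pairs = match(T, ("?i", "?x", "R", "?y"), ["x", "y"])         # i is a variable
triples = match(T, ("?i", "?x", "R", "?y"), ["i", "x", "y"])  # i is a parameter
```

With i as a variable the two statements about alice and bob collapse into one pair; with i as a parameter each statement gives its own tuple.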
Logical Connectives
One statement is enough sometimes, but non-trivial queries need more than that. They need to connect several statements and say things like “all of these are stated” or “at least one of these is stated”. Several logical connectives exist, and in general a binary connective is possible for each possible truth table, but for simplicity we will use just three connectives here:
- Not - the statement is not stated
- And - several statements are all stated
- Or - at least one of several statements is stated
I will use two notations here:
- Verbal notation: not, or, and
- Symbol notation: ~, |, &
These connectives allow combining statements into complex patterns. Example:
Query
Parameters: w, x, y, z
Variables: i, j, k, l
Known: A, P, Q
Condition: (i) A P w
           and (j) w y z
           or
           not (k) x Q y
           and (l) z P A
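The condition above can be checked for one candidate tuple (w, x, y, z) with a small Python sketch. I am assuming the usual precedence here (not binds tighter than and, and binds tighter than or), which the notation does not actually fix yet. The sample data and the helpers `stated` and `condition` are hypothetical.

```python
# Hypothetical database content as (I, S, P, O) tuples.
T = {
    ("s1", "A", "P", "b"),
    ("s2", "b", "c", "d"),
    ("s3", "e", "Q", "c"),
    ("s4", "d", "P", "A"),
}

def stated(truth, s, p, o):
    """True if (i, s, p, o) is in T for some identifier i."""
    return any((qs, qp, qo) == (s, p, o) for (_i, qs, qp, qo) in truth)

def condition(truth, w, x, y, z):
    """((i) A P w and (j) w y z) or (not (k) x Q y and (l) z P A)."""
    left = stated(truth, "A", "P", w) and stated(truth, w, y, z)
    right = (not stated(truth, x, "Q", y)) and stated(truth, z, "P", "A")
    return left or right

first = condition(T, "b", "e", "c", "d")   # left branch is satisfied
second = condition(T, "x", "f", "c", "d")  # only the right branch is satisfied
```

Note that the sketch only tests a given tuple. Enumerating all tuples that satisfy a condition containing "not" would also need a decision on which values x may range over, and I leave that open.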
Translation
The reason I’ve been using only generic queries as examples is that uids are very ugly. In order to use friendly names, I’d like to introduce two processes.
- Digitization: Translation of a namespace-label pair into a uid
- Verbalization: Translation of a uid into a matching namespace-label pair
Both can be done on the client side or on the server side. This allows flexibility, and optimization of repeatedly executed queries.
Like Idan files, queries can have namespace declarations used for digitization. It may also be useful to rely just on the names, and let the database match namespace names to their uids. In any case, for simplicity I will specify just a list of the namespaces.
The new query structure is:
- Namespaces: The namespace prefixes used in the query
- Parameters: The values we want to match and see in the tuples
- Variables: They should exist but not appear in the tuples
- Known: Specific values we use directly in the query
- Condition: The statement pattern
In fact, using namespaces and labels in selection queries is just a shortcut. I’ll give an example later.
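As a rough illustration of the two processes, here is a Python sketch built on a lookup table. The uid strings, the namespace name `person`, the labels, and the one-to-one mapping between names and uids are all made up for the example; real uids and the real lookup mechanism are not specified here.

```python
# Hypothetical table from (namespace, label) pairs to made-up uids.
NAMES_TO_UIDS = {
    ("person", "has_name"): "uid-0001",
    ("person", "has_age"): "uid-0002",
}
# Reverse table, assuming each uid has exactly one name.
UIDS_TO_NAMES = {uid: name for name, uid in NAMES_TO_UIDS.items()}

def digitize(namespace, label):
    """Digitization: namespace-label pair -> uid."""
    return NAMES_TO_UIDS[(namespace, label)]

def verbalize(uid):
    """Verbalization: uid -> a matching namespace-label pair."""
    return UIDS_TO_NAMES[uid]

uid = digitize("person", "has_name")
name = verbalize(uid)
```

A client could digitize a query once and send only uids afterwards, which is the kind of optimization for repeated queries mentioned above.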
Functions
A function takes one or more Entity arguments, and optionally also takes the database content too, i.e. the truth relation T. It does some computation on these arguments, and returns a result, usually an Entity. Functions can be used in several ways and have various abstract syntax features related to them. We’ll see them here below.
Operators
Functions which don’t take T as an argument are called operators.
Relation Operators
Just like properties are predicates with content stored in the database, relation operators are predicates based on value definitions and aspects, and are computed from the values in real time. They can be used in the same way properties are used.
Relation operators are operators which take two entities as arguments, and return a boolean value.
For example:
Query
- Give me a list of people older than twenty
- ?age relop:greater_than 20
- ?age relop:gt 20
- ?age > 20
Note that statement patterns which use relation operators don’t have any corresponding statements, so a statement identifier should not be specified when using them.
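A small Python sketch of the "older than twenty" example: the stored statement is matched against the database, while the relation operator is computed from the values and has no statement behind it. The `has_age` predicate, the sample data, and the operator table are assumptions for illustration.

```python
import operator

# Hypothetical table of relation operators: two entities in, a boolean out.
RELATION_OPERATORS = {"gt": operator.gt, "lt": operator.lt}

# Hypothetical database content as (I, S, P, O) tuples.
T = {
    ("s1", "alice", "has_age", 30),
    ("s2", "bob", "has_age", 18),
    ("s3", "carol", "has_age", 21),
}

def older_than(truth, limit):
    """People with a stated age for which `?age gt limit` holds."""
    gt = RELATION_OPERATORS["gt"]
    return sorted(
        person
        for (_i, person, predicate, age) in truth
        if predicate == "has_age" and gt(age, limit)  # computed, not stored
    )

adults = older_than(T, 20)
```

Only the `has_age` pattern consumes a statement identifier; the comparison is evaluated on the fly.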
Value Operators
These return a single Entity. Examples are number arithmetic and string concatenation. They may theoretically be used to construct values:
Query
- ?age+5 > ?weight
- ?me has_name concat(“John”, “Doe”)
Transformations
A transformation is the application of a value operator to an output value. The results go into the result table cells, just like regular Entities do. First, a simple query:
Query
- Parameters: name
- Variables: thing, i
- Known: has_name
- Condition: (i) thing has_name ?name
This gives a list of names of all known named things in the database. Now what if we want to change text encoding or letter case? Let’s use a value operator to make all names uppercase:
Query
- Parameters: to_uppercase(name)
- Variables: thing, i, name
- Known: has_name
- Condition: (i) thing has_name ?name
So we made “name” a variable, and instead of an explicit parameter we use a result of a value operator application.
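In Python terms the transformation is just a function applied to each matched value before it goes into the result table. This is a sketch under assumptions: the sample data and the helper `names` are hypothetical, and Python’s `str.upper` stands in for to_uppercase.

```python
# Hypothetical database content as (I, S, P, O) tuples.
T = {
    ("s1", "t1", "has_name", "Alice"),
    ("s2", "t2", "has_name", "bob"),
}

def names(truth, transform=lambda value: value):
    """List all stated names, applying a value operator to each result."""
    return sorted(
        transform(name)
        for (_i, _thing, predicate, name) in truth
        if predicate == "has_name"
    )

plain = names(T)             # Parameters: name
upper = names(T, str.upper)  # Parameters: to_uppercase(name)
```

The condition is the same in both calls; only the output column changes.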
Statement Functions
Statement functions take the T relation as a parameter and use it. They are essentially wrappers around query features. Example: A function named field, which takes statement components and returns a tuple of the dependent components, or an empty string for components which don’t exist. This function can be used to choose optional table columns. Assume we want to list name, but also age if known. The following query lists both, and gives only things which have both:
Query
- Parameters: name, age
- Variables: thing, i, j
- Known: has-name, has-age
- Condition: (i) thing has-name ?name; (j) thing has-age ?age
Now let’s make ?age optional and not a requirement.
Actually, the function would have to take all the statements containing ?age as a parameter… IDEA: Just add a new query field for optional statements.
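One possible reading of the optional-statement idea, sketched in Python: the required statement decides which rows exist, and the optional statement only fills in a column, with an empty string when nothing is stated. The sample data and the helper name are hypothetical, and I assume at most one age per thing.

```python
# Hypothetical database content as (I, S, P, O) tuples.
T = {
    ("s1", "t1", "has-name", "Alice"),
    ("s2", "t1", "has-age", 30),
    ("s3", "t2", "has-name", "Bob"),   # no age stated for t2
}

def names_with_optional_age(truth):
    """Required: has-name. Optional: has-age ('' when not stated)."""
    ages = {s: o for (_i, s, p, o) in truth if p == "has-age"}
    rows = [
        (name, ages.get(thing, ""))
        for (_i, thing, p, name) in truth
        if p == "has-name"
    ]
    return sorted(rows, key=lambda row: row[0])  # sort by name only

rows = names_with_optional_age(T)
```

With both statements required, Bob would be missing from the result; with age optional he stays in the table with an empty age cell.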
Tests
IDEA: allow adding optional statements and have a true/false for them in the output. For optional parameters, tests are just like checking whether the field is empty or not, but they can also do other things. For example, a test need not even use any new variables:
Query
- Parameters:
  - Required: person
  - Optional:
- Variables: i, j
- Known: lives-in, is-a, Person, Africa
- Condition: (i) ?person is-a Person
- Test: (j) ?person lives-in Africa
Actually this case is just convenience in a way… we could get the country and see if it’s Africa or not. But if one person could live in many countries, getting a whole list and searching in it would be harder than just asking the database to check it for us.
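A Python sketch of the test idea: the condition selects the rows and the test adds a true/false column without filtering anything. The sample data and the helper `people_with_test` are hypothetical.

```python
# Hypothetical database content as (I, S, P, O) tuples.
T = {
    ("s1", "alice", "is-a", "Person"),
    ("s2", "bob", "is-a", "Person"),
    ("s3", "alice", "lives-in", "Africa"),
    ("s4", "bob", "lives-in", "Europe"),
    ("s5", "bob", "lives-in", "Asia"),
    ("s6", "rex", "is-a", "Dog"),
}

def people_with_test(truth):
    """Condition: ?person is-a Person. Test: ?person lives-in Africa."""
    triples = {(s, p, o) for (_i, s, p, o) in truth}  # identifiers not needed
    people = sorted(s for (s, p, o) in triples if (p, o) == ("is-a", "Person"))
    return [(person, (person, "lives-in", "Africa") in triples) for person in people]

rows = people_with_test(T)
```

Bob lives in two places here, and the test still gives a single boolean per person, which is the convenience described above.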