Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

[[ 🗃 ^yEzqv rel4tion-wiki ]] :: [📥 Inbox] [📤 Outbox] [🐤 Followers] [🤝 Collaborators] [🛠 Commits]

Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

Branches

master

theory.mdwn

This page explains the theoretical concepts in which [[Smaoin]] has roots. The content is written in a simple accessible way especially for Smaoin, and is therefore not necessarily a good general-purpose intro to these topics, because it omits irrelevant and less important things. For more info you can read in [[!wikipedia Set_theory desc=Wikipedia]].

[[!toc]]

Motivation

Assume we’d like to represent information somehow in computers. Create a “model” that both humans and computers can understand. A model that all applications use, so computers and people can e.g. read and edit notes without installing the specific note taking application they were created with (although they can, to get a nice GUI) or read the content of their e-mail inbox without installing a specific e-mail client (but using one would provide a nice GUI). Make the information separate from the functionality. Applications then become just user interfaces, and data is a separate layer they much less worry about. You can easily switch between e-mail clients and you can send a note you wrote to a friend, who will be able to read it with whatever note taking application they use, or even without having one at all.

But what is information? Clearly we have bits and bytes which can encode any data, but how do we actually express useful things with them? We have numbers, text, symbols, ideas, images, music, animation, graphs, tables… what do they all have in common?

Information is already encoded in computers, of course. These very words are information, in plain text form. But that means computers don’t easily understand what I’m writing here. They just see bytes and characters and lines, but not meaning. How does software usually add meaning? Ideas:

Special file formats (e.g. PNG for images)
Markup languages (e.g. XML, JSON) with schemas (e.g. RSS, XMPP, XHTML)
Databases with schemas (e.g. SQL)

There are so many formats and schemas, but something is still missing. Formats are often made for the specific use that motivated their creation. There is no integration between them, and conversion between formats requires dedicated software to be written. They use structures that make their usage fast, at the cost of incompatibility with other structures and other formats.

Information

What if there could be some “one format to unify them all”? A general purpose model, that even if not the most efficient and convenient for a specific use, can help bridge over all the differences and provide integration?

Here’s a suggested simple model for representing information. This is the idea RDF was (and is) based on, and Smaoin too is based on.

Suggestion: Every piece of information can be represented as a relation between some entities. Examples:

“Anne loves John.”

Entities: Anne, John
Relation: Loves

“The bus is green.”

Entities: Bus, Green
Relation: Has-Color

“Patrick the cat is very cute”.

Entities: Patrick the cat, the group of all very cute beings
Relation: Belongs-To-Group

These are very simple sentences, of course, and more complex sentences can’t be represented like this. But we can combine several of these entities+relation entries to express bigger things.

Elements and Sets

Assume there are two kinds of things in the world, elements and sets:

Sets are groups of elements.
Elements are individual items that can be grouped into sets.

Now we can simply even more our definition of information. Suggestion: any piece of information can be expressed as an answer to a question of the form “is element E a member of set S”. This is a yes/no question, whose answer can be represented in a single binary digit (bit), and many such answers can be combined into complex expressions of information.

Elements are generally denoted by lowercase letters, e.g. a, b, c, x, y, z. Sets are generally denoted by uppercase letters, e.g. A, B, C, X, Y, Z.

In order to denote that an element belongs to a set, we’ll use a relation called “is member of”. For example, 5 is-member-of Numbers and Joe is-member-of People. There is also the “is not member of” relation. We’ll denote them using symbols as follows:

∈ means is-member-of
∉ means is-not-member-of

Examples:

x ∈ X
a ∉ A
5 ∈ Numbers
Joe ∈ People
5 ∉ People
Joe ∉ Numbers

With this simple model in mind, we’d like to tackle the task: how to organize elements, sets and yes/no answers in the computer? We know we can use bits for the yes/no answers, but what about elements and sets? We could refer to elements by their name, but how do we encode definitions of sets?

First, let’s define some sets:

A: A set of friends: Alice, Bob and Cindy
B: The set of all the letters in the English alphabet
C: The set of all the numbers: 0, -1, 9999, square root of 2…
D: The set of all the blue houses in the world
E: A set containing the previous 4 sets as its elements

There are several interesting things we can see here:

We can define sets by saying who/what the members are
We can define sets by specifying a condition or trait that all members possess
Sets can be finite (e.g. alphabet)
Sets can be infinite (e.g. numbers)
Sets can have members that are themselves sets, i.e. sets can be members

Since we now have 2 ways to define sets, let’s use symbols to denote them conveniently.

For the first way, called extensional definition, we’ll list the members separated by commas and surround the list with curly brackets. Examples:

A = {Alice, Bob, Cindy}
B = {‘a’, ‘b’, ‘c’, … ‘z’}
E = {A, B, C, D}

For the second way, called intensional definition, we’ll use curly brackets too. Inside, there will be two sections, separated by : or |. The first section is a template, and the second section describes how to generate memebers from the template, e.g. by specifying a condition or a trait. Examples:

C = {n | n is a number}
D = {h | h is a blue house}
X = {x + y | 0 < x < 5 and 6 < y < 9}

Great! Now, how do we encode information with these tools? Let’s try.

“Anne loves John.”

Set: People who love John
Element: Anne

Another option:

Set: People who are loved by Anne
Element: John

Also this:

Set: entity-relation-entity triples
Element: Anne-Loves-John

And this:

Set: x-y pairs, where x and y are people and x loves y
Element: Anne-John

All of them basically make sense. Is any one them the “right” one“? Is any of them better than another? Well, we can start by looking at the first two options. Both of them are very specific to the information we want to encode, and probably aren’t good for a general purpose model. We’d like to be able to refer to”love" without specifying specific people.

Options 3 and 4 sound somewhat general purpose. But both of them define their set using pairs or triples of elements, which we haven’t seen yet. Let’s define these things first.

Pairs and Tuples

In sets there is no order between the elements. For example:

P = The set of all the people in the world = {p | p is a person}

Does any specific person come “before” or “after” any other person? If you order them by name alphabetically then yes, otherwise no. The same is true for numbers: They have an order mathematically, but as far as the set is concerned, there is no order. Therefore:

{1, 2} = {2, 1}
{‘a’, ‘b’, ‘c’, ‘d’} = {‘c’, ‘b’, ‘d’, ‘a’}

Sometimes, like with numbers, we do want to express order. For example, “Anne loves John” and “John loves Anne” are not the same thing (although I hope both are true). So the order matters, i.e. whether Anne is first and John is second, or John is first and Anne is second.

Let’s introduce order, then. We’ll start with pairs.

A pair is… simply a pair or elements. One is the first element and the other is the second element. The order matters, so the pair Anne-John and the pair John-Anne are different pairs, while {Anne, John} and {John, Anne} are the same set.

Pairs are denoted using triangle brackets. For example:

<Anne, John>
<John, Anne>
<x, y>

The same element can be used for both sides of the pair, and sets can be pair elements too:

<x, x>
<A, B>
<X, X>

Pairs can be pair elements too:

<<x, y>, z>
<Anne, <John, Tom>>
<<Alice, Bob>, <Cindy, Dan>>
<<<a,b>, <c, d>>, <<e, f>, <g, h>>>

Pairs can be set members:

<1, 2> ∈ {<x, y> | x and y are numbers}

Now let’s extend the pair concept into arbitrary length. What if, instead of 2 elements, we put 3? Like this:

<x, y, z>

Pairs are a special case of a tuple. A tuple is a series of elments with ordered positions. A tuple of length 2 is a pair. Any lengths are possible:

<a, b>
<a, b, c>
<a, b, c, d>
<a, b, c, d, e>
<1, 2, 3, ... 1000>

Like pairs, tuples can be set members, pair and tuple members and so on.

Relations

Finally we can approach the last tool we need for our information model. We already saw we can just use pairs and tuples, like this:

“Anne loves John.”

Set: <x, y> where x and y are people and x loves y
Member: <Anne, John>

This is how relations work. They aren’t a new building block: A relation is a set of tuples of the same length. Let’s assume for now that these tuples are of length 2, i.e. assume a relation is a set of pairs.

In other words, our Set+Member representation above actually uses a relation! It defines a relation using an intensional definition, and then specifies a pair of elements which is a member of the relation (remember a relation is a set).

Instead of “is member of” between pairs and relations, we’ll use “holds for”. If a pair <x, y> is a member of relation R, we say “R holds for x and y”. Symbolic notations:

<x, y> ∈ R
or
xRy

The second notation may seem strange but it makes sense with relation symbols. For example, the relation “is greater than”. “5 is greater than 3” will be denoted as:

<5, 3> ∈ >
or
5 > 3

Relations which contains pairs are called binary relations. The length of the member tuples, in this case 2, is called the arity of the relation. Larger arities are possible, e.g. a relation can be a set of triples (3) or a set of quadruples (4) and so on.

Information Revisited

Let’s go back to our options 3 and 4 we had above, which use pairs:

“Anne loves John.”

Set: entity-relation-entity triples
Element: Anne-Loves-John

Or:

Set: x-y pairs, where x and y are people and x loves y
Element: Anne-John

As you can see, they both use relations and are basically equivalent. The difference is that the first case has a single set for all information, while the second case has a separate set for each relation. Relational databases (SQL) use the second option: Each table represents a relation. Semantic databases (RDF, Smaoin) use the first option: There is a single table containing element-relation-element triples.

TODO: Queries

[[!template id=todo text=“Explain theory of propositional and predicate calculus”]]

[See repo JSON]

Clone

Branches

Tags

theory.mdwn

Motivation

Information

Elements and Sets

Pairs and Tuples

Relations

Information Revisited

TODO: Queries