Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

concept.mdwn

Purpose

Explain the conceptual and mathematical foundations of semantic data models.

Topics

Introduction
Sets
Built-In Resources

Content

Introduction

A semantic model describes entities in abstract terms of set theory: objects, sets and relations. The world is composed of objects. These objects belong to sets. Objects also relate to other objects through many kinds of relations. For example, assume Anne and John are married. Then Anne and John are members of the set of all people, and they are related by the marriage relation.

In mathematical notation:

Definitions:

P = {x | x is a person}
M = {<p, q> ∈ P×P | p and q are married}

Facts:

John ∈ P
Anne ∈ P
John M Anne
Anne M John
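The definitions and facts above can be mirrored with ordinary finite sets. The following is only a minimal Python sketch: the function names are illustrative, and P here holds just the two people from the example rather than all people.

```python
# Finite stand-in for P: only the two people from the example.
P = {"John", "Anne"}

# M is a set of ordered pairs drawn from P x P.
M = {("John", "Anne"), ("Anne", "John")}


def related(relation, p, q):
    """Return True if the ordered pair <p, q> is a member of the relation."""
    return (p, q) in relation


def is_symmetric(relation):
    """A relation is symmetric if <q, p> is present whenever <p, q> is."""
    return all((q, p) in relation for (p, q) in relation)


# Every pair in M is built from members of P.
assert all(p in P and q in P for (p, q) in M)

print(related(M, "John", "Anne"))
print(is_symmetric(M))
```

Writing "John M Anne" corresponds to checking that the pair ("John", "Anne") is a member of M.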

In order to describe things in words, we must have common names for things. The collection of common names in a community of people is called language. The equivalent in the model could be a file or a database containing definitions and details. For the purpose of the discussion, let’s assume it is a database and denote it with D. Also, assume it is not a practical physical database, but an infinite conceptual database containing all the data in the world. In practice, data is spread across many databases.

Sets

In natural language, when we want to talk about specific things, specific concepts and entities, we use nouns. In the model, we use the model equivalents, called Entities. In D, everything is an entity. Objects, sets and relations are entities. D contains a collection of entities and relations between them (which are entities too). We denote the set of all entities in D with the letter E. Thus, in a sense, D defines E.

There are two kinds of entities in the model: Values and Resources. A value doesn’t have a semantic meaning in the database: It can be a number or a series of characters. The contents of a plain-text file can be a value. Boolean numbers 1 and 0 can be values. It is not always clear what can be a value and what is a resource described by several values. In the model, values are represented by sequences of characters. Any non-empty sequence is a value. We denote the set of all possible values (i.e. all values in D) with V.

In practice, due to practical considerations, V is divided into four sets: The boolean values (true and false), the integers (as fixed-point variables), the real numbers (as floating-point numbers) and the strings (all non-empty sequences of characters).

It may seem that these sets actually contain each other, e.g. the real numbers contain the integers - but in the practical model, 1 and 1.0 are not the same entity. They do stand in the mathematical equality relation, but they are different entities. The same goes for 1 and ‘1’: ‘1’ is a string while 1 is an integer. And the same goes for true and ‘true’: true is a boolean while ‘true’ is a string. We denote each such subset of V with T, with a subscript letter: B, I, R or S.
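One way to picture the difference between "same entity" and "mathematically equal" is to store each value as a type tag together with its character sequence. The sketch below assumes this representation; the `Value` class and both helper functions are hypothetical and not part of the model's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A value entity: a type tag (B, I, R or S) plus its character sequence."""
    type_tag: str
    text: str


one_int = Value("I", "1")
one_real = Value("R", "1.0")
one_str = Value("S", "1")


def same_entity(a, b):
    """Two values are the same entity only if both tag and text match."""
    return a == b


def numerically_equal(a, b):
    """Mathematical equality, defined here only for integer and real values."""
    numeric = {"I", "R"}
    if a.type_tag not in numeric or b.type_tag not in numeric:
        return False
    return float(a.text) == float(b.text)


print(same_entity(one_int, one_real))
print(numerically_equal(one_int, one_real))
print(same_entity(one_int, one_str))
```

Here the integer 1 and the real 1.0 are distinct entities that still compare as numerically equal, while the integer 1 and the string ‘1’ share a character sequence but differ in type.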

The other kind of entity is Resource. Resources refer to unique things in the world, similarly to nouns. For example, D can have resources referring to “a house”, “food”, “the bed on which I sleep”, “the moon”, “a moon”, “Gandhi” (referring to the specific person) and “the name Gandhi”. In order to allow unique references to resources, each resource has a unique identifier string. This string is called Unique Identifier, or UID or uid. Essentially, without any accompanying information, a resource is nothing more than a uid. The meaning of the resource exists only in our minds: In the computer it is just the uid. But when everyone - users and developers - uses the same uid for the same concept, humans and computers alike can communicate and understand each other.

The resources are the ones allowing objects, sets and relations to be represented in the model. Let’s denote the set of all resources with R. R can be divided into three disjoint sets, i.e. there are three kinds of resources: Classes, Properties and Objects. Their definitions are quite simple:

Classes in the model represent sets in reality
Properties in the model represent relations in reality
Objects represent everything else, including abstract things like “love”

The letters we use to denote these sets are C, P and O respectively.
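The split of R into C, P and O can be sketched by giving every resource a uid and exactly one kind. The uids below ("person", "marriedTo", "gandhi", "love") and the `Kind` and `Resource` names are made up for illustration; the sketch also assumes uids are unique.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    CLASS = "C"
    PROPERTY = "P"
    OBJECT = "O"


@dataclass(frozen=True)
class Resource:
    """A resource is nothing more than a uid, plus the kind it belongs to."""
    uid: str
    kind: Kind


# Hypothetical example resources.
resources = [
    Resource("person", Kind.CLASS),
    Resource("marriedTo", Kind.PROPERTY),
    Resource("gandhi", Kind.OBJECT),
    Resource("love", Kind.OBJECT),
]


def partition(items):
    """Group resource uids by kind; each resource lands in exactly one group."""
    groups = {kind: set() for kind in Kind}
    for item in items:
        groups[item.kind].add(item.uid)
    return groups


groups = partition(resources)
C, P, O = groups[Kind.CLASS], groups[Kind.PROPERTY], groups[Kind.OBJECT]
print(sorted(C), sorted(P), sorted(O))
```

Because each resource carries a single kind, the three resulting sets cannot overlap, which matches the requirement that C, P and O are disjoint.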

Built-In Resources

In the model, there are several special resources which help define all the other resources. For example, not all relations are between objects: There is also a set-membership relation, whose mathematical symbol is ∈. x∈X means an element x is a member of the set X. This relation, like any other relation, can be expressed as a property in the model. There is a special property in the model, the “is-a” property, precisely for that.

For properties which have common mathematical notations, we will use a p() syntax to denote their matching properties. For example, the “is-a” relation between a resource and the class it “belongs to” is denoted by p(∈).

Another important relation is the containment relation between sets. The parallel in the model is the subclass/superclass property, denoted by p(⊆). In set theory, relations are sets of pairs, but in the model it is not meaningful, and properties are not treated as sets. Therefore there is a separate containment property for them, subproperty/superproperty, denoted by p(⊑).
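In set theory, x∈A together with A⊆B gives x∈B. A small sketch of how p(∈) and p(⊆) could combine in the same way is shown below; the class names ("Man", "Woman", "Person") and the helper functions are illustrative assumptions, not something the model prescribes.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from cls through subclass pairs, including cls."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for child, parent in sub_class_of:
            if child == current and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(x, cls, memberships, sub_class_of):
    """True if x is a direct member of cls or of any subclass of cls."""
    return any(
        cls in superclasses(direct, sub_class_of)
        for member, direct in memberships
        if member == x
    )


# Hypothetical data: pairs are (subclass, superclass) and (member, class).
sub_class_of = {("Man", "Person"), ("Woman", "Person")}
memberships = {("Steve", "Man")}

print(is_a("Steve", "Person", memberships, sub_class_of))
print(is_a("Steve", "Woman", memberships, sub_class_of))
```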

In set theory, relations have a Domain and a Range. The domain is denoted Dom(R) and is the set of all resources r such that there exists a resource s where <r,s> is a member of R. In other words, all the resources which exist on the left side of a pair in the relation. The range is denoted Rng(R) and is the same symmetrically for the right side. For example, in the relation “father has a daughter”, let’s denote it fhd, the entities on the left side are all the men who have a daughter, and the entities on the right side are all the women (assuming every woman has a father, i.e. ignoring things like cloning).

Domain and Range exist formally in the model too, but they have slightly different meanings: Each property has a domain and a range defined for it, and they describe who appears on each side of the relation. They don’t need to be the smallest (tightest) possible domain and range. For example, assume p is a property representing the relation “father has a daughter”. Then we use Dom(p) and Rng(p) to denote the domain and range of p. Dom(p) and Rng(p) can both be the class representing the “set of all people”. But we can make them tighter, giving more information, by defining Dom(p) as the class representing “the set of all men” and Rng(p) as the class representing “the set of all women”.

What “giving more information” means is that if entities e and f are related by property p, we can conclude that e is-a Dom(p) and f is-a Rng(p). The tighter Dom(p) and Rng(p) are, the more information we have about e and f.
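The inference described here can be sketched in a few lines. The property name "fatherOfDaughter", the class names "Man" and "Woman", and the function itself are assumptions made for the example.

```python
# Hypothetical declarations: Dom(p) and Rng(p) for one property.
domain = {"fatherOfDaughter": "Man"}
range_ = {"fatherOfDaughter": "Woman"}


def infer_memberships(triples, domain, range_):
    """For each (e, p, f), conclude e isA Dom(p) and f isA Rng(p)."""
    inferred = set()
    for subject, prop, obj in triples:
        if prop in domain:
            inferred.add((subject, "isA", domain[prop]))
        if prop in range_:
            inferred.add((obj, "isA", range_[prop]))
    return inferred


facts = [("Steve", "fatherOfDaughter", "Jenny")]
inferred = infer_memberships(facts, domain, range_)
print(sorted(inferred))
```

Had both the domain and the range been declared as a class for “the set of all people”, the same fact would only tell us that Steve and Jenny are people.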

A practical database may simply be a container of triples of entities. The second entity must be a property, therefore these are entity-property-entity triples. However, the order matters: For example, Steve may be the father of Jenny, but certainly Jenny can’t be a father of anyone, including Steve. This relation is not symmetric. Other relations may be symmetric (e.g. marriage), but many are not.
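A container of ordered triples might look roughly like the sketch below. The `TripleStore` class and the property names "fatherOf" and "marriedTo" are hypothetical; a real database would need indexing and persistence that are left out here.

```python
class TripleStore:
    """A minimal in-memory container of entity-property-entity triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, prop, obj):
        """Store one ordered triple."""
        self._triples.add((subject, prop, obj))

    def holds(self, subject, prop, obj):
        """Check for a triple in exactly this order."""
        return (subject, prop, obj) in self._triples

    def objects(self, subject, prop):
        """All entities on the right side for a given left side and property."""
        return {o for (s, p, o) in self._triples if s == subject and p == prop}


store = TripleStore()
store.add("Steve", "fatherOf", "Jenny")
# A symmetric relation has to be stored in both directions.
store.add("Anne", "marriedTo", "John")
store.add("John", "marriedTo", "Anne")

print(store.holds("Steve", "fatherOf", "Jenny"))
print(store.holds("Jenny", "fatherOf", "Steve"))
```

Since triples are ordered tuples, the reversed fact about Jenny and Steve is simply absent unless someone adds it explicitly.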

Since the concepts used to describe the model are described in terms of the model itself (e.g. the containment relation between classes (sets) is a property inside the database), several triples are derived directly from the definitions of the model.

Classes:

Entity
Resource
Class
Property
Type
Boolean
Integer
Real
Text

Properties:

hasDomain
hasRange
isA
subPropertyOf
subClassOf
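The text does not list the derived triples themselves, so the following is only a guess at what a few of them might look like, built from the names above and from statements made earlier (everything is an entity; classes and properties are kinds of resources). Treat every triple in this sketch as an assumption rather than as the model's actual bootstrap data.

```python
CLASSES = [
    "Entity", "Resource", "Class", "Property", "Type",
    "Boolean", "Integer", "Real", "Text",
]
PROPERTIES = ["hasDomain", "hasRange", "isA", "subPropertyOf", "subClassOf"]


def builtin_triples():
    """Assumed bootstrap triples; none of these are confirmed by the source."""
    triples = set()
    # Assumption: each built-in class is itself a member of Class.
    for name in CLASSES:
        triples.add((name, "isA", "Class"))
    # Assumption: each built-in property is a member of Property.
    for name in PROPERTIES:
        triples.add((name, "isA", "Property"))
    # From the text: resources are entities; classes and properties are resources.
    triples.add(("Resource", "subClassOf", "Entity"))
    triples.add(("Class", "subClassOf", "Resource"))
    triples.add(("Property", "subClassOf", "Resource"))
    return triples


for triple in sorted(builtin_triples()):
    print(triple)
```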

[[TODO|TODO/OPEN]] add links to several related Dia diagrams I made
