More Wiki Fun

I need to choose wiki software. After briefly examining some existing tools, I decided to create a list of requirements by joining the various listings I have in my research wiki, and then maybe take existing software and adapt it to my needs.

The primary direction of design is the wiki ontology, which collects information about the required model for a wiki. Even without semantic capabilities, I need to decide how to collect information and which interface to design for my research wiki.

Philosophy

The first and maybe the most important step required before doing more semantic design here, is to stop thinking in terms of files and documents. It doesn’t mean we forget about them - it just means we are ready to live without them if a better model is found.

Now, what is a wiki? Which problem does a wiki solve? Assume a single person is editing the wiki. Then the wiki is essentially an interlinked collection of pages. But what makes a wiki different from a semantic database? What if a wiki is built on top of an ontology? What’s special about it then? Well, first of all it is a collection of documents. A database doesn’t have to contain any documents at all. What is a document? Is a single paragraph long enough to be a document?

Here it is. The important insight is: In a semantic database, the statements are the data. Even if there’s a string literal and it’s as long as a whole article, it’s still data as much as a simple statement like “Earth is a planet” is data. In a wiki, by contrast, the information is expressed as plain text. Even if there is metadata, possibly a lot of it, the core information is the text itself, and it’s meant for humans to read.

In that sense, there is not much semantic software can do with a wiki. I mean, there’s a lot to process and analyze, but from the point of view of users’ needs, a simple network of pages with links is enough. Why and when does it become complicated, then? Answer: At some point we want the computer to understand things:

  1. The meaning of a link
  2. The purpose/metadata of a page

For example, some links mean “this topic has that subtopic” while other links mean “this page is a code implementation of the algorithm listed on that page”. And we want to say things about pages, e.g. a page can be a recipe or an algorithm or a story or an article or an essay or a script or a song. Again, we must stop thinking in terms of files and documents. Each piece of text is just a unit of information, a string literal, and it can be linked to other entities via semantic relations, a.k.a. properties.
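To make these two kinds of “understanding” concrete, here’s a toy sketch in Python of typed links plus statements about pages, written as plain triples (all page names and property names here are invented, just for illustration):

    # Links with meaning, plus statements about the pages themselves.
    statements = {
        ("page:algorithms", "has-subtopic", "page:sorting"),
        ("page:quicksort.c", "implements",  "page:quicksort"),
        ("page:quicksort",  "is-a",         "Algorithm"),
        ("page:pancakes",   "is-a",         "Recipe"),
    }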

Maybe the most fundamental piece of modeling we can do here is to allow a page to describe something. In other words, it’s like having a describedBy property whose range is the string type. But in practice it is more than that: if the range were really a string, we wouldn’t be able to say anything about the description itself - for example, when it was edited, where it links, which categories it belongs to. And not making it a string allows us to abstract the actual storage.

As to the storage, here is another question: If we use git to store the pages, how can semantic tools reason about the information stored in the version control system? For example, we could model commits and merges and changes and patches and releases and tags using an ontology. Would anything be gained by that? Probably not. Version control has its own specific data model, and there’s not much semantic enhancement to do here. It would just make version control much heavier for large projects like Linux.

These thoughts lead me to something new: Three models for semantic queries.

While making all data models semantic is ideal (because it has unlimited flexibility), in practice it creates package dependencies and heavy computations which cause it not to be worth it most of the time. However, whenever it is practical, it can make many changes require only a few new lines of program code, with most of the change happening in the data model and in queries. It is a whole new level of flexibility.

Now, since wikis tend to be stored in some kind of database (SQL or version control), semantic enhancement is not a performance issue. MediaWiki has a semantics extension (Semantic MediaWiki), and there’s a semantic wiki running which uses the extension. Using the new developments happening all the time and the insights above, the work left is to research and build an ontology around wikis, documents, files, texts and information.

More Philosophy

If I just used a semantic database, there would be a problem: The wiki is not stored via simple files anymore! That advantage is lost! Bad thing. Now, who said I have to use a database? Take program code for example. Why are programs not expressed as directed graphs and stored as semantic information, instead of the plain text of program syntax?

This leads me to another question: What is information?

From the human perception point of view, it is the expression or presentation of facts/truths about the world around us. What is a fact/truth? Well, it is a description of some part of the universe around us.

From the mathematical model point of view, it is an answer, or a set of answers, to a question/questions of the form “is element x a member of the set X”. Assume for simplicity that each notion you can think of is either a set, or an element which can be a member of sets. Now imagine hypothetically that we could arrange all the sets in the world in a row, and arrange all the elements in the world in a row. Now, how many pieces of information exist in the world? Assuming there are s sets and e elements, then - since we can choose any pair and ask whether the membership relation holds for that pair - there are exactly s · e atomic boolean information units possible, and thus 2 ^ (s · e) different parallel universes are possible.
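Just to make the counting concrete, a throwaway Python sketch with made-up numbers:

    # Toy check of the counting above (the numbers are arbitrary).
    s = 3                         # sets
    e = 4                         # elements
    atomic_units = s * e          # one yes/no membership question per (element, set) pair
    possible_worlds = 2 ** atomic_units
    print(atomic_units, possible_worlds)   # 12 4096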

Let’s go back to semantic databases. What purpose does the database serve, then? It is just a form of storage, which happens to be optimized for statement insertions and queries. Other than that, it fills the same role as source code or a YAML file or an SQL database or a piece of paper or a plain text file or an image.

Now wait a second. If the database is optimized for the exact operations we want to do anyway, why not use it for all our information? E-mail, music metadata, tasks, events, document metadata - it can all be stored in the database and retrieved from it when needed. And indeed for these things it’s fine, but what if we needed to edit information? Is it practical to edit Ogg tags by hand? No, and that’s why tagging software exists, or database update requests. A database requires a special mechanism for editing: something humans can work with efficiently and effectively.

Now, this conclusion explains why programming languages exist: They allow humans to express information. After that, the information can be loaded into a database if needed. Why is program code not loaded? Well, because it does not need to persist. The code is loaded into syntax trees just for the purpose of compilation, and after that there is no need to store the trees, so they are discarded. The same is true for wiki pages: They are a way for humans to express information.

Let’s take an example of semantic enhancement for a wiki and see how to implement it with all our new conclusions and insights. Assume you have a complex model for tasks (which supports dependencies, sending tasks to other people, start dates, due dates, reminders, recurring tasks, categories, location tags, required resources and so on) and you want to create tasks while working on the wiki - not using database software, because then the task is not visible on the page, but simply by having a line like this:

[ ] Do something

Of course finding all the tasks in the wiki is very easy using GNU text utilities, but in order to allow all those features, they must be combined with the rest of the tasks on your system. Also, care must be taken to avoid creating copies on each computer of each wiki user, or creating a new task every time a task’s content on the wiki page changes. But overall, the idea is to load the tasks from the wiki into e.g. a database and keep them in sync.
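Here’s a minimal Python sketch of the extraction half of that idea, assuming tasks are written exactly as “[ ] Do something” lines in .mdwn pages (the directory layout, the [ ]/[x] convention and the function names are my assumptions, not part of the model):

    import re
    from pathlib import Path

    TASK_LINE = re.compile(r"^\[([ xX])\]\s+(.*)$")  # "[ ] Do something" / "[x] Done"

    def scan_wiki_tasks(wiki_root):
        """Yield (page, done, text) for every task-looking line in the wiki."""
        for page in Path(wiki_root).rglob("*.mdwn"):
            for line in page.read_text(encoding="utf-8").splitlines():
                m = TASK_LINE.match(line.strip())
                if m:
                    yield page, m.group(1) != " ", m.group(2)

    # The sync half (loading these into a task database and keeping them in sync,
    # without duplicating tasks per user or per edit) is the hard part; not shown.
    for page, done, text in scan_wiki_tasks("wiki"):
        print(page, done, text)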

Essentially, this is leading to… a very interesting direction. Here it goes: Can a wiki page be an Ogg file? Or a LaTeX file? Or a PNG file? Basically, why not, right? Where is this leading us? Our wiki model is becoming a semantic file system.

Wait a second… does it mean the whole idea of interlinked pages is irrelevant? No, of course it is still as useful as it was a minute ago. Actually, it’s quite awesome: Does your file manager support links other than symlinks and containment-in-folder relations? These are pretty weak semantically, aren’t they? Now you can have links inside the files, with meaning attached to each link. And instead of the folder tree or a near-tree, it’s an arbitrary directed graph. Much more powerful than ever before!

Design Rule 0

With all these important insights, we realize there is no strict relation between the data model and which implementation model (A, B or C) is used for it. Wiki pages can be in git or in SQL or in anything else. First a general-purpose basic model must be established.

Wiki Model

The fundamental concept in the model is the blob. We can call it a file instead. A blob is simply a binary sequence, i.e. data. It may contain semantically meaningful data, but it needs to be parsed into an in-memory model, at least, before logic can be applied to it.

However, a blob as a blob doesn’t mean anything. It’s just a sequence of 0s and 1s. Which kinds of blobs exist? There are many. Text, image, video, audio, executable… Now here are three ways to model these kinds, and then we have to do some modeling research.

  1. One base class and a subclass for each type
  2. One class, and an object for each type and a hasType relation
  3. A hierarchy for types, and each file is both a File and is a content-of-specific-type

I thought about it a lot, and I think the third option is the best one. It’s the most natural and flexible and reusable way to model the types. However, it’s best to read a bit about ontology authoring. Let’s see.

Okay, nothing so far. But I have thoughts of my own, and here they are.

In my expression model, information is the contents of a relation. For example, assume I have a relation R of cardinality n, which is a positive integer. Now assume it has been defined that R is a subset of the cartesian product of n given sets, i.e. R ⊆ A₁ ⨯ … ⨯ Aₙ. Under this assumption - let’s call it our world - for each tuple s ∈ A₁ ⨯ … ⨯ Aₙ, one of these is true for us at every given moment:

  1. We know that s ∈ R
  2. We know that s ∉ R
  3. We don’t know (but maybe we can make an educated guess possibly based on probabilities)

Information: Whether s ∈ R is true or false

Knowledge: Whether the current state is one of the first two, or the third

The current expression model is minimal in a sense: It supports just the two smallest cardinalities, 1 and 2. Cardinality 1 is supported by defining classes and objects, and specifying that a given object is a member of a given class. Cardinality 2 is supported by defining arbitrary relations and specifying pairs of objects which stand in these relations.

Any tuple of larger size can be modeled as an object which connects to 3 or more other objects via relations ‘first’, ‘second’, ‘third’ and so on. Direct support for arbitrary cardinalities may be added in the future.
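A rough Python sketch of that reification trick, over a plain set of (subject, property, object) statements; the ordinal property names follow the text above, everything else is invented:

    # Cardinality 1 and 2 are direct statements:
    statements = {("earth", "is-a", "planet")}

    # A larger tuple, e.g. <alice, book, bob> standing in a 'gave' relation,
    # becomes an object linked to its members via 'first', 'second', 'third', ...
    ORDINALS = ["first", "second", "third", "fourth", "fifth"]

    def add_tuple(statements, tuple_id, *members):
        for name, member in zip(ORDINALS, members):
            statements.add((tuple_id, name, member))

    add_tuple(statements, "giving-42", "alice", "book", "bob")
    statements.add(("giving-42", "is-a", "Giving"))   # which relation the tuple belongs to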

Classes are not just sets. They are sets with a meaning attached to them. Thus I’d like to add:

Design Rule 1

Whenever the information you want to express is that a given object has a given characteristic, model it by attaching the characteristic to a class and make the object be a member of that class via the is-a property.

Wiki Model (Continued)

Let’s apply that rule to our case. Using type objects with a hasType relation (option 2) is not an option, because Design Rule 1 says we must use classes. And option 1, i.e. blob subclasses, is a bad idea because it cannot be reused. The winner is therefore option 3. But before I proceed, let’s see an example of why option 1 is bad. Here is the same model, first using option 1 and then using option 3. You’ll easily see the duplication caused by the first approach:

First approach

+ class File
  + class TextFile
  + class VideoFile
  + class AudioFile
+ class DataStream
  + class TextStream
  + class VideoStream
  + class AudioStream

Second approach

+ class Content
  + class Text
  + class Video
  + class Audio
  + class File
  + class DataStream

Now let’s continue. The name “file” could be great for a blob, but it’s confusing because it’s used in the context of file systems. But I don’t like the name blob, because it implies it’s a meaningless block. And “node” is just a graph node, i.e. any entity in the expression model can be a node, so it doesn’t really say anything. Maybe “Data” or “Block”? Or “Array”?

Wait a second - why do we describe the blob itself? When describing a song, for example, you describe the song via semantic metadata and simply add a “path” or a “file” property which points to the actual content. Why is the Blob thing needed? Well, the answer is that sometimes, e.g. in Tracker, the path of the file is used as its identifier, i.e. the URI for the RDF resource. Whether it is local or remote, the URI is a way to locate and retrieve the described resource.

In the expression model, giving any meaning to identifiers is wrong, because a reference to a concept is something abstract, and must remain valid even if every aspect of the concept changes. It’s the abstract cognitive awareness of the concept, so it comes before any kind of information and must not mean anything. Therefore, a song may have its own random identifier, and then it may be linked with an actual file, e.g. a local Ogg file in your Music folder.

Now, this is where we come into the picture. That file I mentioned - who said it has to be stored as a file on the file system? It could be stored inside git or an SQL database, or inside an archive, or it could be encrypted, and so on, while a URI can still point to the location of the blob.

Therefore, instead of having “address” or “location” properties which are string addresses pointing to files, e.g. occurrences of a given song as Ogg files, these “occurrences” are abstracted via the Blob class. Of course a blob can have a single URI property, which is a fallback to plain usage of URIs. Nothing old is damaged, while new potential is added.
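To make the abstraction concrete, a toy sketch in the same triple style (every identifier and property name here is invented): the song keeps an opaque identifier, each occurrence is abstracted as a Blob, and a plain URI is only a fallback.

    statements = {
        ("song-123", "is-a", "Song"),                 # the song: an abstract concept
        ("blob-9",   "is-a", "Blob"),                 # one concrete occurrence of it
        ("blob-9",   "occurrence-of", "song-123"),    # invented property name
        # fallback to a plain URI when the blob happens to be a local file:
        ("blob-9",   "uri", "file:///home/me/Music/some-song.ogg"),
    }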

Let’s give that thing a name. Not necessarily a final name, but a reasonable working name. Ideas:

DECISION: For now, I’ll use Block.

Excellent. Now let’s collect features of existing wikis and build a general-purpose wiki model. I’m starting by saving some content from wikimatrix.org and then I’ll also paste here some text and lists.

Comparison URL: http://www.wikimatrix.org/compare.php?prods[]=60&prods[]=142&prods[]=145&prods[]=151&prods[]=100&prods[]=140&prods[]=167&x=46&y=12

See the “links collected” section for links to the websites of the wikis whose descriptions on wikimatrix.org have been copied here for some inspiration:

“Giki is a wiki engine designed to be incredibly easy to set up and use, but with a robust and powerful plugin system allowing it to be as feature-rich as many larger systems.”

“Git-Wiki stores the pages in a git repository. Therefore you can use all the benefits of the revision control software (distributed content etc). Multiple markup languages are supported via plugins. The wiki has a very extensible architecture. Custom markup can be added via filters. Additional tags can be added (e.g. the math tag which is provided for LaTeX-rendering).”

"Gitit is a wiki backed by a git or darcs filestore. Pages and uploaded files can be modified either directly via the VCS’s command-line tools or through the wiki’s web interface. Pandoc is used for markup processing, so pages may be written in (extended) markdown, reStructuredText, LaTeX, HTML, or literate Haskell, and exported in ten different formats, including LaTeX, ConTeXt, DocBook, RTF, OpenOffice ODT, and MediaWiki markup.

Notable features include

"ikiwiki provides a general-purpose wiki engine, with particular emphasis on personal wikis, project wikis, blogs, and collaborative software development. We provide several features unique or uncommon amongst wikis:

“Sputnik is a content management system designed for extensibility. Out of the box it works as a wiki offering a standard range of wiki features. However, Sputnik can be extended into other things (a ticket system, a mailing list viewer, a forum system). This is achieved through a flexible data model with self-describing nodes and prototype inheritance between nodes.”

Now I want to list some features mentioned in wikimatrix and say what I think about them and what my wiki should support.

General Features

System Requirements

Data Storage

Security/Anti-Spam

Development/Support

Common Features

Special Features

Links

Syntax Features

Usability

Statistics

Output

Media and Files

Extras

More Wiki Model

Okay, now I can start creating a software architecture. The idea: create an abstract general model which can later be implemented in any language and on any system, and even make it reasonable for many implementations to exist which are based on that abstract model.

So we had the Block class and the concept of types:

+ class Content
  + class Text
  + class Video
  + class Audio
  + class File
  + class DataStream

And objects can have Block occurrences, e.g. a song has an occurrence which is a Block that is an Ogg file. Actually it can be the other way around: there’s an Ogg block and it happens to be an instance of a given song.

Here is a suggested class tree to start with:

+ class Content
  + class Text
  + class Video
  + class Audio
  + class Executable
  + class EncodedData
  + class Block

Now, there are two kinds of Content classes: content types, which say what the content is (Text, Video, Audio and so on), and content realizations, which say how the content is stored or realized.

Block is of the second kind, the others are of the first.

Next. Let’s think how to model wiki pages. Options:

The first one seems better, but we said earlier that a song has a block. Why is a song not a block? Answer: A song is the words and the musical notes, and the audio file is just a recording made while playing the song. So the song itself is not the block; the recording is. In the case of a page, it is free text and therefore a standalone piece of meaningful content. So a page is a block.

Now we have a rough image of where pages/blocks belong. Let’s say things about them. What do we want to say?

Name. A text file may have the name as its title on the first row, but binary config files, executables, untagged music files etc. do not always have a name, and even when they do it’s not always trivial to fetch from the file in real-time for every query which involves the name. Storing it in metadata/database makes sense.

Hmmm I’ll just list ideas and then I’ll start examining them.

Hmmm that’s all I can think of. I’ll start with that, and then go over the huge feature list from WikiMatrix and discuss them too.

Change history. This will normally be taken from git, so unless git lacks some info like creation time or last edit time, there’s no need to duplicate the info separately. However, having a general model of change tracking just as a data model would be great, because then it’s like an interface that any version control system can implement and work against. In order to model change history I need to read about git internals, so I’ll do it later.

Discussion. In Wikipedia, there’s a separate talk page for each wiki page. Other wikis sometimes have the comments at the bottom of the page. Is it the same? No, because a discussion page can be edited like a wiki page, while a comment is a short message you write once, like a blog comment. The problem with talk pages is that they look like a mess, both the source and the rendered HTML, because they are regular pages without any discussion semantics. I think the solution is to allow discussion semantics, but not to restrict comments to plain text: allow them to use the full power of the wiki pages themselves, e.g. a comment can be plain text but can also be a Block, i.e. a formatted page.

What’s special about comments is that they are written by single users. People do not edit other people’s messages. However, it’s possible for talk pages to keep lists / TODOs / references which should be updated and edited by anyone. This can be implemented by allowing arbitrary content to be attached to a page. Let’s model this.

There’s a User class, and each comment is posted by a single User. Each such comment can be called Post or Message or Comment etc., but let’s use Message for now. A Message can have content which is either text or a Block. It has a time of creation and a user. A Message can be a reply to one or more messages, i.e. a reply can apply to several posts. This allows the comments to form a DAG and not necessarily a tree. Also, even though this can change later, I’m for now allowing a Message to be a Topic, i.e. a top-level message starting a new discussion topic.
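A minimal Python sketch of that discussion model as plain data structures; the class and property names follow the Summary below, while the use of dataclasses and the exact field names are just illustration:

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import List

    @dataclass
    class User:
        name: str

    @dataclass
    class Message:
        author: User
        created_at: datetime
        content: str               # plain text, or a reference to a Text Block

    @dataclass
    class Topic(Message):          # a top-level message starting a new discussion
        title: str = ""

    @dataclass
    class Reply(Message):
        # replies to one or more messages, so discussions form a DAG, not a tree
        replies_to: List[Message] = field(default_factory=list)  # model says: at least 1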

Great, now let’s go over features briefly. No need to do whole research, I just want to build an initial model so I can design a core implementation around it.

License. Each page can specify a license, and segments inside pages can too. But wait a second… there’s a modeling issue again.

Assume there’s a tutorial page, and it contains a source code fragment. How are the content and metadata of the code modeled? Options:

Idea: Make it flexible by allowing segments to be both inside and outside, and allow their metadata to be both inside and outside. But now there’s another question: a BlockReference - is it a Block of its own? I mean, does the tree look like this:

Or like this:

Hmmm a Block is not a file, it is just a form of realization, i.e. it is a piece of content expressed as a sequence of boolean values determining membership or non-membership of sets, as described above. Therefore it makes sense for two Blocks to refer to overlapping content, since any sub-sequence of a Block is a Block too. However, it also makes a lot of sense to wish to have a list of standalone blocks which do not overlap, so things like File should exist for that.

At the same time, sometimes having separate blocks is impossible. For example, imagine all your data being in a single huge binary file, and all Blocks on the system overlap with each other and there’s absolutely no strict separation into standalone sections.

Instead of BlockReference, I’m calling it SubBlock.

Now, let’s add ways to specify sub-blocks for text: line range, character range, identifier. Hmmm… In order to express the modeling of this, I need to use the intersection of Block and Text, and the intersection of SubBlock and Text. The question: do I have to define them as separate types? Answer: It looks like OWL allows you to define a new class as a union or an intersection of several existing classes. It also seems possible to specify such a union directly, without defining a whole new class for it, i.e. an “anonymous class”; however, such a class can apparently only be used as the object of a statement, never as the subject. Or… no, wait. Why make this limitation? I’ll avoid adding rules for this for now. Later I’ll read more about OWL.

Decision: Let’s start by using intersections directly, and add them as separate classes only if needed.
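As a quick sanity check that a line-range sub-block is workable, a tiny Python sketch (the beginsAtLine/endsAtLine names come from the Summary below; the 1-based, inclusive convention is my assumption):

    def line_range_sub_block(parent_text: str, begins_at_line: int, ends_at_line: int) -> str:
        """Content of a LineRangeSubBlock, cut out of its parent Block's text."""
        lines = parent_text.splitlines()
        return "\n".join(lines[begins_at_line - 1:ends_at_line])

    page = "intro\ncode line 1\ncode line 2\noutro"
    print(line_range_sub_block(page, 2, 3))   # the two code lines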

Okay, done with sub-blocks. Before I proceed to the next entry in the feature list, I’d like to take care of subtypes of files (e.g. Text can be Source Code or Article or To-do List or Config File and so on), syntaxes of files (e.g. RDF, C++, Perl, Gitolite config, Apache config, CSV, XML, SVG) and semantics of files (e.g. every SVG file is XML but it has the specific meaning of being an image; by the way, it also means an SVG file can be both an Image and Text, right? I need to think about it). Also handle “containment” of encodings and syntaxes, e.g. every ASCII-encoded file is also valid UTF-8.

Alright. Take UTF-8 for example. A text file can be Text and Block and hasEncoding UTF-8. However it’s also possible to have a class UTF8EncodedText and make the text file be this and be Block. Which one do I choose?

DESIGN RULE: When a suggested class name describes a property of the object, e.g. UTF8EncodedText or OldPerson, the class should be replaced by that property, e.g. make Text have an encoding and make Person have an age.

So I’m adding an Encoding class. Now, what about subtypes? Here are some subtypes:

Now, this is the tree:

DESIGN RULE: Classes are the most general way to describe traits of objects. For example, it’s not clear which mechanism splits Content into Text, Image, Audio and so on. Therefore, every time the kind of connection can be described, a matching property should be used. Classes are only for cases where the best property you can think of is “hasType”, which is exactly the use case for classes via the isA property.

Let’s start. Here’s an initial subtree for Text:

+ Text
  + ComputerLanguageText
    + ProgramCode
      + CCode
      + CppCode
      + HaskellCode

Problem, people. Here are some categories of programming languages:

However, some languages can have more than one, which is not surprising because these things are processes done on the code, and have nothing to do with the text or the syntax itself. It is also not universal: A language may be meant to be compiled, but it’s a separate detail whether such a compiler actually exists and how good it is.

Now, assume I decide to model the “meant to be compiled/interpreted/etc.” thing as a… hmmm… let’s try to do it in a general way. Assume we have a Function class which has the following properties:

But a function is a special kind of Process: A process whose work is to return a result. Now, let’s try to define a tree:

Now let’s define the processes related to some languages:

This allows us to use a Text tree like this:

+ Text
  + ComputerLanguageText
    + CppCode
    + JavaCode
    + PythonCode

The ComputerLanguageText means that the text is meant to be read by a computer. Actually, this can be abstracted further by Functions. Hmmm… maybe not. Let’s think: Assume I have an XML file. Many many things are done with XML files: rendering to raster displays (SVG), insertion into databases (RDF), filling abstract trees (DOM), abstract parsing (SAX).

Hmmm wait a second. Processable functions operate on Blocks, not on just any Text. And any computer language can be processed using a SAX-like parser with callbacks as an input. Does it mean I need to define such a process for every single language, for the matching Text to be considered ComputerLanguageText? No need, it’s pointless. Let’s just keep ComputerLanguageText.

Since programming languages are meant to be compiled and executed, it may be okay to rely on Functions for them.

The troubles are not over yet. What about executables? There are at least 3 kinds of them:

Now, Executable is a concept, so anything executable should be under it. It means Executable doesn’t have to exist as a self-defining concept: If something can be executed, then deduce it is Executable. For example, assume PythonCode is under ComputerLanguageText. Since there’s a Python interpretation Process, any PythonCode object is Executable.
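A toy Python sketch of that deduction, assuming Processes declare which Content class they accept as input (the class names, process names and hierarchy here are invented):

    # Which Process accepts which class of content as input (invented data).
    process_input_class = {
        "python-interpretation": "PythonCode",
        "gcc-compilation": "CCode",
    }

    # A naive class hierarchy: child -> parent.
    superclass = {
        "PythonCode": "ComputerLanguageText",
        "CCode": "ComputerLanguageText",
        "ComputerLanguageText": "Text",
    }

    def is_executable(obj_class: str) -> bool:
        """Deduce Executable: some Process exists whose input class covers obj_class."""
        ancestors = {obj_class}
        while obj_class in superclass:
            obj_class = superclass[obj_class]
            ancestors.add(obj_class)
        return any(cls in ancestors for cls in process_input_class.values())

    print(is_executable("PythonCode"))   # True - a Python interpretation Process exists
    print(is_executable("Text"))         # False - no Process takes plain Text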

However this deals just with Text. What about ByteCode and BinaryExecutable? Since scripts are also Text, they can live under Text. But an ELF executable, it’s either a BinaryExecutable or… what? It can be just a Block, but how do you define an ELFExecution function then? What is the input? Any binary file? There are no kinds here? It’s a single elf_execution resource, and it applies to any ELF executable.

The same applies for a ByteCode file, which requires a VM to run.

Hmmm let’s make a suggested updated Content tree then. TODO: continue here; after the tree, take care of the other things in the previous bold paragraph above, add to the Summary, and then proceed with the feature list.

Update: I wrote a separate file called the-dark-wiki-rises. This file and the mess here and the long Summary section are making me think. How much longer can I work like this before I lose my direction? I need to organize my thoughts somehow. This kind of plain diary is bad. First, let’s move some of the Summary into a Dia diagram…

Okay, I made an initial diagram. Now adding things I did in the-dark-wiki-rises…

Done. Now I need to finish taking care of the bold area from last week. So, how do we model syntaxes? For example, right now I have XML and YAML as subclasses of Text. But this isn’t as good as it can be, because they aren’t just “types of text”. They are specifically languages, which means syntax + semantics.

I’m adding only to the diagram from now on. I’m adding a ComputerLanguage class and objects XML and YAML.

(…after long time…)

Finally, I have an initial working reasonable graph. Things to take care of before I continue with the feature list:

Okay, done for now. Let’s continue with the feature list finally… no, wait. I want to handle the permissions first, at least the basic model. This will allow me to get a file-system overall picture sooner and divide the diagram into ontologies. Let’s read about Linux’s permission model and SELinux’s additions…

http://en.wikipedia.org/wiki/File_system_permissions

Hmmm alright. Let’s start with a general model. In RDF, there is no built-in mechanism of permissions. Also, not surprisingly, Tracker doesn’t implement any permissions and allows software to freely read and write things in the database. But in the whole-new-computing-model, this is impossible. I have to create some permission system. I can use the permission systems of SQL databases (TODO: read about them) to create a model for my project here.

On Linux, the file permissions are:

And the directory permissions are:

Permissions are managed with three classes:

I can use such a model directly, but only if it’s capable of expressing every possible permission setting. Maybe I’m missing something, but it looks like the existence of a single group for each file is causing trouble. For example, assume the following required setup:

I’m going to model the permissions on three layers:

  1. General permission vocabulary
  2. Application to Blocks inspired by Linux file permissions
  3. Application to statements inspired by SQL permissions

Now, the concepts are: Action, Permission, User, Group, DigitalDataItem. If a permission is specified, it is granted. If not, it is denied. Of course, storing permissions in a semantic database is useful ONLY if permissions are also enforced inside the database; otherwise anyone can change Block permissions, not just the owner as it should be.

So wait a second. This permission model applies only to DigitalDataItem, not to semantic databases. I need to define something more general. I’ll use Operation for that. Operation can then have an Action (what to do) and a Target (on what to act).

Hmmm wait a second. How are permissions determined then? Each 〈User, Target〉 pair or 〈Group, Target〉 pair can have hasPermission for a specific Permission, or not. This is what determines ‘deny’ or ‘allow’.

In Linux, the pairs are not arbitrary: each Target has a User (owner), a Group (group) and optional statements with ownerHasPermission, groupHasPermission and othersHavePermission. It’s also possible to determine permissions using booleans, i.e. instead of hasPermission use a boolean for each user-target-permission triple. The problem: assuming only relevant users are specified (otherwise it’s super inefficient), all permissions must be specified for each user-target pair with a relevant user. So either the user-target pair is never found, or all permissions are specified. Now… how do you ensure all permissions are actually specified? You basically need some kind of PermissionSetting which has a Permission, a User/Group, a Target and the boolean.

DECISION: Start with the first, existence-based approach. I admit it may be problematic to enforce permissions here, because while in the second approach changing them means changing booleans, in the first one it may mean adding new statements. Hey, you know what? If the PermissionSettings for a specific user-target pair don’t exist, the second approach also allows changing permissions via creation - just like the first option. I’m thus taking the first one like I said.
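A rough Python sketch of the existence-based check, at the level of the general vocabulary (all identifiers are invented):

    # Existence-based permissions: a permission is granted iff a matching
    # statement exists; absence means denial.
    permission_statements = {
        # (who, target, action)
        ("user:alice", "block:42", "read"),
        ("group:editors", "block:42", "write"),
    }

    group_members = {"group:editors": {"user:alice", "user:bob"}}

    def is_allowed(user: str, target: str, action: str) -> bool:
        if (user, target, action) in permission_statements:
            return True
        # fall back to group permissions
        return any((group, target, action) in permission_statements
                   for group, members in group_members.items()
                   if user in members)

    print(is_allowed("user:bob", "block:42", "write"))   # True, via group:editors
    print(is_allowed("user:bob", "block:42", "read"))    # False: no statement exists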

Creating a new diagram for this, called action-permissions…

(…the day after…)

Done. It does need a lot of additions before it can be used in practice, but the abstract model is in place. Finally, I’m going back to the features.

Backends. I have the Block issues settled, but how do I arrange the actual backend interface? Hmmm… I also need to decide whether my system uses filenames or inode numbers. Because if filenames are used, the filename layer must exist. Idea: Allow filenames to be used, but also allow inodes so the wiki can replace the virtual file system. Make it flexible.

Now, how does a storage backend work? Let’s assume a StorageManager class, which is the software, and StorageInstance, which is the specific instance managed by the software. Now, each block has a Key, which is given to the StorageManager in order to get the content from the StorageInstance. However, it is also possible for a block to have Content directly inside the model as a string.
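A minimal Python sketch of that split, where StorageManager is the interface, a concrete backend stands for a StorageInstance, and a Key retrieves a block’s content unless the Content is inline; everything beyond those names is my assumption:

    from abc import ABC, abstractmethod
    from pathlib import Path

    class StorageManager(ABC):
        """The software that knows how to fetch a Block's content by Key."""
        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class DirectoryStorage(StorageManager):
        """One possible StorageInstance: a plain directory; git or SQL backends
        would implement the same interface."""
        def __init__(self, root: str):
            self.root = Path(root)
        def get(self, key: str) -> bytes:
            return (self.root / key).read_bytes()

    def block_content(block: dict, manager: StorageManager) -> bytes:
        # A Block either carries its Content inline or a Key for the manager.
        if "content" in block:
            return block["content"]
        return manager.get(block["key"])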

I’ll also need to model interfaces here. But D-Bus already does it in XML, so I’ll examine it first. I’m adding it to the list of tasks in the-dark-wiki-rises. Next.

Authentication. There is a permission model, but nothing was said about determining the identity of the user. For that purpose, the user will have a Key. It may be a password or an RSA key, and it will be used by the Authenticator to determine whether the user is really the user.
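For the password case, a tiny Python sketch of what the Authenticator’s check could look like (the hashing scheme and function names are my assumptions; an RSA key would fill the same role):

    import hashlib, hmac, os

    def make_key(password: str, salt: bytes = None):
        """One possible realization of a user's Key: a salted password hash."""
        salt = salt or os.urandom(16)
        return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def authenticate(password: str, salt: bytes, stored_hash: bytes) -> bool:
        """The Authenticator's check: is the user really the user?"""
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return hmac.compare_digest(candidate, stored_hash)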

Host blocking. What does it mean? Let’s see in wikimatrix.org… here:

host blocking

This can be a server feature, actually, i.e. the webserver can block an IP or hostname from accessing specific pages. How is this usually implemented, i.e. on which level? I’ll check later, hold on. Question: Should this feature be modeled? I’m asking because IPs are a real-time thing, not some information you store for long term usage. Answer: Absolutely, it is stored just like it would otherwise be stored in a configuration file or a plain-text blacklist. Implementation: Have an authenticator check the IP and determine its permissions. Model: Everything can and should be modeled, including network technologies. But I’ll do it later.

By the way, IP-based edit blocking is probably useful only for public sites like Wikipedia, because small teams where members are added manually just won’t give edit rights to random curious people.

TODO proceed to go over the other features briefly and update the model as needed

Summary (not full; it must be combined with the diagram to get a full summary)

class Content

class ContentType:
	superclass Class

class ContentRealization:
	superclass Class

classes Text, Image, Video, Audio, Executable, EncodedData, Block:
	superclass Content

Text, Image, Video, Audio, Executable, EncodedData isA ContentType

Block isA ContentRealization

property name/hasName:
	domain: Block
	range: text
	cardinality: 0/1 (maybe enforce 1, not sure yet)

class User

class Message

class Topic:
	superclass Message

class Reply:
	superclass Message

property content/hasContent:
	domain: Message
	range: text or a Text Block

property author/writtenBy:
	domain: Message
	range: User

property creationTime/createdAt:
	domain: Message
	range: datetime

property title/hasTitle:
	domain: Topic
	range: text

property repliesTo:
	domain: Reply
	range: Message
	cardinality: at least 1

class SubBlock:
	superclass Block

property parent/isContainedIn:
	domain: SubBlock
	range: Block
	cardinality: 1

class License

property license/hasLicense:
	domain: Block
	range: License
	cardinality: unlimited (to allow multi-license)

class LineRangeSubBlock:
	superclass Text, SubBlock

class CharacterRangeSubBlock:
	superclass Text, SubBlock

class AnchoredTextSubBlock:
	superclass Text, SubBlock

property beginLine/beginsAtLine, endLine/endsAtLine:
	domain: LineRangeSubBlock
	range: natural number

property beginPosition/beginsAtPosition, endPosition/endsAtPosition:
	domain: CharacterRangeSubBlock
	range: natural number

property anchor/hasAnchor:
	domain: AnchoredTextSubBlock
	range: text

class TextBlock:
	intersection of: Text, Block

class CharacterEncoding

property encoding/hasEncoding:
	domain: TextBlock; or have two domain fields, one for Text and one for Block, and interpret them as an intersection
	range: CharacterEncoding

References

fill as needed
