
Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

tasks-and-ideas.mdwn

+ Content
 + Block
 + Text
  + ComputerLanguageText (can be deduced, i.e. if a Content hasSyntax)
   + ProgramCode (can be deduced as discussed)
    + Script (can be deduced, i.e. this is the intersection of ProgramCode and Executable)
     + PythonCode
 + Image
  + RasterImage
  + VectorGraphic
 + Audio
  + SoundWave
  + SpectralComposition
 + Video
 + Executable
  + ByteCode
  + Binary

NOTES

Interfaces

Model APIs, interfaces and implementations, especially so I can implement the storage backend interface. First read about D-Bus and learn from the experience gained while implementing it, and then create my own model. TODO

Name

Since this is just the data model for now, re-defining everything about computing, I want the name to mean “genesis”. Start here: <https://en.wiktionary.org/wiki/beginning>. Done: the name suggested right now is Kiwi.

Permissions

Read about Linux file permissions and understand how they work, including their strengths and weaknesses, so I can add a universal extensible model to my wiki model. TODO
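As a starting point for that reading, here is a minimal sketch of how the classic Unix mode bits decode, using Python's standard `stat` module. The helper name `describe_mode` is hypothetical, and the sketch covers only the basic owner/group/other bits, not ACLs or anything else Linux layers on top.

```python
import stat

def describe_mode(mode: int) -> dict:
    """Split a Unix mode into read/write/execute flags per class (basic bits only)."""
    classes = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    return {
        who: {"read": bool(mode & r), "write": bool(mode & w), "execute": bool(mode & x)}
        for who, (r, w, x) in classes.items()
    }

regular_file = 0o100644  # a regular file, rw for owner, read-only for everyone else
print(stat.filemode(regular_file))           # -rw-r--r--
print(describe_mode(regular_file)["group"])  # flags for the group class
```

The fixed three-classes-by-three-rights grid is what makes the scheme simple, and it is also the part a universal extensible model would probably need to generalize.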

Versions

Allow the document/file version vocabulary to specify draft and release versions. TODO

Data

It may be good to have a Data class under Content, for example for XML and YAML files containing data structures. It makes sense because XML is just a textual representation, while data can also be binary-encoded. So an XML file is always Text, but it is also something else: for example, it can encode a process, a document (still text, of course), an SVG image and so on. TODO
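A small sketch of why Data is separate from Text: the same data structure is written once as text and once as bytes, and both decode to the same thing. JSON stands in for XML/YAML here because it ships with Python. The binary layout (two little-endian 32-bit integers) and the names `point`, `from_text` and `from_binary` are assumptions made up for illustration.

```python
import json
import struct

point = {"x": 3, "y": 4}

# Textual encoding of the data structure.
as_text = json.dumps(point, sort_keys=True)

# Binary encoding of the same data: two little-endian signed 32-bit integers.
as_binary = struct.pack("<ii", point["x"], point["y"])

def from_text(text: str) -> dict:
    return json.loads(text)

def from_binary(data: bytes) -> dict:
    x, y = struct.unpack("<ii", data)
    return {"x": x, "y": y}

# Both forms carry the same Data, only one of them is Text.
print(from_text(as_text) == from_binary(as_binary))
```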

Representation, Encoding and Syntax

The model is !!!flawed!!!. Take a LaTeX file which represents an article. Both are text, so you put your object under Text. But then do you make it an Article? If you do, how does one tell whether it is the source that is an article, or a generated form? And how is it generated? IDEA: Use Content for the “generated” form, i.e. the semantics. For example, SVG in the Content sense is just an image.

Then you can add information about the actual syntax of the file and whether the data is specified in a textual form.

In other words, Content specifies which resources you have, e.g. a PDF and its LaTeX source both represent an article. It’s true that the LaTeX is just a source, but in practice even viewing a PDF means rendering, so both require computation anyway.

I have 2 ideas:

1. Use the Content tree to represent mixed concepts, e.g. a LaTeX file is plain text, a LaTeX file and a book/article/document all at once, and if all of them map to a file being Text, that’s okay. It just means you’ll find it under your documents, under your plaintext and under your articles. Excellent. Then, allow a Block to represent something and to be encoded as something. For example, a LaTeX file is encoded as text, has LaTeX syntax and represents e.g. a book.
2. Use Content as a deduced class, i.e. what you actually define is a Block, and you specify the encoding, what the Block represents and which processes transform it into other forms. Then it is deduced that a plaintext document is Text, and that a PDF file, even though PDF is binary-encoded, is Text, and so on. Actually, FIX: PDF is not always Text, e.g. a PDF can be just an image with no text at all. THINK ABOUT IT.
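The second idea could be sketched roughly as follows. This is only one possible reading: the field names (`encoding`, `syntax`, `represents`) and the function `deduced_classes` are hypothetical, and Text is deduced from the encoding alone, which leaves the open PDF question untouched.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    encoding: str          # "text" or "binary" (assumed vocabulary)
    syntax: str            # e.g. "LaTeX", "PDF"
    represents: frozenset  # abstract things the Block stands for, e.g. {"Article"}

def deduced_classes(block: Block) -> set:
    """Deduce Content classes from the Block's description instead of assigning them."""
    classes = set(block.represents)
    if block.encoding == "text":
        classes.add("Text")
    return classes

latex_source = Block("paper.tex", "text", "LaTeX", frozenset({"Article"}))
pdf_output = Block("paper.pdf", "binary", "PDF", frozenset({"Article"}))
scanned_pdf = Block("scan.pdf", "binary", "PDF", frozenset({"Image"}))

for block in (latex_source, pdf_output, scanned_pdf):
    print(block.name, sorted(deduced_classes(block)))
```

Because the classes come from `represents` and not from `syntax`, the two PDF files end up in different places, which is the behaviour the FIX note asks for.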

Okay, let’s try to model this better. A file has two things: Semantics and syntax. Semantics is what it represents (e.g. an HTML file represents a webpage), and syntax is how it expresses the represented data (e.g. HTML text).

However, note that LaTeX is general-purpose so it can be converted into many things: PDF, HTML, plain text… I want to have abstract concepts for what it represents, regardless of the specific format used. In other words it depends only on the LaTeX source, so it exists even if the source is never converted to anything else.

Now, here’s something else: there is no strict binding to syntax or semantics!!! For example, what is the syntax of a PNG image? It’s a binary format, right? But it can still be viewed using a hex editor, meaning that every file is also a binary file. This suggests something: while there are “how it is represented” and “what it means” semantics, there is absolutely no difference to the computer between one form of rendering (e.g. rendering SVG as text in Gedit using the Pango library via GtkSourceView) and another (e.g. rendering the SVG as a vector graphic using librsvg).
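To make the point concrete, here is a sketch that takes one byte string and produces three different views of it. The function names are made up for this example, and parsing the XML structure only stands in for a real vector renderer such as librsvg.

```python
import xml.etree.ElementTree as ET

svg_bytes = (
    b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    b'<circle cx="5" cy="5" r="4"/></svg>'
)

def as_hex(data: bytes, limit: int = 8) -> str:
    """Hex-editor view: the first few bytes as hexadecimal."""
    return data[:limit].hex(" ")

def as_text(data: bytes) -> str:
    """Text-editor view: decode the bytes as UTF-8 characters."""
    return data.decode("utf-8")

def as_structure(data: bytes) -> list:
    """Structured view: element names, a stand-in for actual graphic rendering."""
    root = ET.fromstring(data)
    return [root.tag] + [child.tag for child in root]

print(as_hex(svg_bytes))
print(as_text(svg_bytes))
print(as_structure(svg_bytes))
```

All three functions are just computations over the same bytes, so none of them is privileged from the computer's point of view.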

I need to read about MIME types, but look:

YAML is application/yaml
XML is application/xml
SVG is image/svg+xml
RDF is application/rdf+xml

MIME types assign a single type to each file, so they try to somehow encode the dual nature of XML files, which are both text-editable and renderable, e.g. to an image or a PDF. I want my model to reflect both in a uniform way.
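The `+xml` part of types like `image/svg+xml` is where that second face is squeezed in. A minimal sketch of pulling it apart, where `MimeType` and `parse_mime` are hypothetical names and parameters such as `; charset=utf-8` are deliberately not handled:

```python
from typing import NamedTuple, Optional

class MimeType(NamedTuple):
    top_level: str          # what it is, e.g. "image"
    subtype: str            # the specific format, e.g. "svg"
    suffix: Optional[str]   # the underlying syntax, e.g. "xml", if any

def parse_mime(mime: str) -> MimeType:
    """Split 'type/subtype+suffix' into its parts (no parameter handling)."""
    top_level, _, rest = mime.partition("/")
    subtype, _, suffix = rest.rpartition("+")
    if not subtype:
        # No "+" present: rpartition put the whole string in the last slot.
        subtype, suffix = suffix, None
    return MimeType(top_level, subtype, suffix)

for mime in ("application/yaml", "application/xml", "image/svg+xml", "application/rdf+xml"):
    print(parse_mime(mime))
```

Note how SVG gets both a meaning (`image`) and a syntax (`xml`) this way, while plain XML and YAML only get one label.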

So here is some theory: Each File is a BinaryBlock, and each BinaryBlock is a Block. Each BinaryBlock can be edited using a hex editor, which means all files that aren’t fake, e.g. directories (they have inode/directory MIME type), have this facet regardless of anything else they mean.

Now, the binary digits can be used to encode things that are easier to work with. For example, an XML file can encode anything: it’s a text-based way to represent and encode information. However, it’s merely a subset of the wider “text” encoding, i.e. using binary digits to encode text characters using ASCII or UTF-8. Since XML is the structure, and it is not tied to one specific encoding, the two are specified separately.
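A short sketch of that separation: the same XML text is stored under two character encodings, giving different bytes but the same structure. The sample document and the helper `structure` are invented for this example.

```python
import xml.etree.ElementTree as ET

document = '<note lang="en"><to>Kiwi</to></note>'

# Two different binary encodings of the same characters.
utf8_bytes = document.encode("utf-8")
utf16_bytes = document.encode("utf-16")

def structure(xml_text: str) -> tuple:
    """Reduce an XML text to its structure: root tag, attributes, children."""
    root = ET.fromstring(xml_text)
    return (root.tag, dict(root.attrib), [(child.tag, child.text) for child in root])

print(utf8_bytes == utf16_bytes)  # the bytes differ
print(structure(utf8_bytes.decode("utf-8")) == structure(utf16_bytes.decode("utf-16")))
```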

Here’s some tree work:

Content
Block
BinaryBlock
Text
XML

But what about a text-only PDF? Is Text plain text, or anything textual, including an image? Also, why can’t a binary sequence exist without being a Block, just as Text doesn’t have to be a Block in general? IDEA: Every file can be opened with a text editor, even if it’s not meant to be text, and some portions may be readable. So the point here is to say “this file is safe for opening with a text editor”, and not “this file contains only text, even if this text is rendered as a JPG image”. So Text will have the “plain text” meaning, like the text/* group of MIME types.
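One way to approximate “safe for opening with a text editor” is a simple heuristic over the bytes. The rule used here (valid UTF-8 and no NUL bytes) is an assumption chosen for this sketch, not something the model prescribes, and `looks_like_plain_text` is a hypothetical name.

```python
def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: treat data as plain text if it is valid UTF-8 with no NUL bytes."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

svg_source = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
png_signature = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file

print(looks_like_plain_text(svg_source))
print(looks_like_plain_text(png_signature))
```

Under this reading an SVG file is Text even though it means an image, while a PNG is not Text even if it shows a page of writing.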

Let’s try to rework the tree. First the binary issue (done: moved to the diagram):

+ Content
 + BinarySequence
  + BinaryBlock
 + Block
  @ BinaryBlock

So BinaryBlock has two superclasses, and a binary sequence is now a separate concept not tied to Blocks.
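The same shape can be written down as classes with multiple inheritance. This is only a sketch: the constructor arguments (`data`, `name`) are assumptions, since the notes do not say what a Block or a BinarySequence carries.

```python
class Content:
    """Root of the tree."""

class BinarySequence(Content):
    """A sequence of bytes, not necessarily stored as a Block."""
    def __init__(self, data: bytes):
        self.data = data

class Block(Content):
    """A stored unit, not necessarily binary in this model."""
    def __init__(self, name: str):
        self.name = name

class BinaryBlock(BinarySequence, Block):
    """Both a binary sequence and a Block: the node marked with @ in the tree."""
    def __init__(self, name: str, data: bytes):
        BinarySequence.__init__(self, data)
        Block.__init__(self, name)

file_block = BinaryBlock("logo.png", b"\x89PNG")
stream = BinarySequence(b"\x00\x01")

print(isinstance(file_block, Block), isinstance(file_block, BinarySequence))
print(isinstance(stream, Block))
```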

The text issue (done: moved to the diagram):

+ Content
 + Document
  + PDF
  + OpenDocument
  + M$ DOC
  + M$ DOCX
  + Abiword document
 + Text
  + XML
   + RDF
   + SVG (also an image)
  + YAML
   + JSON
