Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

[[!template id=ticket class=task done=yes]] [[!tag /projects/language-kort/decisions]]

[[!meta title="Model and framework Uid access in code"]]

Issue

There are two ways to represent a typed value in the Haskell code: either with the type specified statically, known to the compiler, or with the type specified as a Uid. In the second case, the value itself must have a generic representation, e.g. a String, and type-specific operations (such as arithmetic on numbers) are impossible.
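To make the distinction concrete, here is a minimal sketch. The types `Uid`, `StaticValue` and `GenericValue` and the function `addNumbers` are illustrative stand-ins invented for this example. They are not the actual definitions from smaoin or language-kort.

```haskell
-- Hypothetical stand-in for a Smaoin Uid; the real type lives in smaoin.
newtype Uid = Uid String
    deriving (Eq, Show)

-- Statically typed: the compiler knows which kind of value this is,
-- so type-specific operations are available.
data StaticValue
    = VNumber Integer
    | VBoolean Bool
    | VString String
    deriving (Eq, Show)

-- Type given as a Uid: the payload stays a generic String, and the
-- compiler cannot tell a number from a boolean.
data GenericValue = GenericValue
    { gvType :: Uid
    , gvText :: String
    } deriving (Eq, Show)

-- Arithmetic is only expressible on the static representation.
addNumbers :: StaticValue -> StaticValue -> Maybe StaticValue
addNumbers (VNumber a) (VNumber b) = Just (VNumber (a + b))
addNumbers _ _ = Nothing
```

With `GenericValue`, the same addition would first require deciding, from the Uid, how to interpret the two strings.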

[[/projects/smaoin-hs]] supports statically typed values in the data model. [[/projects/language-kort]]’s parse tree doesn’t. When converting between the model and the parse tree, it’s necessary to be able to:

Determine the Uid of a given primitive Smaoin type
Determine the data constructor and conversion required for a given value string with a type Uid

These conversions should probably be in [[/projects/razom-text-util]].
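The two conversions listed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Uid` and `Value` types, the placeholder Uid strings, the function names `typeUid` and `fromTyped`, and the `true`/`false` spelling of booleans. None of it is taken from smaoin, language-kort or razom-text-util.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-ins for the real model types.
newtype Uid = Uid String
    deriving (Eq, Show)

data Value
    = VNumber Integer
    | VBoolean Bool
    | VString String
    deriving (Eq, Show)

-- Placeholder Uids; the real ones would come from the Kadma sources.
numberUid, booleanUid, stringUid :: Uid
numberUid  = Uid "placeholder-number-uid"
booleanUid = Uid "placeholder-boolean-uid"
stringUid  = Uid "placeholder-string-uid"

-- Conversion 1: determine the Uid of a given primitive type.
typeUid :: Value -> Uid
typeUid (VNumber _)  = numberUid
typeUid (VBoolean _) = booleanUid
typeUid (VString _)  = stringUid

-- Conversion 2: pick the data constructor and parse the value string
-- according to the type Uid. Unknown Uids and bad strings give Nothing.
fromTyped :: Uid -> String -> Maybe Value
fromTyped u s
    | u == numberUid  = VNumber <$> readMaybe s
    | u == booleanUid = VBoolean <$> readBool s
    | u == stringUid  = Just (VString s)
    | otherwise       = Nothing
  where
    readBool "true"  = Just True
    readBool "false" = Just False
    readBool _       = Nothing
```

Both functions depend on the three Uid constants, which is exactly the access the rest of this page is about.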

For these conversions to be possible, the Haskell code must have access to the Uids of the Smaoin primitive types, and be able to match them with static recognition of value types. How should this work exactly?

Process

The plain naive solution is to write the Uids in a list, directly in Haskell code. But this solution raises questions:

Should the list be auto-updated when needed, and how?
Why write in Haskell, instead of querying a parsed data document written in source form (i.e. Idan)?
Should Uids be provided for all namespaces, or just Kadma, or just smaoin, or just Smaoin primitive types?

Similar cases to examine are:

The Unicode character database (UCD) based sheets provided as Haskell functions in several packages (some use C functions, but then the sheets are provided with the C libraries used)
RDF resource URIs provided as variables in Java RDF frameworks
GUI widget identifier (and other resources, like strings) variables auto-generated for Android Java app projects

Now some answers and thoughts.

List Updates

The list must either be the source document itself, or some other representation like Haskell code that is auto-generated. It’s possible to update the sheets when the source changes, and make a new release of the sheet package. For example, each [[/projects/Kadma]] release could have a matching release of a Haskell package containing the Uid sheets.

Conclusion: Everything except for the Idan source must be auto-generated.

Source Querying

Source querying can generally work. Actually, that’s what you’re supposed to do most of the time, right below the triplestore layer. Or you just query a triplestore. But there are two reasons not to do so in certain cases:

If you need a specific resource, you can avoid going through complicated queries and parsers and B+ trees and lots of dependency packages and their recursive dependencies… by “caching” the Uid you need in Haskell code. Queries are generally meant for capturing entities for which some condition holds, while what we want here is to capture a specific entity.
You may be writing the framework code itself. Source querying means that the whole multi-layer framework must not need any specific Uids, because that would create a cyclic dependency. The framework code needs a Uid, and getting the Uid requires using the tools supplied by the framework itself. For example, querying the Idan source for Smaoin primitive types in real time for Kort model conversions (which includes parsing) requires that Idan’s implementation doesn’t do that. It also creates an ugly wrong dependency.

Of course we can write the source in Kort, but then the Kort parser has the same problem. And if we use some other format, like CSV, then we may as well use Haskell itself!

What about all the other parts? For example, a triplestore implementation. If it needs a specific Uid, can it read it from Idan source? Yes, it can, but then we have the first issue mentioned above, and the dependency on the Idan parser just for this simple little thing.

Conclusion: For all the framework parser code, we must use auto-generated Uids and they should be generated as Haskell source. For framework code that isn’t parsers, we can technically use source file querying, but there are issues of dependencies and the overhead of ugly long data pipelines. It seems a good idea to prepare Haskell sheets for use by at least the framework code.

Scope

Which Uids should be generated in Haskell sheets, and where should they be placed in the module hierarchy?

There can be one top-level module for the Uids, and under it submodules based on the namespace names. For example, the Uid of myns:Thing would be under Data.Smaoin.Uid.Myns.

Ideas for top-level module name:

Data.Smaoin.Uid
Data.Uid
Data.Resource
Data.Smaoin.U
Data.Smaoin.R
Data.Kadma

The problem is the name. Identifiers starting with an uppercase letter are used in Haskell for module names, type names and data constructors. Here are ideas:

Use lowercase names. For example, Myns.thing. This can become a problem if there are identical names modulo letter case, for example there could be a class Country and a property country :: Person -> Country. It’s also a bit confusing to use lowercase for all labels in the Haskell code, because in Smaoin the case is used, at least in English, to distinguish easily between Classes and Objects.
Use a prefix. Since in the English localization uppercase-starting names are used for classes, perhaps a ‘c’ prefix would make sense. For example, Myns.cThing. But this has issues too. It’s ugly and less intuitive with this weird ‘c’ there, and could cause issues with name similarity. For example (yes, I know it’s an unusual one) there could be a label cLanguage whose resource refers to the C programming language. Then its identifier is identical to cLanguage, the identifier of the Smaoin class Language. Can get very cConfusing.
Use a leading underscore. For example, Myns._Thing. Yes, it works.
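The underscore option works because Haskell treats `_` as a lowercase letter, so `_Thing` is a legal variable name. Here is a small sketch of how the options compare. The `Resource` type below is a stand-in invented for the example (the real one is provided by smaoin), and the Uid strings are placeholders.

```haskell
-- Hypothetical stand-in for smaoin's Resource type.
newtype Resource = Resource String
    deriving (Eq, Show)

-- Prefix option: the class Thing becomes cThing. A label that is
-- literally "cLanguage" would collide with the class Language.
cThing :: Resource
cThing = Resource "placeholder-uid-of-Thing"

-- Underscore option: the class keeps its label, with '_' in front.
_Thing :: Resource
_Thing = Resource "placeholder-uid-of-Thing"

-- A class and a property that differ only by case can coexist,
-- which the lowercase-only option does not allow.
_Country :: Resource
_Country = Resource "placeholder-uid-of-Country-class"

country :: Resource
country = Resource "placeholder-uid-of-country-property"
```

With a qualified import these would be referenced as, for example, `Myns._Country` and `Myns.country`.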

What about packages? One package per namespace or one huge package?

First, since namespaces and ontologies can be created independently, it must be okay to bring sheets from any package. The real question here is whether the definitions of Kadma should be:

in one separate package?
or in several per-namespace packages?
or inside the smaoin package?

I think not mixing is a good idea. Release versions and so on. Keep the auto-generated part separate. I could add sheet-related utilities in the future, like the Unicode packages provide utility functions and not just plain lists or tuples, but those may be okay to add to the sheet package. For now, it’s just the sheets.

The name must reflect that:

It’s a package of resources, not functionality
These are Smaoin resource Uids, not images or audio etc.
These are resources from Kadma, the Smaoin framework: just them, and all of them

Name suggestions:

smaoin-resources
kadma-resources
kadma
ontology-smaoin
vocabulary-smaoin
smaoin-vocabulary
vocabulary-kadma

What about the type of the identifiers? String, Text, ByteString, Resource?

Since the canonical form is that of Data.Smaoin, it seems best to use it as-is, i.e. through Resources.

Conversions

Some things are similar between Kort and Idan, or close to similar. So I put them in razom-text-util. But now I’m writing the conversions, and it doesn’t feel right. Why would the Kort-specific value regexes be useful for anything except Kort itself? Idan has differences, so it will need its own regexes anyway.

There may be some accepted/standard/universal/de-facto regexes for Smaoin values, but I don’t see a reason to put Kort regexes in a utility package while Idan maintains its own regexes. Let’s move them to language-kort.

Decisions

Package to provide the conversions: The general parts in razom-text-util, the rest in language packages.

Scope of Uids provided: Whatever the framework needs, starting with the specific ones required for Kort’s implementation.

Uids provided as: Haskell packages, auto-generated from their Idan sources. Kadma will have one such package, and ontologies etc. can be provided in separate packages or inside related software packages. Start with hand-written definitions, and implement auto-generation later. It needs an Idan parser anyway.

Package name for Kadma Uids: vocabulary-kadma.

Content of the sheets: Any entities are allowed, not just Uids. In other words, values can be used too. For example, numeric constants. Also, the resources specified can be “data objects”, not just ontology concepts.

Top level module for Uids: Data.Smaoin.Vocabulary. The last part could become Vocab but people will probably import it qualified as V anyway.

Data type for Uids: smaoin’s Resource.

Submodule naming: For now, take the namespace prefix and make the first letter uppercase. Maybe when I see some names I’ll prefer to use some kind of CamelCase style.

Identifier naming: For labels starting with a lowercase letter, use the label as an identifier. For labels starting with an uppercase letter, i.e. Smaoin classes, prefix the label with an underscore (_) to get the Haskell identifier.
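The submodule and identifier naming rules are mechanical, so a future generator could apply them with small functions like the ones sketched below. The function names are made up for this example, and the rendered declaration assumes that a `Resource` can be built from a string by a constructor called `Resource`, which may not match smaoin’s actual API. Labels that are not valid Haskell identifiers (spaces, reserved words and so on) are not handled here.

```haskell
import Data.Char (isUpper, toUpper)

-- Submodule naming: namespace prefix with its first letter uppercased.
submoduleName :: String -> String
submoduleName ""       = ""
submoduleName (c : cs) = toUpper c : cs

-- Full module name under the chosen top-level module.
moduleName :: String -> String
moduleName prefix = "Data.Smaoin.Vocabulary." ++ submoduleName prefix

-- Identifier naming: uppercase-starting labels (classes) get a leading
-- underscore, lowercase-starting labels are used unchanged.
identifierName :: String -> String
identifierName label@(c : _)
    | isUpper c = '_' : label
identifierName label = label

-- Render one declaration of a generated sheet as Haskell source text.
-- The "Resource" constructor in the output is an assumption.
renderDecl :: String -> String -> String
renderDecl label uid = unlines
    [ ident ++ " :: Resource"
    , ident ++ " = Resource " ++ show uid
    ]
  where
    ident = identifierName label
```

For the namespace prefix `myns` and the label `Thing`, this yields the module name `Data.Smaoin.Vocabulary.Myns` and the identifier `_Thing`.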

Plan

Create project vocabulary-kadma and put the definitions needed for Kort there, hand-written, copied from the Kadma Idan sources. Make release 0.1.0.0 of it. For now releases will have the regular 4 digit scheme. Maybe later they will just follow the release numbers of Kadma itself.
Implement in razom-text-util and language-kort the conversions needed for Kort, and make it build with smaoin 0.2.0.0
Finish implementing language-kort, write tests, make a release

Results

Done.
