
Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>


I want to make it build as soon as possible, even without changing the main() function. The main obstacle will be SGP: I need to give it the whole Skeleton too, so that it can build, and then run make install on it. Only then can Saugus use it as a build dependency. (I hope the include-only-sgp.hpp rule won’t cause problems with build-only dependencies: it will also cause any non-template headers to be included, and their symbols may end up unresolved when ld runs. We’ll see. Either I will relax the rule for code meant to be used header-only (like SGP), or I’ll use two separate top-level headers, one for templates and one for the rest. Then sgp.hpp can include both for completeness.)

Even if some classes have dummy implementations of methods, it’s fine: just make Automake take all the sources, build them and install. And make distcheck work, maybe by filling the NEWS file with dummy content? Read about how to update the version in configure.ac, before or after a release…

Also see how Redland implements the Factory thing for choosing a database backend at runtime by name. It’s essentially a plugin system with discovery, and I will need one too.

I can’t find any information about using C++ templates with Autotools. Since my templates use the .cpp extension, and not .tpp or anything like that, I need to make sure SGP’s template .cpp files get installed together with the headers, as if they were .hpp files.

Solution: With a libtool library, the headers to install are listed separately from the sources. But it’s a good idea anyway to start using .tpp instead of .cpp for template implementations, because .cpp files in /usr/include are confusing.

I found several pages in the autotools manuals which help:

Making the Saugus autotools setup compile all the currently written sources…

Implement all functions, even with dummy code
Make the SAUGUS_HPP_INSIDE_ and SAUGUS_COMPILATION macros work
Build and install SGP, since it’s a build dependency

If a library exports a class with a member function template, how does the implementation get installed? The whole .cpp file? That would cause its non-template functions to be defined twice. Either it’s a link error, or the definitions compiled from the installed .cpp will override the library’s compiled functions, as if the whole .cpp were a template. SOLUTION: Either implement these function templates in the header, or add a .tpp file (in addition to the .hpp and .cpp) which implements them.

What about a program? Ignoring this in a program makes the code less reusable, because if someone decides to reuse the code in a library, she’ll need to manually move those definitions out of the .cpp, either to the .hpp or to a new .tpp file. SOLUTION: Start using the same approach for programs too. There are two options:

Implement member function templates in the .hpp
Create a .tpp file for them

DECISION: Let’s keep the definition separate from the declaration. The .hpp file should be short and clean, and contain just signatures and documentation. I’m thus taking option 2 - use a .tpp file.

Another issue: the .tpp files include the .hpp files, while each .hpp includes its .tpp at the end, which is an unnecessary circular inclusion. The header guards handle it, but .tpp files should never be included directly anyway! Here are 2 approaches:

Keep the #include because that’s how cpp files work in general
Remove it, and instead test for the hpp’s header guard and #error if needed

SOLUTION: Let’s see if the build succeeds as is, and then decide. I can try to measure time and see if the second approach performs better. It may be significant for large projects in which the build process takes a while.

All functions are now implemented, although some have no-op dummy code.

Working on SAUGUS_HPP_INSIDE_ and SAUGUS_COMPILATION… done.

Now let’s go fix SGP… it’s a library so there are more issues.

Basic skeleton, borrowed from Saugus - done
Library specific config - done
Header inclusions, etc. - done
hpp/cpp/tpp arrangement - done

The autotools work now, but there are things to fix before SGP builds. There are also non-critical things I want to fix. Here’s a list of things to take care of:

Fix all the includes (replace <Sgp/X.hpp> with "X.hpp") done
Make sure header guards have consistent names (underscore prepended/appended) done
Make sure all files have a copyleft notice done
Make sure all files have (for now blank) doxygen comments done
Make sure all files have only-main-header-can-be-included-directly guards done

There will probably be C++ syntax errors and missing #includes, so I’ll have to deal with them too. But these two will probably be the last problems, after which SGP will build.

Then go back to Saugus:

Make a final version of the Graph interface, decide how StatementPattern works and whether I need it, etc. Look at Soprano and Redland for ideas.
Implement all the classes I need for Graph and Repository, e.g. Query and QueryResult and Table
Implement all the rest of the missing/dummy functions, including full Graph implementation but not the InMemoryRepository::do_query yet
Remember the idea that query() should run a given function on each match, rather than just putting results in a table? It’s discussed above. Make Graph support this for its find_* methods, and make Repository use such a function instead of the QueryResult-returning one. Take the time and think about it well: good design now will result in good code later
Consider splitting GraphBase from Graph
Think about higher level datastore class, e.g. one in which add() takes a triple and creates a statement identifier by itself by generating a uuid
Plan the resolver algorithm here, format nicely somehow (I wrote suggestions in my personal ideas page)
Implement InMemoryRepository::do_query
Implement functions for modifying the graph, either by looking at how SPARQL does it or just using some workaround just to be able to stress-test the query function
Start adding query features…

SGP builds and I installed it on my system. Most of it is headers and there are no unit tests, which means most of the code is likely to fail… actually, since users must include all the headers, any use will probably result in a compilation error. It won’t even build. I’ll work on this later, and either fix the problems or remove some currently unused headers from the build. Later I will document all the classes and make SGP fully functional.

Let’s start with the first task in the list above: Graph’s interface.

Graph Interface

Do I need StatementPattern? How does it work?

Clearly any predicate can be implemented using the Match functor. The StatementPattern doesn’t add any new functionality; it’s just for convenience. Since I haven’t used the API yet, I can’t say how useful it is, or how often it would be used in place of the Match functor. But in any case, I can still revisit decisions and remove it, because the API is not stable yet. This is my chance to try things. The only way to know something isn’t needed is to try using it and see how it feels.

The idea behind StatementPattern is to have a separate pattern for each statement component. Most of the time the first component, the statement identifier, is not used, because it doesn’t mean anything by itself. It may be a good idea to have a version which takes just 3 pattern parameters, implicitly setting the identifier to "*", i.e. "accept any value".

I can have two layers here:

A template which has 4 template parameters
A class with 4 virtual functions people can override

Let’s call the first one StatementPatternTemplate. And let’s start writing and see how it goes…

Good. Now, what about the virtual functions? Assume we have a class with one virtual function for each component match. It means each pattern requires its own whole new class! That doesn’t make sense. It makes more sense to have each component as an object which can be switched polymorphically. For example, each can be a std::function, and you could use helper functions to construct these std::functions.

Let’s try some syntax examples.

Pattern ("*", "*", "*", "*")

This should match any statement. But wait, it means the matcher has to understand one specific matching syntax (the "*" wildcard), which is bad. Instead, how about this:

Pattern (Regex (".*"), Regex (".*"), Regex (".*"), Regex (".*"))

Hmmm… it’s not much of an improvement over the template. I can just keep the template for now, and write these Regex things for convenience, e.g. a match functor which takes a regex. Does one exist already?

This invites a bigger question: Which filters do I want to have? Regex is not the only one possible. Things like number/string comparison are useful too. But let’s look at Soprano and Redland and get ideas.

Okay, got ideas. Nothing new there, actually. Redland doesn’t seem to offer matching directly (could be implemented easily though, by iterating over the set of statements), and Soprano seems to do matching using partial statements, i.e. ones with some components possibly being blank.

Idea: Let’s not worry about specific filters, they can always be added later. What I really need is the interface.

Conclusion: All I really need is to pass functors to the StatementPatternTemplate and I’m done.

DECISION: Since I don’t need StatementPattern anymore, I can remove it and give this name to the Pattern.

Wait a second… the functions which match Graph statements by Match and by Pattern look the same! Actually, if I give StatementPattern an operator () member, it can be passed as a Match functor!

DECISION: I’m adding operator() and removing those functions from Graph.

Before I proceed to the next task, I’d like to consider something I saw in Redland and in Soprano: Iterating over the statements using an iterator. I do already have this, but only for the whole graph - iterating on just a filtered subset requires using one of the find_statements() functions, which create a whole new graph. I’d like to add a filter iterator.

If I had direct begin and end functions, I could just add something like begin_filtered or iterate_statements. But since I use an IterationInterface, I need the filter iterator to work with it too. Idea: Just use the IterationInterface template, but with a different iterator type - one which stores the filter function and applies it when iterating. I believe Boost has such an iterator, in which case there’s no reason to reinvent and write my own class for it. Let’s see…

My IterationInterface takes a container and a functor which gets iterators from it. In order to use any iterator wrappers, I need to write one new Functor class which instead of calling the container’s begin and end, uses a given wrapper it takes as a parameter. Then all I need is to get that wrapper. Let’s see how Boost.Iterator works, I don’t remember…

Okay. It’s not simple: Boost isn’t written for use with the autotools. But there is a set of macros in the Autoconf Archive, so I’ll be fine. I can also use regular AC_ checks to test for the existence of a known Boost header. On my Debian 7, this is where the iterator headers come from:

$ dpkg --search /usr/include/boost/iterator/filter_iterator.hpp
libboost1.49-dev: /usr/include/boost/iterator/filter_iterator.hpp

I will try to use the Autoconf Archive’s AX_BOOST_BASE macro and append the preprocessor flags it provides (BOOST_CPPFLAGS) to the variable used by Automake. This is done for simplicity. Later, if it becomes a problem, I can switch to checking for just that specific header, allowing compilation on a system without a full Boost installation.

Adding Boost support to Saugus autotools files… done.

What’s next:

Implement in Saugus an iterator functor which can work with the filter iterator
In Saugus, use it in Graph with IterationInterface and filter_iterator. An IteratorWrapper is probably required too. Otherwise the filter_iterator will expose a stronger category than forward (bidirectional, when the underlying iterator is bidirectional), which is bad, because Graph promises only a forward iterator. Use a ready ForwardIteratorWrapper if I made one.

Done.

Before I close the first task, one last look at the Graph header to make sure I don’t have anything else to change… done.

Helper Classes

The next task is “Implement all the classes I need for Graph and Repository, e.g. Query and QueryResult and Table”. Let’s start.

Wait, first something else - make SGP pass a compilation test.

It passed, but there’s a problem: When the headers are listed in Makefile.am without nobase, the header under src/detail is installed in the same folder as the other headers, not in a detail subfolder. When using nobase it does, but all the headers are installed in an src subdir. I don’t want that. I need to find a way to solve the problem… I hope a secondary recursive makefile won’t be necessary. We’ll see… worst case, I could move the header out of detail as a workaround.

Also, I can check what GTK, for example, does.

Read about nobase again in the Automake manual.

Problem solved. Now, the helper classes.

NEXT: Graph.tpp, finish implementing the Graph class. QueryResult is a Table with column names, maybe more things later (since a query could also return true/false)

NEXT: Proceed with Table, Query implementation and all the other incomplete classes Repository and Graph need. Then write InMemoryRepository based on that, it will need Query and QueryResult to have stable interfaces. Then, go to that list of tasks I wrote somewhere above, and proceed to the next task (or first read the current one to make sure I finished it).

QueryResult

When you send a query, you know the order of the columns. So when you get a result table, you can basically use numeric indices to find values in rows. However, why should the order matter? I want an interface which does not rely on the order, and which even allows Naya result tables to be orderless, relying on names to get the values.

Where is the right place to implement this? Maybe just use a quick workaround for now. When you define a query, at the moment the parameter names are simply strings inside the statement pattern. And right now, their order, although it shouldn’t be significant, is determined by their appearance in the statement pattern. So all that is really needed is a mapping from strings to numbers. Maybe QueryResult can supply a Table and, in addition, such a mapping. It may be ugly but it will work for now.

I’ll know more when I actually use the interface. And I can use Redland and Soprano for ideas. Redland doesn’t use an iterator, and Soprano has a Java-like “heavy” iterator which probably contains the mapping from name to index. I prefer light iterators, which feel like the STL ones. I suppose I can start with a simple Redland-like interface and then have something like Soprano’s (but in my style) as a wrapper.

No, wait a second…

It hurts my eyes to work with an ugly workaround. Let’s think about it again. Remember that the data structure used for iteration has to be universal, and ALL BACKENDS ARE GOING TO USE IT. It means it has to be efficient and easy to use for both sides: for backends to fill, and for users to iterate.

Let’s try some examples then. After getting the query result, what you normally want is to take some row and get values from it based on names. For example, assume your query looks like this:

($x u $y v)

Where u and v are entities. Then you’re going to have a two-column table. It doesn’t really matter whether you use name strings directly. Since in the near future the result pattern is going to be directly specified, I can use some kind of “Column” class. It’s better than name strings because:

You can change the names easily in one place
It relies on static objects of constant size, not dynamic arbitrary length strings

Assume I have something like this:

Column x, y;

Now I can get result contents conveniently, e.g. like this:

QueryResult qr = /* execute query */;
for (const Row& row : qr)
{
	std::cout << row[x] << ", " << row[y] << '\n';
}

The problem is the Row structure.

It’s too simple to know about Columns, unless they’re convertible to numbers
It may be inefficient to have dynamic allocation for every single row

I have an idea. Although Redland is written in C and probably optimizes more aggressively than I do, let’s see what it does. And Soprano too.

(after a while…)

Redland’s code wasn’t easy for me to navigate. But it looks like everything is just arrays of rows or statements, and these are each dynamically allocated. For some reason Redland uses reference counting, even for small things which aren’t that likely to be used many times. Anyway, let’s see if Soprano has something similar…

(after a while…)

Soprano doesn’t seem to have its own in-memory backend. It has backends just for Sesame, Redland and Virtuoso.

DECISION: Making Table a single huge array/vector could be fast, but it makes the iteration classes complicated and isn’t worth the effort unless the row vector turns out to be too slow. Right now it isn’t, so I’m staying with the row vector.

Now, what about the Columns? Obviously, variable names are a source code thing and a Query object can’t set them at runtime. In order to use x and y as above, I need to pass them to the Query. Somewhat like when creating a Gtk::TreeModel. Hmmm… this suggests that the user should define new symbols for each individual query. Not the most convenient approach.

Before I think further, here’s another idea: After you get your query results, you are able to determine the order of the columns. So it is now possible to ask the query result object for the index of a given column, e.g. by name. In gtkmm, you define a new class for each tree model, so the symbolic index is a public data member of an object. If the numeric index is retrieved at runtime from the Saugus::QueryResult, access may look like this:

QueryResult qr = /* execute query */;
for (const Row& row : qr)
{
	std::cout << row[qr.get_column_index ("x")]
	          << ", "
	          << row[qr.get_column_index ("y")] << '\n';
}

This is ugly, of course. But what if qr used operator [] for columns? It’s not intuitive, but it would look like this:

QueryResult qr = /* execute query */;
for (const Row& row : qr)
{
	std::cout << row[qr["x"]]
	          << ", "
	          << row[qr["y"]] << '\n';
}

Hmmm… could be better. Even if we got some “mapping” object from qr and used it instead of qr itself, which would be more intuitive, we’d still have the double operator [], which is a bit ugly.

I want to try something new. What if instead of strings, variables were numbers? For example, assume you take a query string and parse it. Instead of names, the resulting statement patterns contain numbers. It’s possible to keep a mapping between numbers and strings, so you can use them with the query results if you wish.

The problem with numbers is that they make statement patterns less reusable: names are much easier to manage than numbers, e.g. if you want to use the same pattern in several queries. However, it’s possible to manage numbers with variables and e.g. have a factory which generates fresh new numbers. With 4- or 8-byte numbers, it would take a LOT of time to consume them all.

Assume you’re building your query by hand in the code. Even… yes! Even then, each parameter gets a name you assign manually, in-place in the source code or in a data file defining the query! So you can just use the same names with the results!

Even if your query is read from a file, it probably has a specific meaning and can define the Column objects in the code.

DECISION: Name strings are short anyway. Let’s just use them for now.

Now, what about row element access?

member function by index
query result maps names to indices

Since the number of variables will be usually small, I’m using a simple array or similar for the mapping, not heavy things like std::map or std::unordered_map.

UPDATE: What I’m developing here is actually the API, not the datastore, so I’m changing this project’s name to Dilosi.