Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>
Clone
HTTPS:
git clone https://vervis.peers.community/repos/yEzqv
SSH:
git clone USERNAME@vervis.peers.community:yEzqv
Branches
Tags
20.mdwn
[[!template id=ticket class=task assigned=fr33domlover done=yes]]
[[!meta title=“Display commit history for git repos”]]
Issue
The Git repo history log is probably one of the most useful UI elements in a repo’s web view. Add a page to the web app which displays a summary of commits, like a graphical git log
.
Plan
If each commit had a single parent, commit traversal would probably be simple. Find the ref for a given branch, say master
, and move from each commit to its parent recursively, getting the desired reverse-chronological order. However, the situation isn’t so simple. A commit may have more than one parent. I’m not a git expert, but I know at least one way for that to happen: git merge.
The git history is therefore not a list, but a directed acyclic graph (DAG). Here are some suggestions for traversing the DAG.
- Create a hashmap of refs. Go over the DAG in whatever way and push all the refs into the hashmap. Then read the commit data for each ref, sort the commits by reverse chronological order and traverse the list.
- Load the refs into a graph using some graph library. There are several. For example,
containers
hasData.Graph
and there’s alsodawg
and related packages. Since in my case the number of links is small, it may be more efficient to use an edge list and not an association matrix. Anyway, once the graph is built, run some sort of traversal loop on it. Regular DFS and BFS won’t work, but topological order should be fine. However, Git rev-list uses time based order by default, which isn’t the same as topological. Find out what I need.
Go to the Gogs code and see how it builds the commit history page. Result: It seems to be using regular git log. But it uses the command’s “skip” and “max count” parameters for pagination.
There seems to be no existing pure-Haskell solution for that, otherwise I could just use it. The gitlib
package’s hit
backend is incomplete, and commit traversal happens to be one of the things not implemented yet. Other (C based) backends do have it, but I’d like to try doing this in Haskell directly.
IDEA: Time based order may mix commits from different branches. Instead, when a commit has more than one parent, pick the first parent to check using the commit time. In other words, a specific topological sort.
Progress
Since I don’t remember exactly how topological sort works, I’m using Wikipedia. There are 2 algorithms there. I’ll start with the first one. Here’s an initial version without any particular unique order:
L <- Empty list that will contain the sorted elements
S <- Set of all nodes with no incoming edges
while S is non-empty do
remove a node n from S
add n to tail of L
for each node m with an edge e from n to m do
remove edge e from the graph
if m has no other incoming edges then
insert m into S
if graph has edges then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
Let’s adapt the wording to use some git terms:
-- A list that will contain the sorted commits, latest commit first
L <- Empty list
-- Set of all commits with no parents left in the graph
S <- Set containing a single commit, the last one in the chosen git branch
while S is non-empty do
remove a commit n from S
add n to tail of L
for each commit m with an edge e from n to m, i.e. m is a parent of n, do
remove edge e from the graph
if m has no other incoming edges, i.e. no other children, then
insert m into S
if graph has edges then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
Here’s an example commit graph from man git-rev-list
:
.-A---M---N---O---P---Q
/ / / / / /
I B C D E Y
\ / / / / /
`-------------' X
I
is the initial commit. The A
-Q
line is the first parent of each merge.
The characteristics we want to achieve (goals):
- (G1) Given a sequence of
n >= 2
commitsC1 ... Cn
, whereC1
andCn
have at most 2 edges each and the rest have exactly 2, the sequence must appear as-is in the sorted order. For example, take=X-C1-C2-Y=
.X
andY
3 edges each, or more.C1-C2
is a pair of commits made to a side branch, and then merged intoY
. Then the sequenceC1-C2
should appear in the sorted order without anything in between. - (G2) When a branch splits into several lines of parallel development (for example the branch itself and several side branches), and the side lines are then merged into the main line at a single point, the order in which they should appear (each appears as-is due to the previous bullet point) is the reverse chronological order of the last commit of each line.
The parts of the algorithm we need to choose (variables):
- (V1) How does
S
work? Is it a set, a stack, a queue? When we remove a commit fromS
, how do we choose which commit to pick? - (V2) When we iterate over the parents of a commit, in which order do we do so? Does it matter?
Let’s try to satisfy G2. When need to pick a commit from S
, suppose we always take the latest commit, according to the commit’s datetime field. If we do that, we’ll at least start later lines before ealier ones. And if we assume we already satisfied G1, it’s enough. But this simple solution works only under a certain condition. It will fail if we have the following case.
(master) ?--?--?--P1--C
\ /
(fix-bug) ?--P2
Suppose P1
was commited earlier than P2
chronologically. If we have both P1
and P2
in S
, then using the suggested solution, we will pick P2
first which is correct. If we have only P2
in S
while P1
is still waiting to be visited, same result. If we have neither in S
yet, fine too. But what if we have P1
in S
while P2
hasn’t reached S
yet? That would ruin the solution. Is this condition possible?
Let’s follow the algorithm to find out. At some point, we take C
out of S
. Then we visit P1
and P2
in some order. Since C
is the only child of both, both of them will reach S
. Therefore we are safe. But what if one of them has another child:
(master) ?--?--?--P1--C
\ /
(fix-bug) ?--P2
\
(add-feature) D
This will be a problem only if both C
and D
are ancestors of our branch’s last commit. But I have an idea: Since using the time of the last commit of a line is just a heuristic, what if instead of using the time, we use the order of the parents? Let’s try that for simplicity. Use the reverse order, i.e. first the merged branches, and then, last, the master branch.
Hmmm that last simplification doesn’t seem to help. New example:
(master) ?--?--?--P1--C---A
\ / /
(fix-bug) ?--P2--B
In this example, iterating by reverse parent order seems to solve the problem. I can try to find an example where it doesn’t, but since I don’t (yet) exactly understand git commit graphs, I prefer to decide for now that we use the commit time. The problem arises if we pick C
out of S
before we pick B
. Let’s assign times to arrange for that. The changes will be:
P1
becomesP
andP2
becomesQ
X(n)
means a commit labeledX
with timen
, where biggern
means later time (supposen
means days since the project started)(master) ?–?–?—P(1)—C(4)—A(5) / / (fix-bug) ?——Q(2)—B(3)
Valid results:
- A, C, B, Q, P
The algorithm will do the following:
1. L <- [] -- L = [] ; S = ?
2. S <- {A} -- L = [] ; S = {A}
3. Remove A from S to L -- L = [A] ; S = {}
4. Insert B and C to S -- L = [A] ; S = {B, C}
5. Remove C from S to L -- L = [A, C] ; S = {B}
6. Insert P to S -- L = [A, C] ; S = {B, P}
7. Remove B from S to L -- L = [A, C, B] ; S = {P}
8. Insert Q into S -- L = [A, C, B] ; S = {P, Q}
9. Remove ...
This is going to be fine.
Let’s go back to the parent order.
B
should come beforeC
because it’s the merged parentQ
should come beforeP
similarly
Maybe G2
isn’t even needed. Maybe it’s a too tight condition. Let’s try G1
and see what happens.
(master) A--B--C--D--E--F--G--H
\ /
(fix-bug) W--X---Y---Z
The order I want is: H, Z, Y, X, W, G, F, E, D, C, B, A
Let’s run the algorithm.
1. L <- [] -- L = [] S = ?
2. S <- {H} -- L = [] S = {H}
3. Move H from S to L -- L = [H] S = {}
4. Insert G and Z to S -- L = [H] S = {G, Z}
5. Hmmmm
How do we choose between G and Z now? The order should be: Pick Z first and continue picking its parent until W. Idea: Use a stack for that. Rules:
- When going over parents, go in parent order (i.e. merged last)
S
is a stack, i.e. last in first out
Trying again.
1. L <- [ -- L = [ S = ?
2. S <- H] -- L = [ S = H]
3. Pop H from S to L -- L = [H S = ]
4. Push G then Z to S -- L = [H S = ZG]
5. Pop Z from S to L -- L = [HZ S = G]
6. Push Y to S -- L = [HZ S = YG]
7. Pop Y from S to L -- L = [HZY S = G]
8. Push X to S -- L = [HZY S = XG]
9. Pop X from S to L -- L = [HZYX S = G]
10. Push W to S -- L = [HZYX S = WG]
11. Pop W from S to L -- L = [HZYXW S = G]
12. Pop G from S to L -- L = [HZYXWG S = ]
13. ...
Good, it’s going to work. Since it fully decides how to pick from S and how to traverse parents, maybe it’s enough. Now I have 2 options:
- Implement the algorithm by myself
- Use a graph library that can do this kind of topsort
Before I choose some existing library, I’d like to try naively building a solution myself. For simplicity, assume no pagination. Just make a single list. Ideas:
L
can be a list to which we prepend, and reverse when done, or it can be a different sequence type with efficient append, such asDList
orSeq
orvector
S
is a stack, can be easily implemented using a list- Build a HashMap in which they key is
Ref
and the value is a pair ofCommit
andInt
, the latter being the number of the commit’s children
How to build the hashmap:
- Make a singleton hashmap containing the last branch commit with 0 parents
- Read each parent of that commit, insert into hashmap with 1 children
- For each such parent, iterate over its children. For each child, check if it’s already in the hashmap. If yes, increase its child count. If not, insert with count 1 and recursively do steps 2 and 3 on it.
Hmmm I seriously doubt it will be faster than using a graph library. But there is one thing I could optimize here: There isn’t really need to remove edges from the graph, especially if checking for existence of incoming edges is a linear-time operation. Anyway, let’s just try the naive approach.
There are 3 graph libraries which seem reasonable:
- [[!hackage containers]]
- [[!hackage fgl]]
- [[!hackage graph-core]]
I’m not sure which one is the best in my case. I’d like to start by trying FGL.
Result
Done for now. Works for a single branch, haven’t tested yet on repos which have multiple branches and merges between them.