Mirror of the Rel4tion website/wiki source, view at <http://rel4tion.org>

[[ 🗃 ^yEzqv rel4tion-wiki ]] :: [📥 Inbox] [📤 Outbox] [🐤 Followers] [🤝 Collaborators] [🛠 Commits]

Clone

HTTPS: git clone https://vervis.peers.community/repos/yEzqv

SSH: git clone USERNAME@vervis.peers.community:yEzqv

Branches

Tags

master :: projects / vervis / tickets /

20.mdwn

[[!template id=ticket class=task assigned=fr33domlover done=yes]]

[[!meta title=“Display commit history for git repos”]]

Issue

The Git repo history log is probably one of the most useful UI elements in a repo’s web view. Add a page to the web app which displays a summary of commits, like a graphical git log.

Plan

If each commit had a single parent, commit traversal would probably be simple. Find the ref for a given branch, say master, and move from each commit to its parent recursively, getting the desired reverse-chronological order. However, the situation isn’t so simple. A commit may have more than one parent. I’m not a git expert, but I know at least one way for that to happen: git merge.

The git history is therefore not a list, but a directed acyclic graph (DAG). Here are some suggestions for traversing the DAG.

Go to the Gogs code and see how it builds the commit history page. Result: It seems to be using regular git log. But it uses the command’s “skip” and “max count” parameters for pagination.

There seems to be no existing pure-Haskell solution for that, otherwise I could just use it. The gitlib package’s hit backend is incomplete, and commit traversal happens to be one of the things not implemented yet. Other (C based) backends do have it, but I’d like to try doing this in Haskell directly.

IDEA: Time based order may mix commits from different branches. Instead, when a commit has more than one parent, pick the first parent to check using the commit time. In other words, a specific topological sort.

Progress

Since I don’t remember exactly how topological sort works, I’m using Wikipedia. There are 2 algorithms there. I’ll start with the first one. Here’s an initial version without any particular unique order:

L <- Empty list that will contain the sorted elements
S <- Set of all nodes with no incoming edges
while S is non-empty do
  remove a node n from S
  add n to tail of L
  for each node m with an edge e from n to m do
    remove edge e from the graph
    if m has no other incoming edges then
      insert m into S
if graph has edges then
  return error (graph has at least one cycle)
else
  return L (a topologically sorted order)

Let’s adapt the wording to use some git terms:

-- A list that will contain the sorted commits, latest commit first
L <- Empty list
-- Set of all commits with no parents left in the graph
S <- Set containing a single commit, the last one in the chosen git branch
while S is non-empty do
  remove a commit n from S
  add n to tail of L
  for each commit m with an edge e from n to m, i.e. m is a parent of n, do
    remove edge e from the graph
    if m has no other incoming edges, i.e. no other children, then
      insert m into S
if graph has edges then
  return error (graph has at least one cycle)
else
  return L (a topologically sorted order)

Here’s an example commit graph from man git-rev-list:

  .-A---M---N---O---P---Q
 /     /   /   /   /   /
I     B   C   D   E   Y
 \   /   /   /   /   /
  `-------------'   X

I is the initial commit. The A-Q line is the first parent of each merge.

The characteristics we want to achieve (goals):

The parts of the algorithm we need to choose (variables):

Let’s try to satisfy G2. When need to pick a commit from S, suppose we always take the latest commit, according to the commit’s datetime field. If we do that, we’ll at least start later lines before ealier ones. And if we assume we already satisfied G1, it’s enough. But this simple solution works only under a certain condition. It will fail if we have the following case.

(master) ?--?--?--P1--C
                \    /
(fix-bug)        ?--P2

Suppose P1 was commited earlier than P2 chronologically. If we have both P1 and P2 in S, then using the suggested solution, we will pick P2 first which is correct. If we have only P2 in S while P1 is still waiting to be visited, same result. If we have neither in S yet, fine too. But what if we have P1 in S while P2 hasn’t reached S yet? That would ruin the solution. Is this condition possible?

Let’s follow the algorithm to find out. At some point, we take C out of S. Then we visit P1 and P2 in some order. Since C is the only child of both, both of them will reach S. Therefore we are safe. But what if one of them has another child:

(master) ?--?--?--P1--C
                \    /
(fix-bug)        ?--P2
                     \
(add-feature)         D

This will be a problem only if both C and D are ancestors of our branch’s last commit. But I have an idea: Since using the time of the last commit of a line is just a heuristic, what if instead of using the time, we use the order of the parents? Let’s try that for simplicity. Use the reverse order, i.e. first the merged branches, and then, last, the master branch.

Hmmm that last simplification doesn’t seem to help. New example:

(master) ?--?--?--P1--C---A
                \    /   /
(fix-bug)        ?--P2--B

In this example, iterating by reverse parent order seems to solve the problem. I can try to find an example where it doesn’t, but since I don’t (yet) exactly understand git commit graphs, I prefer to decide for now that we use the commit time. The problem arises if we pick C out of S before we pick B. Let’s assign times to arrange for that. The changes will be:

  1. P1 becomes P and P2 becomes Q
  2. X(n) means a commit labeled X with time n, where bigger n means later time (suppose n means days since the project started)

    (master) ?–?–?—P(1)—C(4)—A(5)   / / (fix-bug) ?——Q(2)—B(3)

Valid results:

The algorithm will do the following:

1. L <- []                -- L = []        ; S = ?
2. S <- {A}               -- L = []        ; S = {A}
3. Remove A from S to L   -- L = [A]       ; S = {}
4. Insert B and C to S    -- L = [A]       ; S = {B, C}
5. Remove C from S to L   -- L = [A, C]    ; S = {B}
6. Insert P to S          -- L = [A, C]    ; S = {B, P}
7. Remove B from S to L   -- L = [A, C, B] ; S = {P}
8. Insert Q into S        -- L = [A, C, B] ; S = {P, Q}
9. Remove ...

This is going to be fine.

Let’s go back to the parent order.

Maybe G2 isn’t even needed. Maybe it’s a too tight condition. Let’s try G1 and see what happens.

(master) A--B--C--D--E--F--G--H
                \            /
(fix-bug)        W--X---Y---Z

The order I want is: H, Z, Y, X, W, G, F, E, D, C, B, A

Let’s run the algorithm.

1. L <- []                -- L = []          S = ?
2. S <- {H}               -- L = []          S = {H}
3. Move H from S to L     -- L = [H]         S = {}
4. Insert G and Z to S    -- L = [H]         S = {G, Z}
5. Hmmmm

How do we choose between G and Z now? The order should be: Pick Z first and continue picking its parent until W. Idea: Use a stack for that. Rules:

  1. When going over parents, go in parent order (i.e. merged last)
  2. S is a stack, i.e. last in first out

Trying again.

1.  L <- [                 -- L = [          S =   ?
2.  S <- H]                -- L = [          S =  H]
3.  Pop H from S to L      -- L = [H         S =   ]
4.  Push G then Z to S     -- L = [H         S = ZG]
5.  Pop Z from S to L      -- L = [HZ        S =  G]
6.  Push Y to S            -- L = [HZ        S = YG]
7.  Pop Y from S to L      -- L = [HZY       S =  G]
8.  Push X to S            -- L = [HZY       S = XG]
9.  Pop X from S to L      -- L = [HZYX      S =  G]
10. Push W to S            -- L = [HZYX      S = WG]
11. Pop W from S to L      -- L = [HZYXW     S =  G]
12. Pop G from S to L      -- L = [HZYXWG    S =   ]
13. ...

Good, it’s going to work. Since it fully decides how to pick from S and how to traverse parents, maybe it’s enough. Now I have 2 options:

  1. Implement the algorithm by myself
  2. Use a graph library that can do this kind of topsort

Before I choose some existing library, I’d like to try naively building a solution myself. For simplicity, assume no pagination. Just make a single list. Ideas:

How to build the hashmap:

  1. Make a singleton hashmap containing the last branch commit with 0 parents
  2. Read each parent of that commit, insert into hashmap with 1 children
  3. For each such parent, iterate over its children. For each child, check if it’s already in the hashmap. If yes, increase its child count. If not, insert with count 1 and recursively do steps 2 and 3 on it.

Hmmm I seriously doubt it will be faster than using a graph library. But there is one thing I could optimize here: There isn’t really need to remove edges from the graph, especially if checking for existence of incoming edges is a linear-time operation. Anyway, let’s just try the naive approach.

There are 3 graph libraries which seem reasonable:

I’m not sure which one is the best in my case. I’d like to start by trying FGL.

Result

Done for now. Works for a single branch, haven’t tested yet on repos which have multiple branches and merges between them.

[See repo JSON]