
[[!template id=ticket class=task assigned=fr33domlover]]

[[!meta title="HTTP smart mode"]]

Issue

Git supports push and pull over HTTP. There is an old dumb mode, which simply GETs the git repo’s files, and a smart mode, which works much like git over SSH.

Git includes a CGI program, git http-backend, which implements both modes. But it seems easy enough to implement myself. I looked at the Gogs code, and it seems Gogs does just that. However, Gogs is a simple HTTP(S) wrapper around the git upload-pack command and the other backend commands. There are really 2 things I can do:

* Implement HTTP smart mode directly with Yesod (must)
* Implement git upload-pack using [[!hackage hit]] (cool bonus)

If I do the latter, I’ll also be able to:

* Implement a simple git protocol daemon
* Implement the SSH server component without using the git binary
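For comparison, the wrapper approach boils down to running the git binary in stateless mode. A minimal sketch with the `process` package, assuming `git` is on the PATH; `advertiseRefsProcess` and `runAdvertiseRefs` are hypothetical names. `--stateless-rpc` and `--advertise-refs` are real git-upload-pack options, and git http-backend uses them for the ref discovery step:

```haskell
import System.Process (CmdSpec (..), CreateProcess (..), proc, readCreateProcess)

-- | The wrapper approach for ref discovery: let the git binary produce
-- the ref advertisement for the repo at the given path.
advertiseRefsProcess :: FilePath -> CreateProcess
advertiseRefsProcess repoDir =
  proc "git" ["upload-pack", "--stateless-rpc", "--advertise-refs", repoDir]

-- | Run it and capture stdout. String I/O is acceptable for the
-- advertisement, which is mostly ASCII text, but pack data would need
-- a ByteString-based runner instead.
runAdvertiseRefs :: FilePath -> IO String
runAdvertiseRefs repoDir = readCreateProcess (advertiseRefsProcess repoDir) ""
```

For the POST step, the same command without `--advertise-refs` reads the client's request from stdin. This output lacks the `# service=...` line and flush-pkt that the HTTP response must start with; http-backend writes those itself before running upload-pack.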

Progress

Intro

I spent a while, a few weeks, working mostly on the git-upload-pack-over-SSH code. The protocol is unclear, very technical, boring and not precisely documented, and eventually it got depressing. So I moved back to Vervis, and earlier today I added support for git-upload-pack over SSH using the git-upload-pack executable.

Since the HTTP mode is stateless, at least from the server’s point of view, it may be easier to implement the protocol’s steps there. Getting results faster also helps maintain motivation, and maybe I’ll even manage to implement the parts still missing from my SSH code.

Before I start: what should the git clone URL be? Remember that Darcs is going to be supported too. Should the repo page also be used for VCS access? It’s nice for UI, but I’m not sure it’s good for RESTfulness and forward compatibility. Since it’s very early anyway, here’s an idea: for a sharer john and repo foobar, the path /u/john/r/foobar/<VCS-NAME> will be the URL for VCS access, where VCS-NAME can be git or darcs. For now, just git.

DECISION: git access base URL is /u/USER/r/REPO/git
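To make the decision concrete, here is a small sketch that builds the base path and matches request paths against it. It assumes the path is already split into pieces on `/`; `GitRoute`, `gitBasePath` and `parseGitRoute` are hypothetical names, and in Vervis these routes would more likely be declared in the Yesod routes file than matched by hand:

```haskell
-- | VCS access routes under the base URL /u/USER/r/REPO/git.
data GitRoute
  = InfoRefs String String   -- GET  /u/USER/r/REPO/git/info/refs
  | UploadPack String String -- POST /u/USER/r/REPO/git/git-upload-pack
  deriving (Eq, Show)

-- | Base path for git access to a sharer's repo.
gitBasePath :: String -> String -> String
gitBasePath user repo = "/u/" ++ user ++ "/r/" ++ repo ++ "/git"

-- | Match split path pieces against the layout; anything else,
-- including a future darcs path, is not a git route.
parseGitRoute :: [String] -> Maybe GitRoute
parseGitRoute ["u", user, "r", repo, "git", "info", "refs"] =
  Just (InfoRefs user repo)
parseGitRoute ["u", user, "r", repo, "git", "git-upload-pack"] =
  Just (UploadPack user repo)
parseGitRoute _ = Nothing
```

For example, `parseGitRoute ["u", "dummy", "r", "some-repo", "git", "git-upload-pack"]` gives `Just (UploadPack "dummy" "some-repo")`.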

Since the smart mode is everywhere these days, I’m going to start with it and ignore the dumb mode entirely.

General

These are general requirements from the git docs. Replace ( ) with (x) gradually.

( ) If there is no repository at `$GIT_URL`, or the resource pointed to by
    a location matching `$GIT_URL` does not exist, the server MUST NOT
    respond with `200 OK` response.  A server SHOULD respond with
    `404 Not Found`, `410 Gone`, or any other suitable HTTP status code
    which does not imply the resource exists as requested.

( ) If there is a repository at `$GIT_URL`, but access is not currently
    permitted, the server MUST respond with the `403 Forbidden` HTTP status
    code.

( ) Servers SHOULD support both HTTP 1.0 and HTTP 1.1.

( ) Servers SHOULD support chunked encoding for both request and response
    bodies.

( ) Servers MAY return ETag and/or Last-Modified headers.

( ) Servers MAY return `304 Not Modified` if the relevant headers appear
    in the request and the entity has not changed.  Clients MUST treat
    `304 Not Modified` identical to `200 OK` by reusing the cached entity.

( ) Clients MAY reuse a cached entity without revalidation if the
    Cache-Control and/or Expires header permits caching.  Clients and
    servers MUST follow RFC 2616 for cache controls.
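The first two requirements and the optional `304` can be folded into one status decision. A minimal sketch; `RepoLookup`, `statusFor` and `splitTags` are hypothetical names, and the If-None-Match parsing is deliberately naive:

```haskell
-- | Outcome of looking up the repository behind $GIT_URL.
data RepoLookup = RepoMissing | RepoForbidden | RepoFound
  deriving (Eq, Show)

-- | Choose the response status from the lookup result, the ETag the
-- server computed for the entity (if any) and the client's
-- If-None-Match header (if sent).
statusFor :: RepoLookup -> Maybe String -> Maybe String -> Int
statusFor RepoMissing _ _ = 404   -- never 200 for a missing repo
statusFor RepoForbidden _ _ = 403 -- repo exists, access not permitted
statusFor RepoFound etag (Just inm)
  | inm == "*" || maybe False (`elem` splitTags inm) etag = 304
statusFor RepoFound _ _ = 200

-- | Split an If-None-Match value such as "a", "b" into entity tags.
-- Naive: ignores weak W/ tags and commas inside quoted tags.
splitTags :: String -> [String]
splitTags = words . map (\c -> if c == ',' then ' ' else c)
```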

Ref discovery

(x) The first step, ref discovery, starts by sending a GET to `/info/refs`.
    The result is plain text, so I'm adding a new route that always returns
    404 for now.

(x) The request MUST contain exactly one query parameter,
    `service=$servicename`, where `$servicename` MUST be the service name
    the client wishes to contact to complete the operation. The request
    MUST NOT contain additional query parameters.

    (x) Find out how to get query params in a Yesod handler

        In package `yesod-core`, module `Yesod.Core.Handler`, there's a
        function `getRequest` and also several sugar functions for looking
        up parameters.

(x) The message contains refs much like in SSH, but with an additional
    first line.

    (x) Find out what a peeled ref is and implement in `hit-network`.

        Ah, I think I get it. There are lightweight tags, which just point
        to commits, and annotated tags, which have an author and date and
        their own SHA1 and optional GPG signature. I have both in my repos,
        so I'm playing with them. It seems that:

        * The SHA1 of a lightweight tag points to a *commit* object
        * The SHA1 of an annotated tag points to a *tag* object

        Before I proceed, a wild guess. The only meaning of "peeling" I
        can see here is this: when you find an annotated tag, you read the
        tag object and pick the SHA1 of the commit it refers to. Is that
        what peeling means? I'll try it on a repo. My old `sif` repo has
        annotated tags, and running `git-upload-pack` on it returns the
        following ref discovery (capabilities removed for readability):

        00c95dfa29d168487a11e7be741a88129a810927f178 HEAD
        003f5dfa29d168487a11e7be741a88129a810927f178 refs/heads/master
        00485dfa29d168487a11e7be741a88129a810927f178 refs/remotes/origin/master
        003d2a065dac1ed0027405dff41da17ef58f53e2bfdf refs/tags/0.1.0
        004072150789f8e4ab172c5c9a0e81cab62d26b3e287 refs/tags/0.1.0^{}
        003d625f198eefc93151e0a86bd7aa2ea69f2ecd37de refs/tags/0.1.1
        004085a91133a46455378f648d7231c706771864161a refs/tags/0.1.1^{}

        What are the SHA1s of the tags? Let's check:

        * 0.1.0    - tag
        * 0.1.0^{} - commit
        * 0.1.1    - tag
        * 0.1.1^{} - commit

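        Indeed, peeling means fetching the commit pointed to by an
        annotated tag. I'm implementing this in `hit-network`. The git
        source code also suggests (unless I missed something) that peeling
        simply means getting the SHA1 the annotated tag points to.
        Strictly, `^{}` dereferences tags recursively until it reaches a
        non-tag object, which is usually, but not always, a commit.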

(x) If the server does not recognize the requested service name, or the
    requested service name has been disabled by the server administrator,
    the server MUST respond with the `403 Forbidden` HTTP status code.

(x) Otherwise, smart servers MUST respond with the smart server reply
    format for the requested service name.

(x) Cache-Control headers SHOULD be used to disable caching of the
    returned entity.

(x) The Content-Type MUST be `application/x-$servicename-advertisement`.
    Clients SHOULD fall back to the dumb protocol if another content type
    is returned.  When falling back to the dumb protocol clients SHOULD NOT
    make an additional request to `$GIT_URL/info/refs`, but instead SHOULD
    use the response already in hand.  Clients MUST NOT continue if they do
    not support the dumb protocol.

(x) Clients MUST verify the first pkt-line is `# service=$servicename`.
    Servers MUST set $servicename to be the request parameter value.
    Servers SHOULD include an LF at the end of this line.
    Clients MUST ignore an LF at the end of the line.

(x) Servers MUST terminate the response with the magic `0000` end
    pkt-line marker.

(x) The returned response is a pkt-line stream describing each ref and
    its known value.  The stream SHOULD be sorted by name according to
    the C locale ordering.  The stream SHOULD include the default ref
    named `HEAD` as the first ref.  The stream MUST include capability
    declarations behind a NUL on the first ref.

    See http-protocol.txt for the BNF of the message.
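Putting the ref discovery requirements together, here is a sketch of the response pieces as pure functions over `String`. All names (`checkService`, `pktLine`, `Ref`, `advertisement` and so on) are hypothetical, not `hit-network`'s API; the capability list is passed in rather than chosen here, and answering `400` to a malformed query is my own assumption, since the spec only says what the request MUST look like:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)
import Text.Printf (printf)

-- | Services this sketch serves; push (git-receive-pack) is left out.
enabledServices :: [String]
enabledServices = ["git-upload-pack"]

-- | Check the query of GET $GIT_URL/info/refs: exactly one parameter,
-- service=NAME, naming an enabled service. Left carries an HTTP status.
checkService :: [(String, String)] -> Either Int String
checkService [("service", name)]
  | name `elem` enabledServices = Right name
  | otherwise = Left 403 -- unknown or disabled: the spec requires 403
checkService _ = Left 400 -- assumption: the spec names no status here

-- | Content-Type of the ref advertisement for a service.
advertisementType :: String -> String
advertisementType name = "application/x-" ++ name ++ "-advertisement"

-- | One common set of headers for disabling caching.
noCacheHeaders :: [(String, String)]
noCacheHeaders =
  [ ("Expires", "Fri, 01 Jan 1980 00:00:00 GMT")
  , ("Pragma", "no-cache")
  , ("Cache-Control", "no-cache, max-age=0, must-revalidate")
  ]

-- | Encode a pkt-line: 4 lowercase hex digits holding the total length,
-- which counts the 4 length digits themselves, then the payload.
pktLine :: String -> String
pktLine s = printf "%04x" (length s + 4) ++ s

-- | The flush-pkt that ends a section.
flushPkt :: String
flushPkt = "0000"

-- | A ref as read from the repo. 'refPeeled' is set only for annotated
-- tags and holds the SHA1 of the object the tag points to.
data Ref = Ref
  { refName :: String
  , refSha :: String
  , refPeeled :: Maybe String
  }

-- | The advertisement body: service line, flush-pkt, HEAD first with
-- capabilities after a NUL, the other refs sorted by name, each
-- annotated tag followed by its "^{}" entry, and a final flush-pkt.
-- Assumes a non-empty repo, so there is a HEAD to list first.
advertisement :: String -> [String] -> Ref -> [Ref] -> String
advertisement service caps headRef others =
  pktLine ("# service=" ++ service ++ "\n")
    ++ flushPkt
    ++ concat (zipWith refLines [0 :: Int ..] (headRef : sorted))
    ++ flushPkt
  where
    sorted = sortBy (comparing refName) others
    refLines i r =
      pktLine (refSha r ++ " " ++ refName r ++ capsFor i ++ "\n")
        ++ maybe "" (\p -> pktLine (p ++ " " ++ refName r ++ "^{}\n")) (refPeeled r)
    capsFor 0 = '\0' : unwords caps
    capsFor _ = ""
```

With the `sif` refs above and a placeholder capability list `["agent=x"]`, `advertisement` yields the `001e# service=git-upload-pack\n0000` header followed by the same `003f`, `003d` and `0040` lines shown earlier. Haskell compares `String`s by code point, which for UTF-8 ref names matches the byte order of the C locale.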

Upload pack

The docs say the request step contains only WANT and HAVE lines, but from trying some git commands, the content sent seems much like in the SSH case. In fact, the same git command, and specifically the same C function in the git source, handles both cases, and it doesn’t even seem to check which case it is.

The HTTP transport, at least, seems to work in rounds. In each round, the client sends at most 32 HAVE lines. The idea is to keep sending until a common commit is found, or something like that, but after sending 256 HAVEs overall without confirmation, the client gives up.
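To make the rounds concrete, here is a sketch of the client side as described above. The 32 and 256 limits come from the text, git's real batching may differ, and the `known` predicate is a stand-in for the server's ACKs; `negotiate` is a hypothetical name:

```haskell
-- | Group HAVE lines into the rounds a client would send: at most 32
-- per round, stopping after the round that contains a commit the
-- server knows, or once 256 HAVEs went unacknowledged.
negotiate :: (String -> Bool) -> [String] -> [[String]]
negotiate known = go 0
  where
    go :: Int -> [String] -> [[String]]
    go sent haves
      | sent >= 256 || null haves = []
      | any known batch = [batch]
      | otherwise = batch : go (sent + length batch) rest
      where
        (batch, rest) = splitAt 32 haves
```

With no common commit at all, 1000 candidate HAVEs produce 8 rounds of 32, after which the client stops.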

I want to try some commands and examine the request.

$ git clone http://dev.rel4tion.org/u/dummy/r/some-repo/git

GET /u/dummy/r/some-repo/git/info/refs
    Params: [("service","git-upload-pack")]
    Accept: */*
POST /u/dummy/r/some-repo/git/git-upload-pack
    Request Body:
        0042want 46cf4f38fd6bf7b9791362680485d2915634c085 agent=git/1.9.1
        0032want 46cf4f38fd6bf7b9791362680485d2915634c085
        0032want 5931f869783fbd0301c8d6384b14a8eb91cbdca2
        0032want 804015721ac0a1bb8863b2add145d6e0533d4ffa
        0032want fa6473b7df0d6376e093f7e4ac4f513c9492d2b8
        0032want 2fcf3ce929a174575c7f88d588f3c36c4ccb24b0
        0032want 3ae3dbcb14fb452c4b5ed62a8408825a6a46bb3a
        0032want 092cebadc64f23f2206a9a9e1655ae964d4a66e8
        00000009done
    Accept: application/x-git-upload-pack-result

So what we have here is:

* First want line, with capabilities appended
* More want lines
* Flush-pkt
* `done` pkt-line
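Everything in that body is a pkt-line: four hex digits giving the total length (including the four digits themselves), then the payload, with `0000` as the flush-pkt. A minimal decoder sketch over `String`; `Pkt`, `decodePkts` and `parseHex` are hypothetical names:

```haskell
import Numeric (readHex)

-- | One item of a pkt-line stream.
data Pkt = Flush | Data String
  deriving (Eq, Show)

-- | Split a request body into pkt-lines, failing on bad or truncated input.
decodePkts :: String -> Either String [Pkt]
decodePkts "" = Right []
decodePkts s = do
  let (hdr, rest) = splitAt 4 s
  n <- parseHex hdr
  case n of
    0 -> (Flush :) <$> decodePkts rest
    _ | n < 4 -> Left ("bad pkt-line length " ++ show hdr)
      | length rest < n - 4 -> Left "truncated pkt-line"
      | otherwise ->
          let (payload, rest') = splitAt (n - 4) rest
          in (Data payload :) <$> decodePkts rest'

-- | Parse exactly four hex digits.
parseHex :: String -> Either String Int
parseHex hdr
  | length hdr == 4, [(n, "")] <- readHex hdr = Right n
  | otherwise = Left ("bad pkt-line header " ++ show hdr)
```

Decoding the `--depth 5` capture's `000cdeepen 50000` gives `Data "deepen 5"` followed by `Flush`: the length `000c` leaves no room for an LF, so the flush-pkt follows directly and only looks like part of the number.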

$ git clone http://dev.rel4tion.org/u/dummy/r/some-repo/git --depth 5

GET /u/dummy/r/some-repo/git/info/refs
    Params: [("service","git-upload-pack")]
    Accept: */*
POST /u/dummy/r/some-repo/git/git-upload-pack
    Request Body:
        0042want 46cf4f38fd6bf7b9791362680485d2915634c085 agent=git/1.9.1
        0032want 46cf4f38fd6bf7b9791362680485d2915634c085
        000cdeepen 50000
    Accept: application/x-git-upload-pack-result

What we have this time is:

* First want, with capabilities
* Another want; this time the wants refer only to HEAD and master
* A `deepen 5` pkt-line
* Flush-pkt

The shallow part is irrelevant because it won’t work until we advertise that we support shallow clients in the capabilities.

Anyway, merely supporting git-clone will already be a great achievement. I want to teach hit-network to handle a request which contains:

* 1 or more wants, with the first one listing capabilities
* Flush-pkt
* `done`
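A sketch of that parse, taking pkt-lines that are already decoded, with `Nothing` standing for the flush-pkt. `CloneRequest` and `parseCloneRequest` are hypothetical names, not `hit-network` code; anything other than wants, a flush-pkt and `done`, such as a `deepen` line, is rejected:

```haskell
import Data.Char (isHexDigit)

-- | The wanted SHA1s and the capabilities listed on the first want.
data CloneRequest = CloneRequest
  { reqWants :: [String]
  , reqCaps :: [String]
  } deriving (Eq, Show)

-- | Accept exactly: want line(s), a flush-pkt, then "done".
parseCloneRequest :: [Maybe String] -> Either String CloneRequest
parseCloneRequest pkts =
  case break (== Nothing) pkts of
    (Just first : more, [Nothing, Just done])
      | chomp done == "done" -> do
          (sha, caps) <- firstWant first
          shas <- mapM otherWant more
          Right (CloneRequest (sha : shas) caps)
    _ -> Left "expected: want line(s), flush-pkt, done"
  where
    -- The first want carries the capabilities after the SHA1.
    firstWant l = case words (chomp l) of
      "want" : sha : caps | isSha sha -> Right (sha, caps)
      _ -> Left ("bad first want: " ++ show l)
    -- Later wants carry only the SHA1.
    otherWant (Just l) = case words (chomp l) of
      ["want", sha] | isSha sha -> Right sha
      _ -> Left ("bad want: " ++ show l)
    otherWant Nothing = Left "unexpected flush-pkt"
    isSha s = length s == 40 && all isHexDigit s

-- | Drop one trailing LF, which pkt-line payloads may or may not carry.
chomp :: String -> String
chomp s
  | not (null s) && last s == '\n' = init s
  | otherwise = s
```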

Result

Not done yet.
