Re: email as a bona fide git transport

From: Vegard Nossum
Date: Thu Oct 17 2019 - 09:08:31 EST


On 10/16/19 10:57 PM, Jonathan Nieder wrote:
Hi,

A few small points.

Vegard Nossum wrote:

* git am (or an alternative command) needs to recreate the commit
perfectly when applied, including applying it to the correct parent

Interesting. "git format-patch" has a --base option to do some of
what you're looking for, for the sake of snowpatch
<https://github.com/ruscur/snowpatch>. Though it's not exactly the
same thing you mean.

Yes, --base is great for importing email patches into git, but does not
allow resulting commits to have the same sha1 (because it doesn't have
the complete author/committer information, mainly), so it doesn't do
enough for what I want with this proposal.

We also discussed sending merge commits by mail recently in the
virtual git committer summit[1].

Of course, the devil is in the details. It's straightforward to use
"git bundle" to use mail as a Git transport today, but presumably you
also want the ability to perform reviews along the way and that's not
so easy with a binary format. Do you have more details on what you'd
want the format to look like, particularly for merge commits?

Yes, one of the goals is to continue to use git-send-email and be able
to view and review patches on a mailing list with minimal intrusion to
existing processes.

I did not envision supporting merge commits where the resulting tree is
different from all of its parents, which means any merge commits run
through 'git-format-patch --complete' would just have multiple "parent"
lines and no diff where the diff would normally be.

is no need for "changeset IDs" or whatever, since you can just use the
git SHA1 which is unique, unambiguous, and stable.

In [2] the hope was for some identifier that is preserved by "git
rebase" and "git commit --amend" (so that you can track the evolution
of a change as the author improves it in response to reviews). Is
that the conversation you're alluding to?

Yes.

(Not the specific email/thread you linked, but yes, this is the general
conversation about having stable identifiers I'm referring to.)


[...]
Disadvantages:

- requires patching git

That's not a disadvantage. It means get to work with the Git project,
which is a welcoming bunch of people, working on userspace (seeing how
the other half lives), and improving the lives of everyone using Git.

True! :-)

Date: Sat, 5 Oct 2019 16:15:59 +0200
Subject: [PATCH 1/3] format-patch: add --complete

Include the raw commit data between the changelog and the diffstat.

Oh! I had missed this on first reading because it was in an
attachment.

I have mixed feelings. Can you say a bit more about the advantages
and disadvantages relative to sending a git bundle? What happens if a
mail client or a box along the way mangles whitespace in the commit
message?

Yes, as we both said above: git bundle is not human readable and does
not lend itself to reviews in mail clients or on mailing lists.

Any kind of mangling is a serious concern, and my thoughts are:

- The _diff_ itself should already be safe from mangling. If this were
not the case, then sending patches by email would be completely unsafe
and would need to be fixed somehow anyway.

- I think I remember seeing something in either 'git am' or 'git
mailinfo' about format=flowed apparently allowing whitespace to
disappear from the line endings.

- Isolated problems like that could be preemptively fixed (or at least
warned about) on the submitter side by e.g. warning about potential
issues when committing or running format-patch/send-email)

- If some mail software is susceptible to mangling patches, then I
think it would great to have a mechanism for detecting that -- which
"git-format-patch --complete" would be! :-)

- It is true that the commit message itself (and perhaps particularly
people's names) is especially vulnerable to mangling and/or reencoding
(I noticed that 'git am' seems to convert to utf8 before committing). I
honestly don't know if this would be a problem in practice, as I don't
know too much about mail software and encodings. If most people use
format-patch/send-email to send patches, then maybe it's not actually a
problem because everything is either kept as utf-8 to start with or
encoded/decoded consistently across the git userbase?

- I was thinking of hex-encoding the raw bytes of the author and
committer lines of the extra commit metadata in the final email to keep
everything as ASCII and avoid character set conversions. It is true that
this does not stop mangling of the changelog...

- I suspect that if mangling was a problem in practice, then we would
have seen it already and the solution to it is not specific to what I'm
proposing here.

I'd love for somebody who is more knowledgeable about email and encoding
to chime in here. It is definitely one of the biggest potential problems
and it would be good to approach it in a way that doesn't require lots
of red tape after it has already been implemented.


Vegard


Happy hacking,
Jonathan

[1] https://public-inbox.org/git/nycvar.QRO.7.76.6.1909261253400.15067@xxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/ksummit-discuss/CAD=FV=UPjPpUyFTPjF-Ogzj_6LJLE4PTxMhCoCEDmH1LXSSmpQ@xxxxxxxxxxxxxx/