Re: [PATCH] [4.8-rc7, regression] fault_in_multipages_readable() throws set-but-unused error

From: Linus Torvalds
Date: Sun Sep 25 2016 - 23:06:34 EST


On Sun, Sep 25, 2016 at 7:25 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> You have some nasty unicode corruption. The email is marked as being
>>
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>
> It's whatever git-send-email spat out. I was under the impression it
> encodes like that whenever it sees a utf-8 character in a commit...

Yes. But it assumes that the commit text was in UTF-8 too. So I
suspect it happened as you were writing the commit message:

> I turned off utf-8 support in vim on the machine I write all my code
> on

I guess that's probably it. You probably had vim try to convert it to
latin1 or something, and the odd utf-8 tick/back-tick thing doesn't do
so cleanly, so..

> - I got sick of stupid stray marks in my code, digraphs being
> composed when I just want to replace a single character, git sending
> patches in utf-8 encoding because I copied a SoB with a utf-8
> character in the name, etc....

Hmm. Generally, the only place we should have non-us-ascii tends to be
exactly those names. But a sane editor should "just work" and not do
odd crazy digraph crap or things like that. But I don't use vim, so I
don't know what the magic incantation for sanity there is.

> 7 characters, 12 characters, whatever. Neither make any sense in
> commit messages by themselves without the short description that
> goes along with the hash.....

No, I agree, the right format for describing a commit tends to be
along the lines of

.. commit %h ("%s") ..

exactly like you did. It's just that with 12 characters in the hex
format, people will still be able to cut-and-paste the hash into git
to get the full commit details even a year from now. With just 7 hex
digits, you may well end up in the situation where you do

git show e23d415

and git says

error: short SHA1 e23d415 is ambiguous.
fatal: ambiguous argument 'e23d415': unknown revision or path not in the working tree.

(it's not ambiguous today, but in a year or two it quite possibly will be).
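
As an illustration of the 'commit %h ("%s")' form with a 12-digit
abbreviation, here is a minimal Python sketch. The function name, the
hash and the subject line are all made up for the example. With git
itself, something like

git log -1 --abbrev=12 --format='commit %h ("%s")' <commit>

should print the same shape of reference.

```python
HEX_DIGITS = set("0123456789abcdef")


def format_commit_ref(full_hash, subject, abbrev=12):
    """Return 'commit <first abbrev hex digits> ("<subject>")'.

    This only mimics the textual format; it does not talk to git and
    does not check that the abbreviation is unique in any repository.
    """
    digits = full_hash.strip().lower()
    if len(digits) < abbrev or not set(digits) <= HEX_DIGITS:
        raise ValueError("expected at least %d hex digits" % abbrev)
    return 'commit %s ("%s")' % (digits[:abbrev], subject)


# Hypothetical hash and subject, for illustration only.
print(format_commit_ref("0123456789abcdef0123456789abcdef01234567",
                        "example: fix a thing"))
```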

It's not impossible to figure out the different possible ambiguous
commits, but it's just inconvenient.
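
To make "figure out the different possible ambiguous commits" concrete,
here is a small Python sketch that lists every object ID starting with
a given short prefix. The IDs are invented for the example and the
helper name is hypothetical. In a real tree, I believe
"git rev-parse --disambiguate=e23d415" does the equivalent lookup.

```python
def candidates(prefix, object_ids):
    """Return, sorted, every full object ID that starts with the prefix."""
    prefix = prefix.lower()
    return sorted(oid for oid in object_ids if oid.lower().startswith(prefix))


# Made-up 40-digit IDs; two of them share the 7-digit prefix e23d415.
ids = [
    "e23d415" + "a" * 33,
    "e23d415" + "b" * 33,
    "1234567" + "c" * 33,
]

matches = candidates("e23d415", ids)
if len(matches) > 1:
    print("ambiguous, candidates are:")
for oid in matches:
    print("  " + oid)
```

Going from 7 to 8 digits ("e23d415a") already separates the two
made-up IDs here, which is the whole argument for quoting more digits
up front.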

Right now, of the roughly 618,000 commits in the mainline tree, about
98% can be uniquely represented with 7 hex digits. But about 2% of the
commits need eight or more hex digits to be unique (the git ID space
includes all the tree and blob objects too, so the uniqueness is not
just "unique within commits", but any kernel git object).
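
The kind of count behind those percentages can be sketched in Python
as below. This is only an illustration under stated assumptions: the
function name is made up, the IDs are invented, and it is not the
script the numbers above came from. The point it shows is that a
commit's prefix has to be unique against all objects (trees and blobs
included), not just against other commits.

```python
from collections import Counter


def fraction_of_commits_unique(commit_ids, other_object_ids, length=7):
    """Fraction of commits whose first `length` hex digits match no other object.

    Prefixes are counted over commits plus all other objects, so a commit
    that collides only with a tree or blob still counts as ambiguous.
    """
    commits = list(commit_ids)
    if not commits:
        raise ValueError("need at least one commit ID")
    counts = Counter(oid[:length] for oid in commits)
    counts.update(oid[:length] for oid in other_object_ids)
    unique = sum(1 for oid in commits if counts[oid[:length]] == 1)
    return unique / len(commits)


# Invented IDs: the first commit shares a 7-digit prefix with a blob.
commits = ["abcdef1" + "0" * 33, "1" * 40, "2" * 40, "3" * 40]
blobs = ["abcdef1" + "f" * 33]

print(fraction_of_commits_unique(commits, blobs, length=7))
print(fraction_of_commits_unique(commits, blobs, length=8))
```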

Linus