Re: [PATCH] scripts/setlocalversion: make git describe output more reliable

From: Masahiro Yamada
Date: Wed Sep 16 2020 - 11:55:03 EST


On Fri, Sep 11, 2020 at 5:28 PM Rasmus Villemoes
<linux@xxxxxxxxxxxxxxxxxx> wrote:
>
> On 10/09/2020 21.05, Brian Norris wrote:
> > On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> >> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
> >> <linux@xxxxxxxxxxxxxxxxxx> wrote:
> >>> So in order to avoid `uname -a` output relying on such random details
> >>> of the build environment which are rather hard to ensure are
> >>> consistent between developers and buildbots, use an explicit
> >>> --abbrev=15 option (and for consistency, also use rev-parse --short=15
> >>> for the unlikely case of no signed tags being usable).
> >
> > For the patch:
> >
> > Reviewed-by: Brian Norris <briannorris@xxxxxxxxxxxx>
> >
> >> I agree that any randomness should be avoided.
> >>
> >> My question is, do we need 15-digits?
> > ...
> >> So, I think the conflict happens
> >> only when we have two commits that start with the same 7-digits
> >> in the _same_ release. Is this correct?
>
> No.
>
> > For git-describe (the case where we have a tag to base off):
> > "use <n> digits, or as many digits as needed to form a unique object name"
>
> Yes, the abbreviated hash that `git describe` produces is unique among
> all objects (and objects are more than just commits) in the current
> repo, so what matters for probability-of-collision is the total number
> of objects - linus.git itself probably grows ~60000 objects per release.
>
> As for "do we need 15 digits", well, in theory the next time I merge the
> next rt-stable tag into our kernel I could end up with a commit that
> matches some existing object in the first 33 hex chars at which point I
> should have chosen 34 - but of course that's so unlikely that it's
> irrelevant.
>
> But the upshot of that is that there really is no objective answer to
> "how many digits do we need", so it becomes a tradeoff between "low
> enough probability that anyone anywhere in the next few years would have
> needed more than X when building their own kernel" and readability of
> `uname -r` etc. So I decided somewhat arbitrarily that each time one
> rolls a new release, there should be less than 1e-9 probability that
> HEAD collides with some other object when abbreviated to X hexchars.
> X=12 doesn't pass that criteria even when the repo has only 10M objects
> (and, current linus.git already has objects that need 12 to be unique,
> so such collisions do happen in practice, though of course very rarely).
> 13 and 14 are just weird numbers, so I ended with 15, which also allows
> many many more objects in the repo before the probability crosses that
> 1e-9 threshold.
>
> Rasmus
>
> Side note 1: Note that there really isn't any such thing as "last
> tag/previous tag" in a DAG; I think what git does is a best-effort
> search for the eligible tag that minimizes #({objects reachable from
> commit-to-be-described} - {objects reachable from tag}), where eligible
> tag means at least reachable from commit-to-be-described (so that set
> difference is a proper one), but there can be additional constraints
> (e.g. --match=...).
>
> Side note 2: Linus or Greg releases are actually not interesting here
> (see the logic in setlocalversion that stops early if we're exactly at
> an annotated tag) - the whole raison d'etre for setlocalversion is that
> people do maintain and build non-mainline kernels, and it's extremely
> useful for `uname -a` to accurately tell just which commit is running on
> the target.



This is because you use the output from git as-is.

Why are you still trying to rely on such obscure behavior of git?


It is pretty easy to get the fixed number of digits reliably.

For example,
git rev-parse --verify HEAD 2>/dev/null | cut -c1-7


It always returns a 7-digits hash,
and two different people will get the same result for sure.

Your solution, --short=15, means "at least 15",
which still contains ambiguity.



As I already noted, the kernelrelease string is
constructed in this format:

<kernel-version>-<number-of-commits-on-top>-<abbreviated-commit-hash>


For example, if I enable CONFIG_LOCALVERSION_AUTO=y
in today's Linus tree, I got this:

5.9.0-rc5-00005-gfc4f28bb3daf


What if the number of digits were 7?

5.9.0-rc5-00005-gfc4f28b

I see no problem here.

Even if we have another object that starts with "fc4f28b",
the kernelrelease string is still unique thanks to the
<kernel-version>-<number-of-commits-on-top> prefix.

Why do we care about the uniqueness of the abbreviated
hash in the whole git history?



Note:
We prepend $(KERNELVERSION) to the result
of setlocalversion. [1]

[1] https://github.com/torvalds/linux/blob/v5.9-rc4/Makefile#L1175


--
Best Regards
Masahiro Yamada