Re: scripts/get_maintainer.pl misses maintainers sometimes

From: Joe Perches
Date: Thu Jul 13 2017 - 18:23:01 EST


On Wed, 2017-07-12 at 09:48 -0700, Joe Perches wrote:
> On Wed, 2017-07-12 at 18:36 +0200, Maarten Lankhorst wrote:
> > Hello Joe Perches,
> >
> > I created a script for drm's maintainer-tools that pipes the output of get_maintainer.pl
> > to add the appropriate cc's to the commit message. It also ignores duplicates, so running
> > the script twice on the same commit doesn't add everyone twice.
> >
> > When testing, I found out that sometimes maintainers were added on the second patch, and
> > that made me notice a bug in get_maintainer.pl.
> >
> > On the below patch, I get up to 39 maintainers from get_maintainer.pl, but the actual amount
> > differs randomly on each invocation:
> >
> > ~/linux$ git show | scripts/get_maintainer.pl | wc -l
> > 39
> > ~/linux$ git show | scripts/get_maintainer.pl | wc -l
> > 37
> > ~/linux$ git show | scripts/get_maintainer.pl | wc -l
> > 38
> >
> > Any idea why this happens?
>
> If you add --nogit --nogit-fallback to the command line
> the output is consistent.
>
> So it seems to come down to git log output varying
> between runs of the same command.
>
> I'll track it down further later.

This looks less like a bug than like non-reproducible
script output.

It seems to come down to this line in the script in
subroutine vcs_assign:

    foreach my $line (sort {$hash{$b} <=> $hash{$a}} keys %hash) {

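where %hash maps each commit signer to a signature count and
the sort puts the signers with the most signatures first.

When multiple signers have the same number of signatures, the
comparison doesn't order them, so they come out in whatever
order keys %hash returns, and perl doesn't keep that order
stable from run to run. Which of the tied signers get picked
can then vary.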
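
Here's a minimal sketch of the effect, not a patch to
get_maintainer.pl. The names and counts are made up, and the
"|| $a cmp $b" tiebreak is only one possible (and arbitrary)
way to get a fixed order for equal counts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up signers and signature counts; three of them tie at 5.
my %hash = (
    'Alice <alice@example.com>' => 9,
    'Bob <bob@example.com>'     => 5,
    'Carol <carol@example.com>' => 5,
    'Dave <dave@example.com>'   => 5,
);

# Same comparison as the line quoted from vcs_assign: by count only,
# so the relative order of Bob, Carol and Dave follows whatever
# order keys() happens to return.
my @by_count = sort { $hash{$b} <=> $hash{$a} } keys %hash;

# Hypothetical tiebreak: equal counts fall back to a string compare
# on the signer, so the result no longer depends on hash ordering.
my @stable = sort { $hash{$b} <=> $hash{$a} || $a cmp $b } keys %hash;

print "count only: $_\n" for @by_count;
print "tiebreak:   $_\n" for @stable;
```

The first list can change order among the tied entries between
runs; the second shouldn't. Alphabetical order is no more
meaningful than random order, it's just repeatable.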

I think it's not particularly important to be reproducible as
the MAINTAINERS entries should be prioritized and the git
commit signers should only be considered when there is no
listed maintainer.

If you have a bright idea as to how perl should order the
entries in the hash for reproducibility, let me know.

cheers, Joe