Re: [RFC] MAINTAINERS tag for cleanup robot

From: Joe Perches
Date: Sun Nov 22 2020 - 13:25:02 EST


On Sun, 2020-11-22 at 08:33 -0800, Tom Rix wrote:
> On 11/21/20 9:10 AM, Joe Perches wrote:
> > On Sat, 2020-11-21 at 08:50 -0800, trix@xxxxxxxxxx wrote:
> > > A difficult part of automating commits is composing the subsystem
> > > preamble in the commit log. For the ongoing effort of a fixer producing
> > > one or two fixes a release the use of 'treewide:' does not seem appropriate.
> > >
> > > It would be better if the normal prefix was used. Unfortunately normal is
> > > not consistent across the tree.
> > >
> > > So I am looking for comments for adding a new tag to the MAINTAINERS file
> > >
> > > D: Commit subsystem prefix
> > >
> > > ex/ for FPGA DFL DRIVERS
> > >
> > > D: fpga: dfl:
> > I'm all for it. Good luck with the effort. It's not completely trivial.
> >
> > From a decade ago:
> >
> > https://lore.kernel.org/lkml/1289919077.28741.50.camel@Joe-Laptop/
> >
> > (and that thread started with extra semicolon patches too)
>
> Reading the history, how about this.
>
> get_maintainer.pl outputs a single prefix: if multiple files have the
> same prefix it works, if they don't it's an error.
>
> Another script, 'commit_one_file.sh', calls get_maintainer.pl to get
> the prefix and is called by run-clang-tools.py to get the fixer
> specific message.

The question isn't which script is used, get_maintainer or any other.
The question is really whether the MAINTAINERS file is the appropriate
place to store per-subsystem, patch-specific prefixes.

It is.
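
As a rough sketch of what a lookup could look like, assuming the proposed
'D:' tag were added and that entries stay blank-line separated with the
entry title on the first line. maintainers_prefix is a hypothetical
helper, not an existing script, and the sample entry is illustrative:

```shell
# Hypothetical helper: print the proposed "D:" prefix for a named entry.
# Reads MAINTAINERS-style text on stdin.
maintainers_prefix() {
    awk -v title="$1" '
        # Paragraph mode: one record per blank-line separated entry,
        # one field per line of that entry.
        BEGIN { RS = ""; FS = "\n" }
        $1 == title {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^D:/) {
                    sub(/^D:[ \t]*/, "", $i)   # drop the tag itself
                    print $i
                    exit
                }
        }'
}

# Illustrative entry only; the D: line is the one proposed in this thread.
printf '%s\n' 'FPGA DFL DRIVERS' 'F: drivers/fpga/dfl*' 'D: fpga: dfl:' |
    maintainers_prefix 'FPGA DFL DRIVERS'
```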

Then the question becomes how the forms are described and what the
inheritance priority is. My preference would be a default that
inherits the parent's base prefix and adds basename(subsystem dirname).
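
One possible reading of that default, as a sketch only. default_prefix
is a hypothetical helper, and where the parent prefix comes from (here
simply passed in as an argument) is an assumption:

```shell
# Hypothetical helper: parent prefix plus basename of the subsystem
# directory, with a colon as the separator.
default_prefix() {
    parent_prefix=$1    # e.g. "scsi:", assumed to come from the parent entry
    subsystem_dir=$2    # e.g. "drivers/scsi/qla2xxx"
    # Only emit the parent prefix and its trailing space when one exists.
    printf '%s%s:\n' "${parent_prefix:+$parent_prefix }" \
        "$(basename "$subsystem_dir")"
}

default_prefix "scsi:" "drivers/scsi/qla2xxx"
```

With those arguments this gives "scsi: qla2xxx:". An explicit prefix in
MAINTAINERS would presumably take priority over the derived one.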

Commit history seems to have standardized on using colons as the separator
between the commit prefix and the subject.

A good mechanism to explore how various subsystems have used prefixes in
the past might be something like:

$ git log --no-merges --pretty='%s' -<commit_count> <subsystem_path> | \
perl -n -e 'print substr($_, 0, rindex($_, ":") + 1) . "\n";' | \
sort | uniq -c | sort -rn

Using 10000 for commit_count and drivers/scsi for subsystem_path, about
1% of the subjects don't have a colon (the entry with no prefix), and
there is no real consistency even within individual drivers below scsi.
For instance, qla2xxx shows up as "scsi: qla2xxx:", "qla2xxx:" and
"[SCSI] qla2xxx:".

The top 40 entries are below:

     1    814 scsi: qla2xxx:
     2    691 scsi: lpfc:
     3    389 scsi: hisi_sas:
     4    354 scsi: ufs:
     5    339 scsi:
     6    291 qla2xxx:
     7    256 scsi: megaraid_sas:
     8    249 scsi: mpt3sas:
     9    200 hpsa:
    10    190 scsi: aacraid:
    11    174 lpfc:
    12    153 scsi: qedf:
    13    144 scsi: smartpqi:
    14    139 scsi: cxlflash:
    15    122 scsi: core:
    16    110 [SCSI] qla2xxx:
    17    108 ncr5380:
    18     98 scsi: hpsa:
    19     97
    20     89 treewide:
    21     88 mpt3sas:
    22     86 scsi: libfc:
    23     85 scsi: qedi:
    24     84 scsi: be2iscsi:
    25     81 [SCSI] qla4xxx:
    26     81 hisi_sas:
    27     81 block:
    28     75 megaraid_sas:
    29     71 scsi: sd:
    30     69 [SCSI] hpsa:
    31     68 cxlflash:
    32     65 scsi: libsas:
    33     65 scsi: fnic:
    34     61 scsi: scsi_debug:
    35     60 scsi: arcmsr:
    36     57 be2iscsi:
    37     53 atp870u:
    38     51 scsi: bfa:
    39     50 scsi: storvsc:
    40     48 sd: