Re: get_maintainers.pl subsystem output

From: Duda, Sebastian
Date: Fri Jul 19 2019 - 05:54:38 EST


On 2019-07-19 08:50, Joe Perches wrote:
On Fri, 2019-07-19 at 07:35 +0000, Duda, Sebastian wrote:
Hi Joe,

I'm conducting a large-scale patch analysis of the LKML with 1.8 million
patch emails. I'm using the `get_maintainer.pl` script to determine which
subsystem each patch belongs to.

The MAINTAINERS file is updated frequently.

Are you also using the MAINTAINERS file used
at the time each patch was submitted?

Yes, for each patch we use the MAINTAINERS file from the release (or release candidate) that was current when the patch was submitted.
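
As a rough sketch of that lookup (not the code used in the analysis): given a date-sorted list of release tags, pick the newest one that is not later than the patch date, then read MAINTAINERS from that tag with `git show <tag>:MAINTAINERS`. The names `release_for` and `maintainers_show_command` are hypothetical, and the two release dates are illustrative and should be checked against the actual tags.

```python
import bisect
from datetime import date


def release_for(patch_date, releases):
    """Return the newest tag released on or before patch_date.

    `releases` is a list of (release_date, tag) tuples sorted by date.
    Returns None if the patch predates every known release.
    """
    dates = [released for released, _ in releases]
    idx = bisect.bisect_right(dates, patch_date)
    return releases[idx - 1][1] if idx else None


def maintainers_show_command(tag):
    # `git show <rev>:<path>` prints the file as it existed at that revision.
    return ["git", "show", f"{tag}:MAINTAINERS"]


# Illustrative (release_date, tag) pairs; verify against the real tags.
releases = [(date(2019, 5, 5), "v5.1"), (date(2019, 7, 7), "v5.2")]
tag = release_for(date(2019, 7, 19), releases)
print(tag, maintainers_show_command(tag))
```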

I ran into two issues while using the script:

1. When I use the script the trivial way

$ scripts/get_maintainer.pl --subsystem --status --separator , drivers/media/i2c/adv748x/
Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx> (maintainer:ANALOG DEVICES INC ADV748X DRIVER),Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)),linux-media@xxxxxxxxxxxxxxx (open list:ANALOG DEVICES INC ADV748X DRIVER),linux-kernel@xxxxxxxxxxxxxxx (open list)
Maintained,Buried alive in reporters
ANALOG DEVICES INC ADV748X DRIVER,MEDIA INPUT INFRASTRUCTURE (V4L/DVB),THE REST

the output is hard to parse because the status `Maintained` is displayed
only once although it applies to two subsystems.

I'd prefer a more table-like representation, like this:

Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx> (maintainer:ANALOG DEVICES INC ADV748X DRIVER),linux-media@xxxxxxxxxxxxxxx (open list:ANALOG DEVICES INC ADV748X DRIVER),ANALOG DEVICES INC ADV748X DRIVER,Maintained
Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)),MEDIA INPUT INFRASTRUCTURE (V4L/DVB),Maintained
linux-kernel@xxxxxxxxxxxxxxx (open list),THE REST,Buried alive in reporters
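
One possible workaround, sketched here under assumptions rather than taken from the script itself: because the status line is deduplicated, the subsystem-to-status mapping can instead be read from the MAINTAINERS file and joined with the subsystem line of the output. The sketch assumes entries are separated by blank lines, start with the subsystem name, carry one `S:` line, and that the separator does not occur inside a subsystem name. The function names and the sample text (including the example.com address) are hypothetical.

```python
def status_by_subsystem(maintainers_text):
    """Map each MAINTAINERS section name to the value of its S: line."""
    statuses = {}
    for block in maintainers_text.split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        name = lines[0].strip()
        for line in lines[1:]:
            if line.startswith("S:"):
                statuses[name] = line[2:].strip()
                break
    return statuses


def subsystem_rows(subsystem_line, statuses, separator=","):
    """Turn the --subsystem output line into (subsystem, status) rows."""
    names = subsystem_line.strip().split(separator)
    return [(name, statuses.get(name, "unknown")) for name in names]


# Hypothetical excerpt in MAINTAINERS layout, for illustration only.
sample = (
    "ANALOG DEVICES INC ADV748X DRIVER\n"
    "M:\tSome Maintainer <maintainer@example.com>\n"
    "S:\tMaintained\n"
    "F:\tdrivers/media/i2c/adv748x/*\n"
    "\n"
    "THE REST\n"
    "S:\tBuried alive in reporters\n"
)
statuses = status_by_subsystem(sample)
for row in subsystem_rows("ANALOG DEVICES INC ADV748X DRIVER,THE REST", statuses):
    print(",".join(row))
```

Each printed row then pairs one subsystem with its own status, which is the one-row-per-subsystem shape asked for above.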


2. I want to analyze multiple patches; currently I am calling the script
once per patch. When calling the script with multiple files, the files'
output is merged:

$ scripts/get_maintainer.pl --subsystem --status --separator ',' drivers/media/i2c/adv748x/ include/uapi/linux/wmi.h
Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx> (maintainer:ANALOG DEVICES INC ADV748X DRIVER),Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)),linux-media@xxxxxxxxxxxxxxx (open list:ANALOG DEVICES INC ADV748X DRIVER),linux-kernel@xxxxxxxxxxxxxxx (open list),platform-driver-x86@xxxxxxxxxxxxxxx (open list:ACPI WMI DRIVER)
Maintained,Buried alive in reporters,Orphan
ANALOG DEVICES INC ADV748X DRIVER,MEDIA INPUT INFRASTRUCTURE (V4L/DVB),THE REST,ACPI WMI DRIVER

I'd like to run the script once with all files but get separate output
for each file, like this:

$ scripts/get_maintainer.pl --subsystem --status --separator ',' --separate-files drivers/media/i2c/adv748x/ include/uapi/linux/wmi.h
Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx> (maintainer:ANALOG DEVICES INC ADV748X DRIVER),Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)),linux-media@xxxxxxxxxxxxxxx (open list:ANALOG DEVICES INC ADV748X DRIVER),linux-kernel@xxxxxxxxxxxxxxx (open list)
Maintained,Buried alive in reporters
ANALOG DEVICES INC ADV748X DRIVER,MEDIA INPUT INFRASTRUCTURE (V4L/DVB),THE REST

platform-driver-x86@xxxxxxxxxxxxxxx (open list:ACPI WMI DRIVER),linux-kernel@xxxxxxxxxxxxxxx (open list)
Orphan,Buried alive in reporters
ACPI WMI DRIVER,THE REST


My questions are:
1. How can I make get_maintainer.pl's output more table-like?

I suggest adding --nogit --nogit-fallback --roles --norolestats

Unfortunately, this doesn't change the output:
$ scripts/get_maintainer.pl --subsystem --status --separator , drivers/media/i2c/adv748x/
Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx> (maintainer:ANALOG DEVICES INC ADV748X DRIVER),Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)),linux-media@xxxxxxxxxxxxxxx (open list:ANALOG DEVICES INC ADV748X DRIVER),linux-kernel@xxxxxxxxxxxxxxx (open list)
Maintained,Buried alive in reporters
ANALOG DEVICES INC ADV748X DRIVER,MEDIA INPUT INFRASTRUCTURE (V4L/DVB),THE REST

$ scripts/get_maintainer.pl --subsystem --status --separator , --nogit --nogit-fallback --roles --norolestats drivers/media/i2c/adv748x/
Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx> (maintainer:ANALOG DEVICES INC ADV748X DRIVER),Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB)),linux-media@xxxxxxxxxxxxxxx (open list:ANALOG DEVICES INC ADV748X DRIVER),linux-kernel@xxxxxxxxxxxxxxx (open list)
Maintained,Buried alive in reporters
ANALOG DEVICES INC ADV748X DRIVER,MEDIA INPUT INFRASTRUCTURE (V4L/DVB),THE REST

2. How can I make get_maintainer.pl separate each file's output?

Run the script with multiple invocations, once for each file
modified by the patch.

This is how I'm doing it right now, but it is very slow. I thought calling the script only once for many files could speed up the analysis.
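
If one invocation per file remains the only way to keep the output separate, the invocations can at least run concurrently. The following is a minimal sketch, assuming the script is run from the top of a kernel tree; it uses only the options quoted in this thread. The helper names, the `runner` parameter, and the worker count are hypothetical. It does not reduce the cost of a single invocation, it only overlaps them.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Options taken from the commands quoted in this thread.
BASE_CMD = [
    "scripts/get_maintainer.pl",
    "--subsystem", "--status", "--separator", ",",
    "--nogit", "--nogit-fallback",
]


def run_script(path, kernel_dir="."):
    """Run get_maintainer.pl for one path and return its stdout."""
    result = subprocess.run(
        BASE_CMD + [path],
        cwd=kernel_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def maintainers_per_file(paths, runner=run_script, workers=8):
    """Return {path: output}, running one invocation per path in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `paths`.
        outputs = list(pool.map(runner, paths))
    return dict(zip(paths, outputs))
```

Threads are sufficient here because each worker spends its time waiting on a child process; `runner` is injectable so the fan-out can be exercised without a kernel tree.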

Thank you
Sebastian