Re: [PATCH v2] checkpatch: extend author Signed-off-by check for split From: header

From: Lukas Bulwahn
Date: Mon Sep 21 2020 - 03:39:43 EST




On Sun, 20 Sep 2020, Joe Perches wrote:

> On Sun, 2020-09-20 at 21:52 +0530, Dwaipayan Ray wrote:
> > On Sun, Sep 20, 2020 at 8:39 PM Joe Perches <joe@xxxxxxxxxxx> wrote:
> > > On Sun, 2020-09-20 at 14:47 +0530, Dwaipayan Ray wrote:
> > > > Checkpatch did not handle cases where the author From: header
> > > > was split into multiple lines. The author identity could not
> > > > be resolved and checkpatch generated a false NO_AUTHOR_SIGN_OFF
> > > > warning.
> > >
> > I think there won't be any problem. Is my
> > observation correct?
>
> Likely true, it probably doesn't matter.
> Still, I'd imagine it doesn't hurt either.
>
> > > What I have does a bit more by saving any post-folding
> > >
> > > "From: <name and email address>"
> > >
> > > and comparing that to any "name and perhaps different
> > > email address" in a Signed-off-by: line.
> > >
> > > A new message is emitted if the name matches but the
> > > email address is different.
> > >
> > > Perhaps it's reasonable to apply your patch and then
> > > update it with something like the below:
> > > ---
> > > scripts/checkpatch.pl | 32 ++++++++++++++++++++++++++++----
> > > 1 file changed, 28 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > index 3e474072aa90..1ecc179e938d 100755
> > > --- a/scripts/checkpatch.pl
> > > +++ b/scripts/checkpatch.pl
> > > @@ -1240,6 +1240,15 @@ sub same_email_addresses {
> > > $email1_address eq $email2_address;
> > > }
> > >
> > > +sub same_email_names {
> > > + my ($email1, $email2) = @_;
> > > +
> > > + my ($email1_name, $name1_comment, $email1_address, $comment1) = parse_email($email1);
> > > + my ($email2_name, $name2_comment, $email2_address, $comment2) = parse_email($email2);
> > > +
> > > + return $email1_name eq $email2_name;
> > > +}
> > > +
> > > sub which {
> > > my ($bin) = @_;
> > >
> > > @@ -2679,20 +2688,32 @@ sub process {
> > > }
> > >
> > > # Check the patch for a From:
> > > - if (decode("MIME-Header", $line) =~ /^From:\s*(.*)/) {
> > > + if ($line =~ /^From:\s*(.*)/i) {
> > > $author = $1;
> > > - $author = encode("utf8", $author) if ($line =~ /=\?utf-8\?/i);
> > > + my $curline = $linenr;
> > > + while (defined($rawlines[$curline]) && $rawlines[$curline++] =~ /^\s(\s+)?(.*)/) {
> > > + $author .= ' ' if (defined($1));
> > > + $author .= "$2";
> > > + }
> > > + if ($author =~ /=\?utf-8\?/i) {
> > > + $author = decode("MIME-Header", $author);
> > > + $author = encode("utf8", $author);
> > > + }
> > > +
> > > $author =~ s/"//g;
> > > $author = reformat_email($author);
> > > }
> > >
> > > # Check the patch for a signoff:
> > > if ($line =~ /^\s*signed-off-by:\s*(.*)/i) {
> > > + my $sig = $1;
> > > $signoff++;
> > > $in_commit_log = 0;
> > > if ($author ne '') {
> > > - if (same_email_addresses($1, $author)) {
> > > - $authorsignoff = 1;
> > > + if (same_email_addresses($sig, $author)) {
> > > + $authorsignoff = "1";
> > > + } elsif (same_email_names($sig, $author)) {
> > > + $authorsignoff = $sig;
> > > }
> > > }
> > > }
> > > @@ -6937,6 +6958,9 @@ sub process {
> > > } elsif (!$authorsignoff) {
> > > WARN("NO_AUTHOR_SIGN_OFF",
> > > "Missing Signed-off-by: line by nominal patch author '$author'\n");
> > > + } elsif ($authorsignoff ne "1") {
> > > + WARN("NO_AUTHOR_SIGN_OFF",
> > > + "From:/SoB: email address mismatch: 'From: $author' != 'Signed-off-by: $authorsignoff'\n");
> > > }
> > > }
> > >
> > >
> >
> > Yes, this is definitely more logical !
> > I was actually hoping to talk with you on this.
>
> Hope granted, but only via email... (smile)
>
> > The code you sent better handles name mismatches when
> > email addresses are same. But I also have found several
> > such commits in which the author have signed off using
> > a different email address than the one which he/she used
> > to send the patch.
> >
> > For example, Lukas checked commits between v5.4 and
> > v5.8 and he found:
> > 175 Missing Signed-off-by: line by nominal patch authorrep
> > 'Daniel Vetter <daniel.vetter@xxxxxxxx>'
> >
> > Infact in all of those commits he signed off using a different
> > mail, Daniel Vetter <daniel.vetter@xxxxxxxxx>.
> >
> > So is it possible to resolve these using perhaps .mailmap
> > entries? Or should only the name mismatch part be better
> > handled? Or perhaps both?
>
> Dunno. It certainly can be improved...
> Try adding some more logic and see what you come up with.
>
> btw:
>
> The most frequent NO_AUTHOR_SIGN_OFF messages for v5.7..v5.8 are
>
> 98 WARNING: From:/SoB: email address mismatch: 'From: Daniel Vetter <daniel.vetter@xxxxxxxx>' != 'Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>'
> 24 WARNING: From:/SoB: email address mismatch: 'From: Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx>' != 'Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx>'
> 19 WARNING: From:/SoB: email address mismatch: 'From: Wolfram Sang <wsa+renesas@xxxxxxxxxxxxxxxxxxxx>' != 'Signed-off-by: Wolfram Sang <wsa@xxxxxxxxxx>'
> 11 WARNING: From:/SoB: email address mismatch: 'From: Luke Nelson <lukenels@xxxxxxxxxxxxxxxxx>' != 'Signed-off-by: Luke Nelson <luke.r.nels@xxxxxxxxx>'
> 8 WARNING: From:/SoB: email address mismatch: 'From: Christophe Leroy <christophe.leroy@xxxxxx>' != 'Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>'
> 6 WARNING: From:/SoB: email address mismatch: 'From: Davidlohr Bueso <dave@xxxxxxxxxxxx>' != 'Signed-off-by: Davidlohr Bueso <dbueso@xxxxxxx>'
> 5 WARNING: Missing Signed-off-by: line by nominal patch author '"Paul A. Clarke" <pc@xxxxxxxxxx>'
> 4 WARNING: Missing Signed-off-by: line by nominal patch author 'Huang Ying <ying.huang@xxxxxxxxx>'
> 3 WARNING: Missing Signed-off-by: line by nominal patch author '"Stephan Müller" <smueller@xxxxxxxxxx>'
>

Great minds think alike.

I did a similar investigation on Friday after the first discussion with
Dwaipayan:

https://lore.kernel.org/linux-kernel-mentees/alpine.DEB.2.21.2009181238230.14717@felia/T/#m1bf5f7ca876d33d4d53e492b5d8a6232437c921f

I hope Dwaipayan can come up with a '.AUTHOR_SIGN_OFF.mailmap' file that
we can use to distinguish the known developers that knowingly and
intentionally use different identities vs. the 'newbies' that should
validly be warned.

We will see if Dwaipayan can come up with a good idea how to handle that.

Lukas

> For the Missing Signed-off-by: lines above,
> even after decoding, the email matches but
> the names do not.
>
> From: "Paul A. Clarke" <pc@xxxxxxxxxx>
> [...]
> Signed-off-by: Paul Clarke <pc@xxxxxxxxxx>
>
> From: Huang Ying <ying.huang@xxxxxxxxx>
> [...]
> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>
> From: =?UTF-8?q?Stephan=20M=C3=BCller?= <smueller@xxxxxxxxxx>
> [...]
> Signed-off-by: Stephan Mueller <smueller@xxxxxxxxxx>
>
> > Also, I would like to know if there are any more changes
> > required for the current patch or if it is good to go?
>
> I think it's fine.
>
> cheers, Joe
>
>