Re: [PATCH v2] checkpatch: improve email parsing

From: Lukas Bulwahn
Date: Tue Nov 03 2020 - 03:11:39 EST


On Tue, Nov 3, 2020 at 8:28 AM Joe Perches <joe@xxxxxxxxxxx> wrote:
>
> On Tue, 2020-11-03 at 11:28 +0530, Dwaipayan Ray wrote:
> > On Tue, Nov 3, 2020 at 11:18 AM Dwaipayan Ray <dwaipayanray1@xxxxxxxxx> wrote:
> > >
> > > checkpatch doesn't report warnings for many common mistakes
> > > in emails. Some of these are trailing commas and incorrect
> > > use of email comments.
> > >
> > > At the same time, several false positives are reported due to
> > > incorrect handling of mail comments. The most common of these
> > > is due to the pattern:
> > >
> > > <stable@xxxxxxxxxxxxxxx> # X.X
> > >
> > > Improve email parsing mechanism in checkpatch.
> > >
> > > What is added:
> > >
> > > - Support for multiple name/address comments.
> > > - Improved handling of quoted names.
> > > - Sanitize improperly formatted comments.
> > > - Sanitize trailing semicolon or dot after email.
> []
> > What do you think? Should warnings for the names which should
> > be quoted be reported considering this result?
>
> Clearly the quote suggestion is unnecessary.
>
> I think that "cc: stable@(?:vger\.)?kernel\.org" should be
> treated differently from other forms of invalid/odd address lines.
>
> My suggestion is that only the case-insensitive form of
>
> Cc: stable@xxxxxxxxxxxxxxx
>
> or other similar case-insensitive forms with a
> # comment separator like
>
> Cc: <stable@xxxxxxxxxxxxxxx> # some comment
>
> be acceptable for stable.
>
> All other forms with stable@ should emit some message.
>

I agree that handling stable@xxxxxxxxxxxxxxx should be a special case.

We can even ask Greg KH and Sasha whether they have any preferences for
the format of this meta information after the #, so that their scripts
could pick it up.
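To make Joe's proposed stable rule concrete, here is a rough sketch in Python (checkpatch.pl itself is Perl, so this is only an illustration, not a proposed patch). The function name classify_stable_line and its "ok"/"warn"/"not-stable" results are made up for this example. The address pattern is the one Joe quoted, and I assume that a bare address may also carry a "# comment":

```python
import re

# Hypothetical helper illustrating the stable rule discussed above.
# This is NOT checkpatch.pl code; the address pattern is the one quoted
# in the thread: stable@(?:vger\.)?kernel\.org
STABLE_ADDR = r"stable@(?:vger\.)?kernel\.org"

# Assumption: the address may appear bare or in <...>, optionally
# followed by a "# comment" (e.g. a version hint like "# 5.4.x").
ADDR_FORM = rf"(?:{STABLE_ADDR}|<{STABLE_ADDR}>)"
ACCEPTED_STABLE = re.compile(rf"^cc:\s*{ADDR_FORM}(?:\s*#.*)?\s*$", re.IGNORECASE)


def classify_stable_line(line: str) -> str:
    """Return "not-stable", "ok" (accepted form) or "warn" (any other stable@ form)."""
    if not re.search(r"stable@", line, re.IGNORECASE):
        return "not-stable"          # rule does not apply to this line
    if ACCEPTED_STABLE.match(line.strip()):
        return "ok"                  # one of the accepted case-insensitive forms
    return "warn"                    # "All other forms with stable@ should emit some message."


for tag in ("Cc: stable@vger.kernel.org",
            "cc: <stable@vger.kernel.org> # 5.4.x",
            "Cc: stable@vger.kernel.org, # 5.4"):
    print(f"{classify_stable_line(tag):10} {tag}")
```

Whatever format Greg and Sasha prefer after the # could later be checked in the comment part, which this sketch deliberately accepts as free text.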

> And other <foo>-by: and cc: addresses should only have a form like
>
> Signed-off-by: "Full.Name" (possible comment) <email@xxxxxxxxxx>
> or
> Signed-off-by: Full Name (possible comment) <email@xxxxxxxxxx>
>
> etc.
>
> and any additional content after .tld in the email address be flagged
> with some message like "unexpected content after email address" rather
> than "might be better as".
>

I agree with refining the error message here. Also, Aditya, Dwaipayan,
here we can probably have some suitable fix methods, e.g., detect
where the parsing fails (a missing ">", a space where there should not
be one, just a few characters at the end, or a long list of email
addresses that should be split, etc.).
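Purely as an illustration of how those failure classes could be told apart, here is a hypothetical Python sketch (again not checkpatch.pl code; the function diagnose, the regular expression and the messages are all assumptions). It accepts the two forms Joe listed, a quoted or unquoted name with an optional (comment) before <address>, and reports the first problem it recognizes:

```python
import re

# Hypothetical parser for the value of a "<foo>-by:" or "Cc:" tag.
# Accepted shapes (from the thread):
#   "Full.Name" (possible comment) <email@domain>
#   Full Name (possible comment) <email@domain>
TAG_VALUE = re.compile(
    r'^(?P<name>"[^"]*"|[^"<>()]*?)\s*(?:\((?P<comment>[^)]*)\)\s*)?'
    r'<(?P<addr>[^<>\s]+@[^<>\s]+)>(?P<rest>.*)$'
)


def diagnose(value: str) -> str:
    """Classify a tag value; messages are illustrative, not checkpatch's."""
    value = value.strip()
    # A long list of addresses in one tag: suggest splitting it.
    if value.count("@") > 1:
        return "multiple addresses: split into one tag per address"
    m = TAG_VALUE.match(value)
    if m:
        # Anything after the closing '>' (a comma, a dot, ...) is flagged.
        if m.group("rest").strip():
            return "unexpected content after email address"
        return "ok"
    # Parsing failed: try to say where.
    if "<" in value and ">" not in value:
        return "missing '>'"
    inner = re.search(r"<([^<>]*)>", value)
    if inner and re.search(r"\s", inner.group(1)):
        return "whitespace inside email address"
    return "unparseable"


print(diagnose("Joe <joe@example.com>,"))
```

A real fix rule would also have to decide how to rewrite the line, not just what to report; this only covers the detection side.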

Maybe you can coordinate among yourselves who will create
suitable fix rules here?

Also, start with the most frequent class of mistakes involving
unexpected content after email addresses.

I imagine that a maintainer could simply run a tag-sanitizing script
that just cleans up those stupid mistakes before creating their git
trees or sending pull requests to Linus. Let us try to add these
sanitizing rules to checkpatch.pl with fix options for now; if that
sanitizing feature becomes a monster script of its own within
checkpatch.pl, we can refactor it into an independent script for
cleaning up.

Lukas