Re: [PATCH v3] checkpatch: improve email parsing

From: Joe Perches
Date: Thu Nov 05 2020 - 12:41:23 EST


(adding stable and Greg KH for additional review)
On Thu, 2020-11-05 at 17:29 +0530, Dwaipayan Ray wrote:
> checkpatch doesn't report warnings for many common mistakes
> in emails. Some of which are trailing commas and incorrect
> use of email comments.

I presume you've tested this against the git tree.

Can you send me a file with the BAD_SIGN_OFF messages generated
and if possible the git SHA-1s of the commits?

> At the same time several false positives are reported due to
> incorrect handling of mail comments. The most common of which
> is due to the pattern:
>
> <stable@xxxxxxxxxxxxxxx> # X.X
>
> Improve email parsing in checkpatch.
>
> Some general comment rules are defined:
>
> - Multiple name comments should not be allowed.
> - Comments inside address should not be allowed.
> - In general comments should be enclosed within parentheses.
>   Exception for stable@xxxxxxxxxxxxxxx # X.X

not just vger.kernel.org, but this should also allow stable@xxxxxxxxxx
and only allow cc: and not any other -by: type for that email address.

A process preference question for Greg and the stable team:

The most common stable forms are

stable@xxxxxxxxxxxxxxx # version info
then
stable@xxxxxxxxxxxxxxx [ version info ]

with some other relatively infrequently used outlier styles, some
that use parentheses, but this is not frequent.

It might be sensible to standardize on the "# version info" trailer
comment version info style and warn on any other form.

A somewhat common style for the stable address is to use a name
before the stable address which describes the version info:

Perhaps any name before stable should be warned and the version
should be a comment.

Here's a list of the stable addresses with "version name" then
stable address in the git tree and other outlier styles.

24 linux-stable <stable@xxxxxxxxxxxxxxx>
21 5.4+ <stable@xxxxxxxxxxxxxxx>
14 All applicable <stable@xxxxxxxxxxxxxxx>
6 3.10+ <stable@xxxxxxxxxxxxxxx>
5 5.9+ <stable@xxxxxxxxxxxxxxx>
5 5.3+ <stable@xxxxxxxxxxxxxxx>
5 5.1+ <stable@xxxxxxxxxxxxxxx>
4 5.6+ <stable@xxxxxxxxxxxxxxx>
4 4.20+ <stable@xxxxxxxxxxxxxxx>
3 Stable Team <stable@xxxxxxxxxxxxxxx>
3 4.19+ <stable@xxxxxxxxxxxxxxx>
3 4.15+ <stable@xxxxxxxxxxxxxxx>
3 4.10+ <stable@xxxxxxxxxxxxxxx>
2 stable@xxxxxxxxxxxxxxx (v2.6.12+)
2 5.2+ <stable@xxxxxxxxxxxxxxx>
2 4.16+ <stable@xxxxxxxxxxxxxxx>
1 v5.8+ <stable@xxxxxxxxxxxxxxx>
1 v5.7+ <stable@xxxxxxxxxxxxxxx>
1 v5.6+ <stable@xxxxxxxxxxxxxxx>
1 v5.3+ <stable@xxxxxxxxxxxxxxx>
1 v5.0+ <stable@xxxxxxxxxxxxxxx>
1 v4.9+ <stable@xxxxxxxxxxxxxxx>
1 <stable@xxxxxxxxxxxxxxx> v5.0+
1 <stable@xxxxxxxxxxxxxxx> +v4.18
1 stable@xxxxxxxxxxxxxxx (3.14+)
1 5.8+ <stable@xxxxxxxxxxxxxxx>
1 5.5+ <stable@xxxxxxxxxxxxxxx>
1 5.0+ <stable@xxxxxxxxxxxxxxx>
1 4.18+ <stable@xxxxxxxxxxxxxxx>
1 4.14+ <stable@xxxxxxxxxxxxxxx>
1 4.13+ <stable@xxxxxxxxxxxxxxx>
1 4.0+ <stable@xxxxxxxxxxxxxxx>
1 3.15+ <stable@xxxxxxxxxxxxxxx>
1 3.11+ <stable@xxxxxxxxxxxxxxx>

> Improvements to parsing:
>
> - Detect and report unexpected content after email.
> - Quoted names are excluded from comment parsing.
> - Trailing dots or commas in email are removed during
>   formatting. Correspondingly a BAD_SIGN_OFF warning
>   is emitted.
> - Improperly quoted email like '"name <address>"' are now
>   warned about.

All of the above seems right but perhaps the comment style for any
<foo>-by: lines should also allow # comments.

The below is just comments on the patch itself.

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -2800,9 +2806,57 @@ sub process {
>   $dequoted =~ s/" </ </;
>   # Don't force email to have quotes
>   # Allow just an angle bracketed address
> - if (!same_email_addresses($email, $suggested_email, 0)) {
> + if (!same_email_addresses($email, $suggested_email)) {
> + if (WARN("BAD_SIGN_OFF",
> + "email address '$email' might be better as '$suggested_email'\n" . $herecurr) &&
> + $fix) {

trivia:

Please always align $fix with tabs to the if and then 4 spaces to the
open parenthesis.

> + # Comments must begin only with (
> + # or # in case of stable@xxxxxxxxxxxxxxx
> + if ($email =~ /^.*stable\@vger/) {

I believe this should be

if ($email =~ /^stable\@(?:vger\.)?kernel.org$/) {

> + if ($comment ne "" && $comment !~ /^#.+/) {
> + if (WARN("BAD_SIGN_OFF",
> + "Invalid comment format for stable: '$email', prefer parentheses\n" . $herecurr) &&

Prefer #

> + $fix) {
> + my $new_comment = $comment;
> + $new_comment =~ s/^[ \(\[]+|[ \)\]]+$//g;

Does the comment include any leading whitespace here?
I presumed not given the $comment !~ /^#/ test above.