Re: [PATCH] checkpatch: fix false positive for REPEATED_WORD warning
From: Lukas Bulwahn
Date: Wed Oct 21 2020 - 11:08:56 EST
On Wed, 21 Oct 2020, Aditya Srivastava wrote:
> The presence of a hexadecimal address or symbol results in a false
> warning message from checkpatch.pl.
>
> For example, running checkpatch on commit b8ad540dd4e4 ("mptcp: fix
> memory leak in mptcp_subflow_create_socket()") results in warning:
>
> WARNING:REPEATED_WORD: Possible repeated word: 'ff'
> 00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0.....
>
> Here, it reports 'ff' to be repeated, but it is in fact part of some
> address or code, where it has to be repeated.
> In this case, the warning does not serve its intent of finding stylistic
> issues in commit messages and is simply wrong.
>
> To avoid all such reports, add an additional regex check for a repeating
> pattern of 4 or more two-character hex words separated by spaces in a line.
>
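For anyone following along, here is a minimal Python sketch of what the
new guard matches. It is only an illustration: the regex is copied from
the patch, but the helper name looks_like_hex_dump and the sample lines
other than the hex dump are made up for this example, and Python's re
module is assumed to treat this particular pattern the same way Perl does.

```python
import re

# Same pattern as the guard added in the patch: four or more
# two-character lowercase hex tokens, each followed by one or more spaces.
HEX_RUN = re.compile(r"(\b[0-9a-f]{2}( )+){4,}")


def looks_like_hex_dump(line: str) -> bool:
    """Hypothetical helper: True if the line contains a run the guard would skip."""
    return HEX_RUN.search(line) is not None


# Hex dump line taken from the commit message quoted above.
hex_line = "00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff ........./0....."
# Made-up prose line with a genuine repeated word.
prose_line = "this is is a typo"

print(looks_like_hex_dump(hex_line))          # True: line is skipped
print(looks_like_hex_dump(prose_line))        # False: still checked
print(looks_like_hex_dump("ff ff ff ff"))     # False: only 3 tokens are followed by a space
print(looks_like_hex_dump("ff ff ff ff ff"))  # True
print(looks_like_hex_dump("FF FF FF FF FF"))  # False: pattern is lowercase-only
```

Two things stand out from the sketch: every counted token needs at least
one trailing space, so a run of exactly four tokens at the end of a line
does not match, and uppercase hex is not covered because the pattern has
no case-insensitive flag.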
> A quick evaluation on v5.6..v5.8 showed that this fix reduces
> REPEATED_WORD warnings from 2797 to 1043.
>
> A quick manual check found all cases are related to hex output in
> commit messages.
>
Aditya, one thing I just noticed: the commit message header is a bit
uninformative.

How about something like:

  identify typical hex output for a better REPEATED_WORD check

Other than that, it looks good. You might want to share the link to the
complete report of differences before and after this patch for Joe to
check as well.

Lukas
> Signed-off-by: Aditya Srivastava <yashsri421@xxxxxxxxx>
> ---
> scripts/checkpatch.pl | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9b9ffd876e8a..78aeb7a3ca3d 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -3050,8 +3050,10 @@ sub process {
> }
> }
>
> -# check for repeated words separated by a single space
> - if ($rawline =~ /^\+/ || $in_commit_log) {
> +# check for repeated words separated by a single space and
> +# avoid repeating hex occurrences like 'ff ff fe 09 ...'
> + if (($rawline =~ /^\+/ || $in_commit_log) &&
> + $rawline !~ /(\b[0-9a-f]{2}( )+){4,}/) {
> while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>
> my $first = $1;
> --
> 2.17.1
>
>