Re: [PATCH 0/9] powerpc: delete duplicated words

From: Randy Dunlap
Date: Sun Jul 26 2020 - 15:08:19 EST


On 7/26/20 10:49 AM, Joe Perches wrote:
> On Sun, 2020-07-26 at 10:23 -0700, Randy Dunlap wrote:
>> On 7/26/20 7:29 AM, Christophe Leroy wrote:
>>> Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:
>>>
>>>> Drop duplicated words in arch/powerpc/ header files.
>>>
>>> How did you detect them? Do you have a script for that, or did you just read all the comments?
>>
>> Yes, it's a script that finds lots of false positives, so I have to check
>> each and every one of them for validity.
>
> And it's a lot of work too. (thanks Randy)
>
> It could be something like:
>
> $ grep-2.5.4 -nrP --include=*.[ch] '\b([A-Z]?[a-z]{2,}\b)[ \t]*(?:\n[ \t]*\*[ \t]*|)\1\b' * | \
> grep -vP '\b(?:struct|enum|union)\s+([A-Z]?[a-z]{2,})\s+\*?\s*\1\b' | \
> grep -vP '\blong\s+long\b' | \
> grep -vP '\b([A-Z]?[a-z]{2,})(?:\t+| {2,})\1\b'

Hi Joe,

(What is grep-2.5.4?)

It looks like you tried a few iterations of this -- since it drops things
like "long long". There are lots of data types that are repeated & valid.
And many struct names, like "struct kref kref", "struct completion completion",
and "struct mutex mutex". I handle (ignore) those manually, although that
could be added to the Perl script.

v0.1 of this script also found lots of repeated numbers and strings of
special characters (ASCII art etc.), so now it ignores duplicated numbers
or special characters -- since it is really looking for duplicate words.
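
For anyone who wants the general idea without opening the attachment, here
is a rough sketch in Python. It is not the attached Perl script: the
function name find_dup_words and the three patterns are made up for
illustration, borrowing the word pattern and the struct/enum/union and
"long long" filters from Joe's grep pipeline above. It only looks within a
single line, so a word repeated across a comment line break is missed.

```python
import re

# Hypothetical patterns for illustration; not taken from find_dup_words.pl.
# A "word" is an optional capital followed by 2+ lowercase letters, so
# repeated numbers and runs of punctuation (ASCII art) never match.
DUP_WORD = re.compile(r"\b([A-Z]?[a-z]{2,})\s+\1\b")
# Declarations such as "struct kref kref" reuse the type name as the
# variable name, which is valid C.
TYPE_DECL = re.compile(
    r"\b(?:struct|enum|union)\s+([A-Z]?[a-z]{2,})\s+\*?\s*\1\b"
)
LONG_LONG = re.compile(r"\blong\s+long\b")


def find_dup_words(text):
    """Return (line_number, word) pairs for likely duplicated words."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if TYPE_DECL.search(line) or LONG_LONG.search(line):
            continue  # known-valid repetition: skip the whole line
        for match in DUP_WORD.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = (
    "/* this is the the comment */\n"
    "struct kref kref;\n"
    "unsigned long long x;\n"
    "0 0 0 0 ---- ----\n"
)
print(find_dup_words(sample))  # [(1, 'the')]
```

Anything it reports still needs a human look, since repeats such as
"that that" can be correct English.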

Anyway, I might as well attach it. It's no big deal.
And if someone else wants to tackle using it, go for it.

--
~Randy

Attachment: find_dup_words.pl
Description: Perl program