Re: 'the the' typos

Joseph H. Buehler (jhpb@sarto.gaithersburg.md.us)
21 Apr 1998 18:05:19 -0400


Sylvain Pion <Sylvain.Pion@sophia.inria.fr> writes:

> I just noticed there are numerous (44) occurrences of 'the the ' in the kernel
> sources. Redundancy in the kernel is not good, so I hope one of the
> maintainers will fix this :)

This appeals to the German in me, so I wrote the following to find all
doubled words, regardless of how much white space separates them or
what kind it is (pairs split across a newline are no problem). Put the
script in a file with execute permission and run it, passing
/usr/src/linux as the argument. The output is in "grep -n" format
(file:line:word), so it can be used in emacs compile mode to step
through all the occurrences found. I filtered out some of the more
common but obviously bogus hits.

Joe Buehler

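#!/usr/bin/perl
#
# Find doubled words in all text files in a directory tree.
# Output is "file:line:word", one hit per line, like "grep -n".
#
use strict;
use warnings;

# List every regular file under the given directories, following symlinks.
# The list form of open keeps the arguments away from the shell.
open(my $files, '-|', 'find', @ARGV, '-follow', '-type', 'f')
    or die "cannot run find: $!\n";
while (my $file = <$files>) {
    chomp $file;
    next unless -T $file;               # skip anything that is not text
    open(my $in, '<', $file) or next;   # skip unreadable files
    my $data = do { local $/; <$in> };  # slurp the whole file
    close($in);
    next unless defined $data;          # read error
    # A word, any run of white space (newlines included), the same word.
    while ($data =~ /\b([a-zA-Z]+)\s+\1\b/g) {
        my $word = $1;
        # Drop common but bogus hits, such as the C type "long long",
        # and every single letter except "a" and "i".
        next if $word =~ /^(long|ON|OFF|[^ai]|endif|fi|NULL)$/;
        # pos() is the end of the match, so a pair split across a
        # newline is reported on the line of the second word.
        my $prefix = substr($data, 0, pos($data));
        my $line = ($prefix =~ tr/\n//) + 1;
        print "$file:$line:$word\n";
    }
}
close($files);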
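
To see the core match in isolation, here is a minimal sketch that
applies the same regular expression to an in-memory string instead of
a directory tree. The helper name find_double_words, the sample text,
and the "sample.txt" label are made up for illustration and are not
part of the script above; the filter for bogus hits is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return one
# [line, word] pair for every doubled word found in $text.
sub find_double_words {
    my ($text) = @_;
    my @hits;
    # \s+ matches spaces, tabs and newlines alike; \1 must repeat the
    # first word exactly, and the \b anchors keep both whole words.
    while ($text =~ /\b([a-zA-Z]+)\s+\1\b/g) {
        my $word = $1;
        # pos() is the offset just past the match; counting the
        # newlines before it gives a 1-based line number.
        my $prefix = substr($text, 0, pos($text));
        my $line   = ($prefix =~ tr/\n//) + 1;
        push @hits, [$line, $word];
    }
    return @hits;
}

# Made-up sample text: one pair on a single line, one pair split
# across a newline, and two near misses.
my $sample = "This is the the first line.\n"
           . "A pair split across\n"
           . "across a line break.\n"
           . "This this differs in case, and the theory is not a hit.\n";

for my $hit (find_double_words($sample)) {
    print "sample.txt:$hit->[0]:$hit->[1]\n";
}
```

Run as is, this should print "sample.txt:1:the" and
"sample.txt:3:across". "This this" is skipped because the
backreference is case-sensitive, and "the theory" is skipped because
the trailing \b requires the second copy to end at a word boundary.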
