Re: [PATCH] proc: use #pragma once

From: Rasmus Villemoes
Date: Thu May 03 2018 - 18:15:08 EST


On 2018-05-02 00:13, Andrew Morton wrote:
> On Thu, 26 Apr 2018 22:24:44 +0300 Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
>
>>> The LOC argument also does not sound very convincing.
>>
>> When was the last time you did -80 kLOC patch for free?
>
> That would be the way to do it - sell the idea to Linus, send him a
> script to do it then stand back. The piecemeal approach is ongoing
> pain.
>

FWIW, it's not just removing some identifiers from cpp's hash tables, it
also reduces I/O: Due to our header mess, we have some cyclic includes,
e.g mm.h -> memremap.h -> mm.h. While parsing mm.h, cpp sees the #define
_LINUX_MM_H, then goes parsing memremap.h, but since it hasn't reached
the end of mm.h yet (seeing that there's nothing but comments outside
the #ifndef/#endif pair), it hasn't had a chance to set the internal
flag for mm.h, so it goes slurping in mm.h again. Obviously, the
definedness of _LINUX_MM_H at that point means it "only" has to parse
those 87K for comments and matching up #ifs, #ifdefs,#endifs etc. With
#pragma once, the flag gets set for mm.h immediately, so the #include
from memremap.h is entirely ignored. This can easily be verified with
strace. And mm.h is not the only header getting read twice.

I had some "extract the include guard" line noise lying around, so I
hacked up the below if someone wants to play some more with this. A few
not-very-careful kbuild timings didn't show anything significant, but
both the before and after times were way too noisy, and I only patched
include/linux/*.h.

Anyway, the first order of business is to figure out which ones to leave
alone. We have a bunch of #ifndef THAT_ONE #error "don't include
$this_one directly". The brute-force way is to simply record all macros
which are checked for definedness at least twice.

git grep -h -E '^\s*#\s*if(.*defined\s*\(|n?def)\s*[A-Za-z0-9_]+' | grep
-o -E '[A-Za-z_][A-Za-z_0-9]*' | sort | uniq --repeated > multest.txt

But there's also stuff like arch/x86/boot/compressed/kaslr.c that plays
games with pre-defining _EXPORT_H to avoid parsing export.h when it
inevitably gets included. Oh well, just add the list of macros that have
at least two definitions.

git grep -h -E '^\s*#\s*define\s+[A-Za-z0-9_]+' | grep -o -E
'^\s*#\s*define\s+[A-Za-z0-9_]+' | grep -oE '[A-Za-z0-9_]+' | sort |
uniq --repeated > muldef.txt

With those, one can just do

cat muldef.txt multest.txt | scripts/replace_ig.pl ...

This ends up detecting a lot of copy-pasting (e.g.
__LINUX_MFD_MAX8998_H), as well as lots of headers that for no obvious
reason do not have an include guard. Oh, and once.h has a redundant \.

Rasmus

wear sunglasses...

=== scripts/replace_ig.pl ===

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

my %preserve;

sub strip_comments {
my $txt = shift;

# Line continuations are handled before comment stripping, so
# <slash> <backslash> <newline> <star> actually starts a comment,
# and a // comment can swallow the following line. Let's just
# assume nobody has modified the #if control flow using such dirty
# tricks when we do a more naive line-by-line parsing below to
# actually remove the include guard deffery.
$txt =~ s/\\\n//g;

# http://stackoverflow.com/a/911583/722859
$txt =~ s{
/\* ## Start of /* ... */ comment
[^*]*\*+ ## Non-* followed by 1-or-more *'s
(?:
[^/*][^*]*\*+
)* ## 0-or-more things which don't start with /
## but do end with '*'
/ ## End of /* ... */ comment

|
// ## Start of // comment
[^\n]* ## Anything which is not a newline
(?=\n) ## End of // comment; use look-ahead to avoid consuming the
newline

| ## OR various things which aren't comments:

(
" ## Start of " ... " string
(?:
\\. ## Escaped char
| ## OR
[^"\\] ## Non "\
)*
" ## End of " ... " string

| ## OR

' ## Start of ' ... ' string
(
\\. ## Escaped char
| ## OR
[^'\\] ## Non '\
)*
' ## End of ' ... ' string

| ## OR

. ## Anything other char
[^/"'\\]* ## Chars which doesn't start a comment, string or escape
)
}{defined $1 ? $1 : " "}gxse;

return $txt;
}

sub include_guard {
my $txt = shift;
my @lines = (split /^/, $txt);
my $i = 0;
my $level = 1;
my $name;

# The first non-empty line must be an #ifndef or an #if !defined().
++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
goto not_found if ($i == @lines);
goto not_found
if (!($lines[$i] =~
m/^\s*#\s*ifndef\s+(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*$/) &&
!($lines[$i] =~
m/^\s*#\s*if\s+!\s*defined\s*\(\s*(?<name>[A-Za-z_][A-Za-z_0-9]*)\s*\)\s*$/));
$name = $+{name};

# The next non-empty line must be a #define of that macro.
1 while (++$i < @lines && $lines[$i] =~ m/^\s*$/);
goto not_found if ($i == @lines);
goto not_found if !($lines[$i] =~ m/^\s*#\s*define\s+\b$name\b/);

# Now track #ifs and #endifs. #elifs and #elses don't change the level.
while (++$i < @lines && $level > 0) {
if ($lines[$i] =~ m/^\s*#\s*(?:if|ifdef|ifndef)\b/) {
$level++;
} elsif ($lines[$i] =~ m/^\s*#\s*endif\b/) {
$level--;
}
}
goto not_found if ($level > 0); # issue a warning?
# Check that the rest of the file consists of empty lines.
++$i while ($i < @lines && $lines[$i] =~ m/^\s*$/);
goto not_found if ($i < @lines);
return $name;

not_found:
return undef;
}

sub do_file {
my $fn = shift;
my $src = read_file($fn);
my $ig = include_guard(strip_comments($src));
if (not defined $ig) {
printf STDERR "%s: no include guard\n", $fn;
return;
}
if (exists $preserve{$ig}) {
printf STDERR "%s: include guard %s exempted\n", $fn, $ig;
return;
}

# OK, the entire text should match this horrible regexp.
if ($src =~ m{
(.*?) # arbitrary stuff before #ifndef
(^\s*\#\s*if(?:\s*!\s*defined\s*\(\s*$ig\s*\)|ndef\s*$ig) .*? \n #
(?:^\s*\n)*
^\s*\#\s*define\s*$ig .*? \n) # 2/3 of include guard
(.*(?=^\s*\#\s*endif)) # body of file
(^\s*\#\s*endif .*? \n) # last 1/3
(.*) # rest of file (trailing comments)
}smx) {
my $pre = $1;
my $define = $2;
my $body = $3;
my $endif = $4;
my $post = $5;
$body =~ s/\n[ \t]*\n$/\n/g;
$src = $pre . "#pragma once\n";
$src .= $body . $post;
} else {
printf STDERR "%s: has include guard %s, but I failed to replace it
with #pragma once\n",
$fn, $ig;
return;
}
write_file($fn, $src);
}

while (<STDIN>) {
chomp;
$preserve{$_} = 1;
}

for (@ARGV) {
do_file($_);
}