Re: [PATCH v3] checkkconfigsymbols.sh: reimplementation in python

From: Valentin Rothberg
Date: Mon Sep 22 2014 - 03:43:41 EST


On dim., 2014-09-21 at 23:28 +0200, Paul Bolle wrote:
> Valentin Rothberg schreef op zo 21-09-2014 om 21:53 [+0200]:
> > The scripts/checkkconfigsymbols.sh script searches Kconfig features
> > in the source code that are not defined in Kconfig. Such identifiers
> > always evaluate to false and are the source of various kinds of bugs.
> > However, the shell script is slow and it does not detect such broken
> > references in Kbuild and Kconfig files (e.g., ``depends on UNDEFINEDÂÂ).
> > Furthermore, it generates false positives (4 of 526 in v3.17-rc1).
>
> Curiosity: what are those four false positives?

1) /arch/cris/kernel/module.c: ETRAX_KMALLOCED_MODULES (defined in
arch/cris/Kconfig)

2) ./lib/Makefile: TEST_MODULE (defined in lib/Kconfig.debug)

3,4) ./include/linux/module.h, ./kernel/module.c: DEBUG_SET_MODULE_RONX
(defined in arch/{s390,arm,x86}/Kconfig.debug)

> > The script is also hard to read and understand, and is thereby difficult
> > to maintain.
> >
> > This patch replaces the shell script with an implementation in Python,
> > which:
> > (a) detects the same bugs, but does not report false positives
>
> Depends a bit on the definition of false positives. Ie, the hit for
> ./arch/sh/kernel/head_64.S: CACHE_
>
> is caused by
> #error preprocessor flag CONFIG_CACHE_... not recognized!
>
> Should that line, and similar lines, be changed?

I consider a false positive to actually be defined in Kconfig. The
feature in your example does not really apply to the naming convention
of Kconfig features ("..."), so that our regex does not match it.

However, the regex matches "CONFIG_X86_". I shall change the regex to
not accept strings ending with "_", so that such cases are not reported.

> > (b) additionally detects broken references in Kconfig and Kbuild files
> > (c) is up to 75 times faster than the shell script
> >
> > The new script reduces the runtime on my machine (i7-2620M, 8GB RAM, SSD)
> > from 3m47s to 0m3s, and reports 570 broken references in Linux v3.17-rc1;
> > 49 additional reports of which 16 are located in Kconfig files.
> >
> > Moreover, we intentionally include references in C comments, which have been
> > ignored until now. Such comments may be leftovers of features that have
> > been removed or renamed in Kconfig (e.g., ``#endif /* CONFIG_MPC52xx */ÂÂ).
> > These references can be misleading and should be removed or replaced.
>
> Yes, that's a good idea.
>
> > Signed-off-by: Valentin Rothberg <valentinrothberg@xxxxxxxxx>
> > Signed-off-by: Stefan Hengelein <stefan.hengelein@xxxxxx>
> > ---
> > Changelog:
> > v2: Fix of regural expressions
> > v3: Changelog replacement, and add changes of v2
> > ---
> > scripts/checkkconfigsymbols.py | 131 +++++++++++++++++++++++++++++++++++++++++
> > scripts/checkkconfigsymbols.sh | 59 -------------------
> > 2 files changed, 131 insertions(+), 59 deletions(-)
> > create mode 100644 scripts/checkkconfigsymbols.py
> > delete mode 100755 scripts/checkkconfigsymbols.sh
> >
> > diff --git a/scripts/checkkconfigsymbols.py b/scripts/checkkconfigsymbols.py
> > new file mode 100644
> > index 0000000..5c51dd1
> > --- /dev/null
> > +++ b/scripts/checkkconfigsymbols.py
> > @@ -0,0 +1,131 @@
> > +#!/usr/bin/env python
> > +
> > +"""Find Kconfig identifieres that are referenced but not defined."""
> > +
> > +# Copyright (C) 2014 Valentin Rothberg <valentinrothberg@xxxxxxxxx>
> > +# Copyright (C) 2014 Stefan Hengelein <stefan.hengelein@xxxxxx>
> > +#
> > +# This program is free software; you can redistribute it and/or modify it
> > +# under the terms and conditions of the GNU General Public License,
> > +# version 2, as published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope it will be useful, but WITHOUT
> > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> > +# more details.
> > +
> > +
> > +import os
> > +import re
> > +
> > +
> > +# REGEX EXPRESSIONS
> > +OPERATORS = r"&|\(|\)|\||\!"
> > +FEATURE = r"\w*[A-Z]{1}\w*"
> > +FEATURE_DEF = r"^\s*(menu){,1}config\s+" + FEATURE + r"\s*"
> > +EXPR = r"(" + OPERATORS + r"|\s|" + FEATURE + r")*"
> > +STMT = r"^\s*(if|select|depends\s+on)\s+" + EXPR
>
> "depends on" with multiple spaces?
> > +
> > +# REGEX OBJECTS
> > +REGEX_FILE_KCONFIG = re.compile(r"Kconfig[\.\w+\-]*$")
> > +REGEX_FILE_SOURCE = re.compile(r"\.[cSh]$")
> > +REGEX_FILE_MAKE = re.compile(r"Makefile|Kbuild[\.\w+]*$")
> > +REGEX_FEATURE = re.compile(r"(" + FEATURE + r")")
> > +REGEX_FEATURE_DEF = re.compile(FEATURE_DEF)
> > +REGEX_CPP_FEATURE = re.compile(r"\W+CONFIG_(" + FEATURE + r")[.]*")
>
> There are a few uses of "-DCONFIG_[...]" in Makefiles. This will miss
> those, won't it? That's not bad, per se, but a comment why you're
> skipping those might be nice. Or are those caught too, somewhere else?

I was not aware of such uses, thanks. It seems important to cover them
too. Does this prefix has a certain purpose?


> > +REGEX_KCONFIG_EXPR = re.compile(EXPR)
> > +REGEX_KCONFIG_STMT = re.compile(STMT)
> > +REGEX_KCONFIG_HELP = re.compile(r"^[\s|-]*help[\s|-]*")
>
> Won't "^\s\+(---help---|help)$" do? Might help catch creative variants
> of the help statement (we had a few in the past).

Yes, your regex is more robust. Thanks!

> > +
> > +
> > +def main():
> > + """Main function of this module."""
> > + output = []
> > + source_files = []
> > + kconfig_files = []
> > + defined_features = set()
> > + referenced_features = dict()
> > +
> > + for root, _, files in os.walk("."):
>
> Does this descend into the .git directory?

Yes, it does. I will exclude the .git repository in the next version of
this patch.

> > + for fil in files:
> > + if REGEX_FILE_KCONFIG.match(fil):
> > + kconfig_files.append(os.path.join(root, fil))
> > + elif REGEX_FILE_SOURCE.search(fil) or REGEX_FILE_MAKE.match(fil):
> > + source_files.append(os.path.join(root, fil))
> > +
> > + for kfile in kconfig_files:
> > + parse_kconfig_file(kfile, defined_features, referenced_features)
> > +
> > + for sfile in source_files:
> > + parse_source_file(sfile, referenced_features)
> > +
> > + print "File list\tundefined symbol used"
> > + for feature in referenced_features:
> > + if feature not in defined_features:
> > + files = referenced_features.get(feature)
> > + output.append("%s:\t%s" % (", ".join(files), feature))
> > + for out in sorted(output):
> > + print out
> > +
> > +
> > +def parse_source_file(sfile, referenced_features):
> > + """Parse @sfile for referenced Kconfig features."""
> > + lines = []
> > + with open(sfile, "r") as stream:
> > + lines = stream.readlines()
> > +
> > + for line in lines:
> > + if not "CONFIG_" in line:
> > + continue
> > + features = REGEX_CPP_FEATURE.findall(line)
> > + for feature in features:
> > + if feature.endswith("_MODULE"):
> > + feature = feature[:-len("_MODULE")]
>
> Uses of CONFIG_FOO_MODULE are now reported as uses of CONFIG_FOO. That
> might be confusing.

OK. I can leave the suffix and only remove it to check if the feature is
defined. Then, the output should be more precise and less confusing.

> > + paths = referenced_features.get(feature, set())
> > + paths.add(sfile)
> > + referenced_features[feature] = paths
> > +
> > +
> > +def get_features_in_line(line):
> > + """Return mentioned kconfig features in @line."""
> > + return REGEX_FEATURE.findall(line)
> > +
> > +
> > +def parse_kconfig_file(kfile, defined_features, referenced_features):
> > + """Parse @kfile and update feature definitions and references."""
> > + lines = []
> > + skip = False
> > +
> > + with open(kfile, "r") as stream:
> > + lines = stream.readlines()
> > +
> > + for i in range(len(lines)):
> > + line = lines[i]
> > + line = line.strip('\n')
> > + line = line.split("#")[0] # Ignore right side of comments
> > +
> > + if REGEX_FEATURE_DEF.match(line):
> > + feature_def = REGEX_FEATURE.findall(line)
> > + defined_features.add(feature_def[0])
> > + skip = False
> > + elif REGEX_KCONFIG_HELP.match(line):
> > + skip = True
> > + elif skip:
> > + # Ignore content of help messages
> > + pass
> > + elif REGEX_KCONFIG_STMT.match(line):
> > + features = get_features_in_line(line)
> > + # Multi-line statements
> > + while line.endswith("\\"):
> > + i += 1
> > + line = lines[i]
> > + line = line.strip('\n')
> > + features.extend(get_features_in_line(line))
> > + for feature in set(features):
> > + paths = referenced_features.get(feature, set())
> > + paths.add(kfile)
> > + referenced_features[feature] = paths
> > +
> > +
> > +if __name__ == "__main__":
> > + main()
> > diff --git a/scripts/checkkconfigsymbols.sh b/scripts/checkkconfigsymbols.sh
> > deleted file mode 100755
> > index ccb3391..0000000
> > --- a/scripts/checkkconfigsymbols.sh
> > +++ /dev/null
> >[...]
>
> Suggestion for future work: make this git aware. Ie, make things like
> python scripts/checkkconfigsymbols.py v3.17-rc5
> python scripts/checkkconfigsymbols.py next-20140919
> python scripts/checkkconfigsymbols.py $SHA1SUM
>
> produce useful output.

I am happy about your idea, as I desire to do exactly that. In fact, I
am developing a tool that is checking Git commits for such symbolic
violations and logic violations (i.e., when the precondition of an
#ifdef block is contradictory). However, this tool cannot be integrated
into the kernel for various reasons.

Thank you for giving feedback.

Valentin Rothberg

>
>
> Paul Bolle
>


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/