Re: [PATCH v6 3/4] scripts: add verifier script for builtin module range data

From: Masahiro Yamada
Date: Sun Aug 18 2024 - 02:41:24 EST

On Fri, Aug 16, 2024 at 12:04 AM Kris Van Hees <kris.van.hees@xxxxxxxxxx> wrote:
> The modules.builtin.ranges offset range data for builtin modules is
> generated at compile time based on the list of built-in modules and
> the vmlinux.map and vmlinux.o.map linker maps. This data can be used
> to determine whether a symbol at a particular address belongs to
> module code that was configured to be compiled into the kernel proper
> as a built-in module (rather than as a standalone module).
> This patch adds a script that uses the generated modules.builtin.ranges
> data to annotate the symbols in the System.map with module names if
> their address falls within a range that belongs to one or more built-in
> modules.
> It then processes the vmlinux.map (and if needed, vmlinux.o.map) to
> verify the annotation:
>   - For each top-level section:
>      - For each object in the section:
>         - Determine whether the object is part of a built-in module
>           (using modules.builtin and the .*.cmd file used to compile
>           the object as suggested in [0])
>         - For each symbol in that object, verify that the built-in
>           module association (or lack thereof) matches the annotation
>           given to the symbol.
> Signed-off-by: Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> Reviewed-by: Nick Alcock <nick.alcock@xxxxxxxxxx>
> Reviewed-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> ---
> Changes since v5:
> - Added optional 6th argument to specify kernel build directory.
> - Report error and exit if .*.o.cmd files cannot be read.
> Changes since v4:
> - New patch in the series
> ---
> scripts/verify_builtin_ranges.awk | 365 ++++++++++++++++++++++++++++++
> 1 file changed, 365 insertions(+)
> create mode 100755 scripts/verify_builtin_ranges.awk
> diff --git a/scripts/verify_builtin_ranges.awk b/scripts/verify_builtin_ranges.awk
> new file mode 100755
> index 000000000000..b82cf0a0fbeb
> --- /dev/null
> +++ b/scripts/verify_builtin_ranges.awk
> @@ -0,0 +1,365 @@
> +#!/usr/bin/gawk -f
> +# SPDX-License-Identifier: GPL-2.0
> +# verify_builtin_ranges.awk: Verify address range data for builtin modules
> +# Written by Kris Van Hees <kris.van.hees@xxxxxxxxxx>
> +#
> +# Usage: verify_builtin_ranges.awk modules.builtin.ranges System.map \
> +#				    modules.builtin vmlinux.map vmlinux.o.map \
> +#				    [ <build-dir> ]
> +#
> +
> +# Return the module name(s) (if any) associated with the given object.
> +#
> +# If we have seen this object before, return information from the cache.
> +# Otherwise, retrieve it from the corresponding .cmd file.
> +#
> +function get_module_info(fn, mod, obj, mfn, s) {
> +	if (fn in omod)
> +		return omod[fn];
> +
> +	if (match(fn, /\/[^/]+$/) == 0)
> +		return "";
> +
> +	obj = fn;
> +	mod = "";
> +	mfn = "";
> +	fn = kdir "/" substr(fn, 1, RSTART) "." substr(fn, RSTART + 1) ".cmd";
> +	if (getline s <fn == 1) {
> +		if (match(s, /DKBUILD_MODFILE=['"]+[^'"]+/) > 0) {
> +			mfn = substr(s, RSTART + 16, RLENGTH - 16);
> +			gsub(/['"]/, "", mfn);
> +
> +			mod = mfn;
> +			gsub(/([^/ ]*\/)+/, "", mod);
> +			gsub(/-/, "_", mod);
> +		}
> +	} else {
> +		print "ERROR: Failed to read: " fn "\n\n" \
> +		      "  Invalid kernel build directory (" kdir ")\n" \
> +		      "  or its content does not match " ARGV[1] >"/dev/stderr";
> +		close(fn);
> +		total = 0;
> +		exit(1);
> +	}
> +	close(fn);
> +
> +	# A single module (common case) also reflects objects that are not part
> +	# of a module.  Some of those objects have names that are also a module
> +	# name (e.g. core).  We check the associated module file name, and if
> +	# they do not match, the object is not part of a module.
> +	if (mod !~ / /) {
> +		if (!(mod in mods))
> +			return "";
> +		if (mods[mod] != mfn)
> +			return "";
> +	}
> +
> +	# At this point, mod is a single (valid) module name, or a list of
> +	# module names (that do not need validation).
> +	omod[obj] = mod;
> +	close(fn);
> +
> +	return mod;
> +}

This code is copied and pasted from scripts/generate_builtin_ranges.awk,
so my comments on 2/4 apply to this patch, too.
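
To make the duplicated logic concrete, here is a rough Python sketch of
the module-name extraction that get_module_info() performs on a .cmd
line. It is only an illustration, not a proposed replacement: the
function name module_names_from_cmd() is hypothetical, and the sketch
covers only the -DKBUILD_MODFILE parsing, not the caching or the
validation against modules.builtin.

```python
import re

# Same pattern the AWK code matches: -DKBUILD_MODFILE followed by quotes.
_MODFILE_RE = re.compile(r"""DKBUILD_MODFILE=['"]+([^'"]+)""")


def module_names_from_cmd(cmd_line):
    """Return (modfile, names) for one .cmd line, or (None, []) if absent.

    Hypothetical helper: 'modfile' is the raw KBUILD_MODFILE value and
    'names' holds one normalized module name per space-separated path.
    """
    m = _MODFILE_RE.search(cmd_line)
    if m is None:
        return None, []
    modfile = m.group(1)
    # Strip the directory part and turn '-' into '_', as the AWK gsub()s do.
    names = [p.rsplit("/", 1)[-1].replace("-", "_") for p in modfile.split()]
    return modfile, names
```

A helper like this could be shared by a generator and a verifier, which
is the kind of reuse the copy-paste currently prevents.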

Instead of adding a separate script,
we could add a "verify mode" option.

scripts/generate_builtin_ranges.awk --verify ...

But I do not know how much cleaner that would make it.

I am not good at reviewing AWK code, but this
is the way you have chosen to go.

If this script were written in Python,
it would be easy and readable to
split logically-related code chunks into functions,
as follows:

def parse_module_builtin():
    ...

def parse_vmlinux_map_lld():
    ...

def parse_vmlinux_map_bfd():
    ...

def parse_vmlinux_o_map():
    ...
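
As a rough illustration of the first of those functions, a minimal
sketch might look like the following. It assumes that modules.builtin
contains one "kernel/<path>.ko" entry per line and that the function
takes an iterable of lines (that parameter is my own addition for the
example). The result maps a normalized module name to its path, which
is what the mods[] lookup in get_module_info() appears to expect.

```python
def parse_module_builtin(lines):
    """Map module name -> module path, without "kernel/" prefix or ".ko".

    Sketch only: assumes each relevant line looks like
    "kernel/<dir>/<name>.ko"; other lines are skipped.
    """
    mods = {}
    for line in lines:
        path = line.strip()
        if not path.startswith("kernel/"):
            continue
        path = path[len("kernel/"):]
        if path.endswith(".ko"):
            path = path[:-len(".ko")]
        # Module names use '_' where file names may use '-'.
        name = path.rsplit("/", 1)[-1].replace("-", "_")
        mods[name] = path
    return mods
```

It could then be called as parse_module_builtin(open("modules.builtin")),
with the map parsers following the same pattern.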

Best Regards
Masahiro Yamada