Re: Please turn "Cannot use CONFIG_STACK_VALIDATION" into build error

From: Jessica Yu
Date: Mon Feb 13 2017 - 18:08:57 EST

+++ Josh Poimboeuf [13/02/17 12:41 -0600]:
On Mon, Feb 13, 2017 at 12:07:09AM -0800, Marc MERLIN wrote:
Hi Josh,

I'll start with the story as to why.
I've lost more hours than I care to list, because I was unable to build
the virtualbox kernel driver with newer kernels.
Sadly, it gives no useful debug info outside of
make[1]: *** No rule to make target '/tmp/vbox.0/linux/SUPDrv-linux.o', needed by '/tmp/vbox.0/vboxdrv.o'. Stop.

It took some pretty deep debugging to finally see this:
Trying rule prerequisite 'tools/objtool/objtool'.
Looking for a rule with intermediate file 'tools/objtool/objtool'.
Avoiding implicit rule recursion.
which look quite innocuous and don't look like errors at all.
When I filed a bug with the vbox folks, they were unable to find out why
the module refused to build on my kernel, and I was stuck with older
kernels as a result.

Then, I had another module, bbswitch, to turn off the nvidia chip on my
laptop to save battery. That one also failed to build with newer
kernels, but thankfully made it more clear that the problem was related
to tools/objtool/objtool missing.

But why was it missing? No idea...
I trace that down to CONFIG_STACK_VALIDATION which there seems to be no
menu option for, so I manually disable it in .config, rebuild, and it's
automatically re-enabled. Gah.

Hm, that doesn't sound right. Nothing automatically enables
CONFIG_STACK_VALIDATION. It should be disabled unless manually enabled.
Maybe you got it confused with CONFIG_HAVE_STACK_VALIDATION, which is
always enabled?

BTW, there is a config option for it in the menu:

Kernel hacking
Compile-time checks and compiler options
Compile-time stack metadata validation
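
To tell the two symbols apart on a given tree, one option is to look at the generated .config directly. The following is only a minimal sketch and not part of the original exchange: the function name config_state is made up for illustration, and it assumes the usual .config conventions (SYMBOL=y when a bool option is enabled, "# SYMBOL is not set" when it is disabled).

```shell
#!/bin/sh
# config_state SYMBOL [CONFIG_FILE]
# Hypothetical helper: prints "enabled", "disabled" or "absent" for a bool
# Kconfig symbol, based on the conventional .config line formats.
# CONFIG_FILE defaults to ./.config.
config_state() {
    sym=$1
    cfg=${2:-.config}
    if grep -q "^${sym}=y\$" "$cfg"; then
        echo "enabled"
    elif grep -q "^# ${sym} is not set\$" "$cfg"; then
        echo "disabled"
    else
        # Symbol does not appear in either form (only =y is checked here).
        echo "absent"
    fi
}
```

Running config_state CONFIG_STACK_VALIDATION and config_state CONFIG_HAVE_STACK_VALIDATION against the same .config shows whether only the HAVE_ symbol is set. The pattern is anchored at the start of the line, so the two names are not mistaken for each other.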

More hair pulling, and finally I make a typo
saruman:/usr/src/linux-block# make xonfig
Makefile:1044: "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
scripts/kconfig/conf --silentoldconfig Kconfig
Makefile:1044: "Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
make: *** No rule to make target 'xonfig'. Stop.

Sure enough, this was my problem, but I never saw the error message
because I build kernels with
make-kpkg --revision 1gandalf kernel-image
which does other stuff and hid that warning, which really should have
been a fatal error in my opinion.
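
When a build wrapper hides the warning, the libelf development files can also be checked for up front by trying to link a trivial program against libelf. This is a rough sketch and is not taken from the kernel tree: has_libelf is a hypothetical helper name, and the default compiler "cc" is an assumption.

```shell
#!/bin/sh
# has_libelf [COMPILER]
# Hypothetical helper: exits 0 if COMPILER can link a trivial C program
# against libelf, non-zero otherwise. COMPILER defaults to "cc" (assumption).
has_libelf() {
    compiler=${1:-cc}
    # Link into a temporary file so nothing is left behind.
    out=$(mktemp) || return 1
    echo 'int main(void) { return 0; }' |
        "$compiler" -xc -o "$out" - -lelf >/dev/null 2>&1
    status=$?
    rm -f "$out"
    return $status
}
```

A wrapper script could call "has_libelf || exit 1" before starting the kernel build, which turns the missing library into a hard failure locally without changing the kernel Makefile.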

Given that
1) CONFIG_STACK_VALIDATION seems silently auto enabled.
2) without libelf-dev, the kernel will build but will leave a tree
missing objtool, which in turn causes (all?) 3rd party modules to fail

Yes, this is a bug.

3) and that it's kind of non trivial to find out why if that happens,

Would you consider making
"Cannot use CONFIG_STACK_VALIDATION, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
a build error as opposed to a warning?
This sure would have saved me countless errors of debugging the wrong

Correct me if I'm wrong, but it sounds like make-kpkg suppressed stderr?
If so, that should be fixed.

When I try to build an OOT module with CONFIG_STACK_VALIDATION enabled
and elfutils-libelf-devel missing (on Fedora), I get:

make: Entering directory '/home/jpoimboe/git/linux'
make[1]: Entering directory '/home/jpoimboe/ktest/output'
CC [M] /home/jpoimboe/livepatch-test/1/livepatch2.o
/bin/sh: ./tools/objtool/objtool: No such file or directory
/home/jpoimboe/git/linux/scripts/ recipe for target '/home/jpoimboe/livepatch-test/1/livepatch2.o' failed
make[2]: *** [/home/jpoimboe/livepatch-test/1/livepatch2.o] Error 1
/home/jpoimboe/git/linux/Makefile:1490: recipe for target '_module_/home/jpoimboe/livepatch-test/1' failed
make[1]: *** [_module_/home/jpoimboe/livepatch-test/1] Error 2
make[1]: Leaving directory '/home/jpoimboe/ktest/output'
Makefile:150: recipe for target 'sub-make' failed
make: *** [sub-make] Error 2
make: Leaving directory '/home/jpoimboe/git/linux'

It's not a perfect error message, but the

'/bin/sh: ./tools/objtool/objtool: No such file or directory'

is at least a big clue. I'm curious why you didn't see that.

Anyway, the above libelf-dev warning is just a warning and not a build
error because CONFIG_STACK_VALIDATION is enabled for allyesconfig, and
it's not a severe enough problem to warrant breaking the build.

Ideally the same warning should be printed when building OOT modules.
I'll try to figure out if there's a way to do that.

Btw, it looks like the libelf warning is inside an `ifeq ($(KBUILD_EXTMOD),)`
block, so it does not get applied to OOT modules. It would be
possible to add the warning to the corresponding KBUILD_EXTMOD (else)
block or somewhere else in the Makefile common to both cases (probably
preferable). I tried the latter case and the warning prints for
all cases (vmlinux, intree, external modules).
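
The scoping effect described above can be reproduced with a toy Makefile. This is only an illustrative sketch under stated assumptions: the generated Makefile is NOT the kernel's, the warning texts and the helper name write_toy_makefile are made up, and GNU make is assumed to be installed.

```shell
#!/bin/sh
# write_toy_makefile DIR
# Hypothetical helper: writes a toy Makefile into DIR. The first warning sits
# inside the block that is only evaluated when KBUILD_EXTMOD is empty; the
# second sits outside it, where every invocation evaluates it.
write_toy_makefile() {
    cat > "$1/Makefile" <<'EOF'
ifeq ($(KBUILD_EXTMOD),)
$(warning scoped: only builds without KBUILD_EXTMOD see this)
endif
$(warning common: builds with or without KBUILD_EXTMOD see this)
all: ; @:
EOF
}
```

After write_toy_makefile "$dir", running "make -C $dir" prints both warnings, while "make -C $dir KBUILD_EXTMOD=/some/module" prints only the "common" one. That mirrors why a warning placed inside the ifeq block never reaches external module builds.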