"Inconsistent kallsyms data" error

From: Linus Torvalds
Date: Thu Jul 05 2012 - 17:18:45 EST


So for some unknown reason I'm hitting this on just one particular
machine, and it's *very* annoying.

It's annoying for three reasons:

- it's breaking the build (duh)

- the error is printed out to stdout, so you don't even *see* it as
an error if you redirect the normal messages somewhere else (like any
sane person, ie me, does)

- when the error happens, it doesn't show *what* went wrong, and in
fact it explicitly cleans up all the files that could show what
happened.

And no, "make KALLSYMS_EXTRA_PASS=1" does not fix anything.

Interestingly, making a trivial change to actually show the difference
made the problem go away. The failure was entirely reproducible with that
particular config and that particular kernel version with a *clean*
tree, but it looks like just changing the tree to be dirty (and thus
changing the version string) hides the problem. Which makes it even
harder to debug, because now I can't see what difference actually
causes things to fail.

VERY annoying.

This is not a new bug - according to Google this has been reported
before, back in October 2011. In that case the KALLSYMS_EXTRA_PASS
workaround worked. In my case it does not.

Anyway, after hacking the source to actually show the difference, and
to also *not* change the version string just because it's dirty, I see
this difference:

- System.map:

...
ffffffff8189b4d0 R kallsyms_addresses
ffffffff818ee910 R kallsyms_num_syms
ffffffff818ee918 R kallsyms_names
...
ffffffff819fa9a0 R __stop___modver
ffffffff819fb000 R __end_rodata
...

- .tmp_System.map:

...
ffffffff8189b4d0 R kallsyms_addresses
ffffffff818ee850 R kallsyms_num_syms
ffffffff818ee858 R kallsyms_names
...
ffffffff819fa720 R __stop___modver
ffffffff819fb000 R __end_rodata

(the diff itself is huge, because once the addresses change, they stay
different).
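Since the diff is dominated by that cascade of shifted addresses, the useful information is the *first* entry where the two maps disagree. A minimal sketch of that check, assuming the usual three-column "address type name" layout of System.map; the helper names (`parse_map`, `first_divergence`, `describe`) and the script name in the usage comment are hypothetical, not part of the kernel's build scripts:

```python
import sys


def parse_map(lines):
    """Parse System.map-style lines ("address type name") into (name, address) pairs."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and "..." elisions
        addr, _sym_type, name = fields[:3]
        entries.append((name, int(addr, 16)))
    return entries


def first_divergence(map_a, map_b):
    """Return (index, entry_a, entry_b) for the first mismatching entry, or None.

    An entry is None when one map is shorter than the other.
    """
    for i, (a, b) in enumerate(zip(map_a, map_b)):
        if a != b:
            return i, a, b
    if len(map_a) != len(map_b):
        i = min(len(map_a), len(map_b))
        a = map_a[i] if i < len(map_a) else None
        b = map_b[i] if i < len(map_b) else None
        return i, a, b
    return None


def describe(entry):
    """Format an entry back in System.map order: address, then name."""
    return "<missing>" if entry is None else f"{entry[1]:016x} {entry[0]}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 mapdiff.py System.map .tmp_System.map
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        result = first_divergence(parse_map(fa), parse_map(fb))
    if result is None:
        print("maps are identical")
    else:
        i, a, b = result
        print(f"first difference at entry {i}: {describe(a)} vs {describe(b)}")
```

Fed just the excerpts quoted above, it points straight at `kallsyms_num_syms` (ffffffff818ee910 vs ffffffff818ee850) instead of the thousands of lines that follow it.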

Notice how 'kallsyms_addresses' has the same value, but
'kallsyms_num_syms' (and subsequent symbols until the page-aligned
__end_rodata symbol that gets them back in sync) do not. I have no
idea *why* this happens, but it definitely does.

It seems the real difference is the size of the "kallsyms_addresses"
data structure. No idea why, though.
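The addresses quoted above are enough for a back-of-the-envelope check of how big that size difference is. A minimal sketch, assuming `kallsyms_addresses` is an array of 8-byte entries on x86-64 and that nothing but that array sits between it and `kallsyms_num_syms` (no alignment padding in between):

```python
# Addresses copied from the System.map and .tmp_System.map excerpts above.
ENTRY_SIZE = 8  # assumption: one 8-byte address per symbol on x86-64

final_map = {"kallsyms_addresses": 0xffffffff8189b4d0,
             "kallsyms_num_syms": 0xffffffff818ee910}
tmp_map = {"kallsyms_addresses": 0xffffffff8189b4d0,
           "kallsyms_num_syms": 0xffffffff818ee850}


def table_bytes(symbols):
    """Bytes from the start of kallsyms_addresses to the next symbol (assumes no padding)."""
    return symbols["kallsyms_num_syms"] - symbols["kallsyms_addresses"]


final_bytes = table_bytes(final_map)  # 0x53440 = 341056 bytes -> 42632 entries
tmp_bytes = table_bytes(tmp_map)      # 0x53380 = 340864 bytes -> 42608 entries
extra = final_bytes - tmp_bytes       # 0xc0 = 192 bytes

print(f"{extra} bytes = {extra // ENTRY_SIZE} entries")  # prints "192 bytes = 24 entries"
```

Under those assumptions the table in the final link is 192 bytes, or 24 entries, larger than the one from the temporary link, which would suggest the two links ended up recording a different number of symbols.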

This happens with current git (commit c4aed353b1b0), on an x86-64
machine running current F17 as of today, with the attached config.
Maybe that makes somebody else able to recreate this and figure out
what is so magical about the layout that the exact kernel version and
config (and likely compiler/binutils versions) matter.

Any ideas? Added a fairly random set of people who get mentioned in
the linker script commits etc.

Linus

Attachment: tove-config.gz
Description: GNU Zip compressed data