Re: [REGRESSION] module BTF validation failure (Error -22) on next

From: Jiri Olsa
Date: Tue Dec 17 2024 - 03:03:06 EST


On Mon, Dec 16, 2024 at 03:19:01PM +0000, Alan Maguire wrote:
> On 14/12/2024 12:15, Alan Maguire wrote:
> > On 14/12/2024 04:41, Cong Wang wrote:
> >> On Thu, Dec 05, 2024 at 08:36:33AM +0100, Uros Bizjak wrote:
> >>> On Wed, Dec 4, 2024 at 4:52 PM Laura Nao <laura.nao@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 11/15/24 18:17, Laura Nao wrote:
> >>>>> I managed to reproduce the issue locally and I've uploaded the vmlinux[1]
> >>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the
> >>>>> modules[3] and its btf data[4] extracted with:
> >>>>>
> >>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw
> >>>>>
> >>>>> Looking again at the logs[5], I've noticed the following is reported:
> >>>>>
> >>>>> [ 0.415885] BPF: type_id=115803 offset=177920 size=1152
> >>>>> [ 0.416029] BPF:
> >>>>> [ 0.416083] BPF: Invalid offset
> >>>>> [ 0.416165] BPF:
> >>>>>
> >>>>> There are two different definitions of rcu_data in '.data..percpu', one
> >>>>> is a struct and the other is an integer:
> >>>>>
> >>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
> >>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
> >>>>>
> >>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static
> >>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static
> >>>>>
> >>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69
> >>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> >>>>>
> >>>>> I assume that's not expected, correct?
> >>>>>
> >>>>> I'll dig a bit deeper and report back if I can find anything else.
> >>>>
> >>>> I ran a bisection, and it appears the culprit commit is:
> >>>> https://lore.kernel.org/all/20241021080856.48746-2-ubizjak@xxxxxxxxx/
> >>>>
> >>>> Hi Uros, do you have any suggestions or insights on resolving this issue?
> >>>
> >>> There is a stray ";" at the end of the #define, perhaps this makes a difference:
> >>>
> >>> +#define PERCPU_PTR(__p) \
> >>> + (typeof(*(__p)) __force __kernel *)(__p);
> >>> +
> >>>
> >>> and SHIFT_PERCPU_PTR macro now expands to:
> >>>
> >>> RELOC_HIDE((typeof(*(p)) __force __kernel *)(p);, (offset))
> >>>
> >>> A follow-up patch in the series changes PERCPU_PTR macro to:
> >>>
> >>> #define PERCPU_PTR(__p) \
> >>> ({ \
> >>> unsigned long __pcpu_ptr = (__force unsigned long)(__p); \
> >>> (typeof(*(__p)) __force __kernel *)(__pcpu_ptr); \
> >>> })
> >>>
> >>> so this should again correctly cast the value.
> >>
> >> Hm, I saw a similar bug but with pahole 1.28. My kernel complains about
> >> BTF invalid offset:
> >>
> >> [ 7.785788] BPF: type_id=2394 offset=0 size=1
> >> [ 7.786411] BPF:
> >> [ 7.786703] BPF: Invalid offset
> >> [ 7.787119] BPF:
> >>
> >> Dumping the vmlinux (there is no module invovled), I saw it is related to
> >> percpu pointer too:
> >>
> >> [2394] VAR '__pcpu_unique_cpu_hw_events' type_id=2, linkage=global
> >> ...
> >> [163643] DATASEC '.data..percpu' size=2123280 vlen=808
> >> type_id=2393 offset=0 size=1 (VAR '__pcpu_scope_cpu_hw_events')
> >> type_id=2394 offset=0 size=1 (VAR '__pcpu_unique_cpu_hw_events')
> >> ...
> >>
> >> I compiled and installed the latest pahole from its git repo:
> >>
> >> $ pahole --version
> >> v1.28
> >>
> >> Thanks.
> >
> > Thanks for the report! Looking at percpu-defs.h it looks like the
> > existence of such variables requires either
> >
> > #if defined(ARCH_NEEDS_WEAK_PER_CPU) ||
> > defined(CONFIG_DEBUG_FORCE_WEAK_PER_CPU)
> >
> > ...
> >
> > #define DEFINE_PER_CPU_SECTION(type, name, sec) \
> > __PCPU_DUMMY_ATTRS char __pcpu_scope_##name; \
> > extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
> > __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \
> > extern __PCPU_ATTRS(sec) __typeof__(type) name; \
> > __PCPU_ATTRS(sec) __weak __typeof__(type) name
> >
> >
> > I'm guessing your .config has CONFIG_DEBUG_FORCE_WEAK_PER_CPU, or are
> > you building on s390/alpha?
> >
> > I've reproduced this on bpf-next with CONFIG_DEBUG_FORCE_WEAK_PER_CPU=y,
> > pahole v1.28 and gcc-12; I see ~900 __pcpu_ variables and get the same
> > BTF errors since multipe __pcpu_ vars share the offset 0.
> >
> > A simple workaround in dwarves - and I verified this resolved the issue
> > for me - would be
> >
> > diff --git a/btf_encoder.c b/btf_encoder.c
> > index 3754884..4a1799a 100644
> > --- a/btf_encoder.c
> > +++ b/btf_encoder.c
> > @@ -2174,7 +2174,8 @@ static bool filter_variable_name(const char *name)
> > X("__UNIQUE_ID"),
> > X("__tpstrtab_"),
> > X("__exitcall_"),
> > - X("__func_stack_frame_non_standard_")
> > + X("__func_stack_frame_non_standard_"),
> > + X("__pcpu_")
> > #undef X
> > };
> > int i;
> >
> > ...but I'd like us to understand further why variables which were
> > supposed to be in a .discard section end up being encoded as there may
> > be other problems lurking here aside from this one. More soon hopefully...
> >
>
>
> A bit more context here - variable encoding takes the address of the
> variable from DWARF to locate the associated ELF section. Because we
> insist on having a variable specification - with a location - this
> usually works fine. However the problem is that because these dummy
> __pcpu_ variables specify a .discard section, their addresses are 0, so
> we get for example:
>
> <1><1e535>: Abbrev Number: 114 (DW_TAG_variable)
> <1e536> DW_AT_name : (indirect string, offset: 0x5e97):
> __pcpu_unique_kstack_offset
> <1e53a> DW_AT_decl_file : 1
> <1e53b> DW_AT_decl_line : 823
> <1e53d> DW_AT_decl_column : 1
> <1e53e> DW_AT_type : <0x57>
> <1e542> DW_AT_external : 1
> <1e542> DW_AT_declaration : 1
> <1><1e542>: Abbrev Number: 156 (DW_TAG_variable)
> <1e544> DW_AT_specification: <0x1e535>
> <1e548> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
>
>
> You can see the same thing for a simple program like this:
>
> #include <stdio.h>
>
> #define SEC(name) __attribute__((section(name)))
>
> SEC("/DISCARD/") int d1;
> extern int d1;
> SEC("/DISCARD/") int d2;
> extern int d2;
>
> int main(int argc, char *argv[])
> {
> return 0;
> }
>
>
> If you compile it with -g, the DWARF shows that d1 and d2 both have
> address 0:
>
> <1><72>: Abbrev Number: 5 (DW_TAG_variable)
> <73> DW_AT_name : d1
> <76> DW_AT_decl_file : 1
> <77> DW_AT_decl_line : 5
> <78> DW_AT_decl_column : 22
> <79> DW_AT_type : <0x57>
> <7d> DW_AT_external : 1
> <7d> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
> <1><87>: Abbrev Number: 5 (DW_TAG_variable)
> <88> DW_AT_name : d2
> <8b> DW_AT_decl_file : 1
> <8c> DW_AT_decl_line : 7
> <8d> DW_AT_decl_column : 22
> <8e> DW_AT_type : <0x57>
> <92> DW_AT_external : 1
> <92> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0
> (DW_OP_addr: 0)
>
>
> So the reason this happens for dwarves v1.28 in particular is - as I
> understand it - we moved away from recording ELF section information for
> each variable and matching that with DWARF info, instead relying on the
> address to locate the associated ELF section. In cases like the above
> the address information unfortunately leads us astray.
>
> Seems like there's a few approaches we can take in fixing this:
>
> 1. designate "__pcpu_" prefix as a variable prefix to filter out. This
> resolves the immediate problem but is too narrowly focused IMO and we
> may end up playing whack-a-mole with other dummy variable prefixes.
> 2. resurrect ELF section variable information fully; i.e. record a list
> of variables per ELF section (or at least per ELF section we care
> about). If variable is not on the list for the ELF section, do not
> encode it.
> 3. midway between the two; for the 0 address case specifically, verify
> that the variable name really _is_ in the associated ELF section. No
> need to create a local ELF table variable representation, we could just
> walk the table in the case of the 0 addresses.
>
> Diff for approach 3 is as follows
>
> diff --git a/btf_encoder.c b/btf_encoder.c
> index 3754884..21a0ab6 100644
> --- a/btf_encoder.c
> +++ b/btf_encoder.c
> @@ -2189,6 +2189,26 @@ static bool filter_variable_name(const char *name)
> return false;
> }
>
> +bool variable_in_sec(struct btf_encoder *encoder, const char *name,
> size_t shndx)
> +{
> + uint32_t sym_sec_idx;
> + uint32_t core_id;
> + GElf_Sym sym;
> +
> + elf_symtab__for_each_symbol_index(encoder->symtab, core_id, sym,
> sym_sec_idx) {
> + const char *sym_name;
> +
> + if (sym_sec_idx != shndx || elf_sym__type(&sym) !=
> STT_OBJECT)
> + continue;
> + sym_name = elf_sym__name(&sym, encoder->symtab);
> + if (!sym_name)
> + continue;
> + if (strcmp(name, sym_name) == 0)
> + return true;
> + }
> + return false;
> +}
> +
> static int btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
> {
> struct cu *cu = encoder->cu;
> @@ -2258,6 +2278,11 @@ static int
> btf_encoder__encode_cu_variables(struct btf_encoder *encoder)
> if (filter_variable_name(name))
> continue;
>
> + /* A 0 address may be in a .discard section; ensure the
> + * variable really is in this section by checking ELF
> symtab.
> + */
> + if (addr == 0 && !variable_in_sec(encoder, name, shndx))
> + continue;
> /* Check for invalid BTF names */
> if (!btf_name_valid(name)) {
> dump_invalid_symbol("Found invalid variable name
> when encoding btf",
>
>
> ...so slightly more complex than option 1, but a bit more general in its
> applicability to .discard section variables.
>
> For the pahole folks, what do we think? Which option (or indeed other
> ones I haven't thought of) makes sense for a fix for this? Thanks!

I can reproduce this with the CONFIG_DEBUG_FORCE_WEAK_PER_CPU enable,
the fix looks fine, could you send formal patch?

thanks,
jirka