Re: [REGRESSION] module BTF validation failure (Error -22) on

From: Jiri Olsa
Date: Thu Dec 12 2024 - 04:25:02 EST


On Wed, Dec 11, 2024 at 10:10:24PM +0100, Jiri Olsa wrote:
> On Tue, Dec 10, 2024 at 02:55:01PM +0100, Laura Nao wrote:
> > Hi Jiri,
> >
> > Thanks for the feedback!
> >
> > On 12/6/24 13:35, Jiri Olsa wrote:
> > > On Fri, Nov 15, 2024 at 06:17:12PM +0100, Laura Nao wrote:
> > >> On 11/13/24 10:37, Laura Nao wrote:
> > >>>
> > >>> Currently, KernelCI only retains the bzImage, not the vmlinux binary. The
> > >>> bzImage can be downloaded from the same link mentioned above by selecting
> > >>> 'kernel' from the dropdown menu (modules can also be downloaded the same
> > >>> way). I’ll try to replicate the build on my end and share the vmlinux
> > >>> with DWARF data stripped for convenience.
> > >>>
> > >>
> > >> I managed to reproduce the issue locally and I've uploaded the vmlinux[1]
> > >> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one of the
> > >> modules[3] and its btf data[4] extracted with:
> > >>
> > >> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko > cros_kbd_led_backlight.ko.raw
> > >>
> > >> Looking again at the logs[5], I've noticed the following is reported:
> > >>
> > >> [ 0.415885] BPF: type_id=115803 offset=177920 size=1152
> > >> [ 0.416029] BPF:
> > >> [ 0.416083] BPF: Invalid offset
> > >> [ 0.416165] BPF:
> > >>
> > >> There are two different definitions of rcu_data in '.data..percpu', one
> > >> is a struct and the other is an integer:
> > >>
> > >> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
> > >> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
> > >>
> > >> [115801] VAR 'rcu_data' type_id=115572, linkage=static
> > >> [115803] VAR 'rcu_data' type_id=1, linkage=static
> > >>
> > >> [115572] STRUCT 'rcu_data' size=1152 vlen=69
> > >> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
> > >>
> > >> I assume that's not expected, correct?
> > >
> > > yes, that seems wrong.. but I can't reproduce with your config
> > > together with pahole 1.24 .. could you try with latest one?
> >
> > I just tested next-20241210 with the latest pahole version (1.28 from
> > the master branch[1]), and the issue does not occur with this version
> > (I can see only one instance of rcu_data in the BTF data, as expected).
> >
> > I can confirm that the same kernel revision still exhibits the issue
> > with pahole 1.24.
> >
> > If helpful, I can also test versions between 1.24 and 1.28 to identify
> > which ones work.
>
> I managed to reproduce finally with gcc-12, but had to use pahole 1.25,
> 1.24 failed with unknown attribute
>
> [95096] VAR 'rcu_data' type_id=94868, linkage=static
> [95098] VAR 'rcu_data' type_id=4, linkage=static
> type_id=95096 offset=177088 size=1152 (VAR 'rcu_data')
> type_id=95098 offset=177088 size=1152 (VAR 'rcu_data')

so for me the difference seems to be using gcc-12 and this commit in linux tree:
dabddd687c9e percpu: cast percpu pointer in PERCPU_PTR() via unsigned long

which adds an extra __pcpu_ptr variable to the DWARF data; it has the same
address as the per-cpu variable, and that confuses pahole

pahole ends up adding the per-cpu variable twice.. once with the real type
(type_id=94868) and once with the unsigned long type (type_id=4)
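
To spot this kind of duplicate in a dump, a small script along these lines
could help. It's a rough sketch, not part of pahole or bpftool: it assumes
the raw text layout printed by 'bpftool btf dump' (as in the excerpts quoted
above), and find_overlapping_vars() and SAMPLE are names made up for
illustration. In SAMPLE only the two rcu_data entries come from Laura's
report; the DATASEC header line and the 'some_var' entry are invented.

```python
import re
from collections import defaultdict

# Header of a DATASEC record, e.g. "[123] DATASEC '.data..percpu' size=... vlen=..."
DATASEC_RE = re.compile(r"^\[(\d+)\] DATASEC '([^']*)'")
# Member line of a DATASEC, e.g. "type_id=1 offset=0 size=8 (VAR 'x')"
ENTRY_RE = re.compile(
    r"^\s*type_id=(\d+) offset=(\d+) size=(\d+) \(VAR '([^']*)'\)"
)


def find_overlapping_vars(dump_text):
    """Return {(section, offset): [(type_id, name, size), ...]} for every
    DATASEC offset that is claimed by more than one VAR."""
    by_slot = defaultdict(list)
    section = None
    for line in dump_text.splitlines():
        header = DATASEC_RE.match(line)
        if header:
            section = header.group(2)
            continue
        if line.startswith("["):
            # Any other "[id] KIND ..." record ends the current DATASEC.
            section = None
            continue
        entry = ENTRY_RE.match(line)
        if entry and section is not None:
            type_id, offset, size, name = entry.groups()
            by_slot[(section, int(offset))].append(
                (int(type_id), name, int(size))
            )
    return {slot: vs for slot, vs in by_slot.items() if len(vs) > 1}


# Hypothetical input: the DATASEC header and 'some_var' are invented,
# the rcu_data lines are the ones from the report.
SAMPLE = "\n".join([
    "[115801] VAR 'rcu_data' type_id=115572, linkage=static",
    "[115803] VAR 'rcu_data' type_id=1, linkage=static",
    "[115900] DATASEC '.data..percpu' size=200000 vlen=3",
    "\ttype_id=115700 offset=0 size=8 (VAR 'some_var')",
    "\ttype_id=115801 offset=177920 size=1152 (VAR 'rcu_data')",
    "\ttype_id=115803 offset=177920 size=1152 (VAR 'rcu_data')",
])

if __name__ == "__main__":
    for (sec, off), entries in find_overlapping_vars(SAMPLE).items():
        print(f"{sec} offset={off}: {entries}")
```

Feeding it the text of a real dump instead of SAMPLE should list every
offset that more than one VAR claims, which is the pattern behind the
"Invalid offset" error above.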

however this got fixed in pahole 1.28 commit:
47dcb534e253 btf_encoder: Stop indexing symbols for VARs

which filters out the __pcpu_ptr variable completely. Adding Stephen to the loop.

with gcc-14 the __pcpu_ptr variable has VSCOPE_OPTIMIZED scope, so it won't
get into BTF even without the above pahole fix

I suggest gcc/pahole upgrade ;-)
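
For anyone who wants to check a build host for this, a quick version test
could look roughly like the sketch below. It assumes 'pahole --version'
prints a string of the form "v1.28"; the helper names are made up, and the
(1, 28) threshold is just the release mentioned above as carrying the fix.

```python
import re


def parse_pahole_version(text):
    """Extract (major, minor) from a string like 'v1.28'."""
    m = re.search(r"v(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognized pahole version string: {text!r}")
    return int(m.group(1)), int(m.group(2))


def has_var_fix(version_text, fixed_in=(1, 28)):
    """True if the version is at least the one carrying the VAR fix.

    fixed_in defaults to 1.28, the release named in this thread.
    """
    return parse_pahole_version(version_text) >= fixed_in


if __name__ == "__main__":
    import subprocess

    # Assumes pahole is installed and on PATH.
    out = subprocess.run(
        ["pahole", "--version"], capture_output=True, text=True, check=True
    ).stdout
    print("pahole", out.strip(), "has the fix:", has_var_fix(out))
```

This only covers the pahole side; as noted above, building with gcc-14 is
the other way to avoid the duplicate.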

thanks,
jirka