Re: [PATCH] sched/tracing: append prev_state to tp args instead

From: Alexei Starovoitov
Date: Wed Apr 27 2022 - 16:33:25 EST

On Wed, Apr 27, 2022 at 11:17 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > See my reply to Peter. libbpf can't know user's intent to fail this
> > > automatically, in general. In some cases when it can it does
> > > accommodate this automatically. In other cases it provides instruments
> > > for user to handle this (bpf_core_field_size(),
> >
> > My naiive thinking is that the function signature has changed (there's 1 extra
> > arg not just a subtle swap of args of the same type) - so I thought that can be
> > detected. But maybe it is harder said than done.
> It is. We don't have number of arguments either:
> struct bpf_raw_tracepoint_args {
> __u64 args[0];
> };
> What BPF program is getting is just an array of u64s.

Well, that's a true and false statement at the same time
that might confuse folks reading this thread.
To clarify:
raw_tp and tp_btf programs receive an array of u64-s.
raw_tp checks the number of arguments only.
The prog that accesses non-existing arg will be rejected.
tp_btf prog in addition to the number of args knows
the exact type of the arguments.
So in this case accessing arg0 in bpf prog
as 'struct task_struct *' while it's 'unsigned int prev_state'
in the kernel will cause the verifier to reject it.
If prog does bpf_probe_read_kernel() then all bets are off, of course.
There is no type checking mechanism for bpf_probe_read_kernel.
Only BTF powered pointer dereferences are type checked.
The 'tp_btf' prog type is the recommended way to access tracepoints.
The 'raw_tp' was implemented before BTF was introduced.