Re: [PATCH v3 net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.

From: Sebastian Andrzej Siewior
Date: Fri May 31 2024 - 06:38:30 EST


On 2024-05-30 00:09:21 [+0200], Toke Høiland-Jørgensen wrote:
> [...]
> > @@ -240,12 +240,14 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
> > int xdp_n, struct xdp_cpumap_stats *stats,
> > struct list_head *list)
> > {
> > + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
> > int nframes;
>
> I think we need to zero-initialise all the context objects we allocate
> on the stack.
>
> The reason being that an XDP program can return XDP_REDIRECT without
> calling any of the redirect helpers first; which will lead to
> xdp_do_redirect() being called without any of the fields in struct
> bpf_redirect_info having being set. This can lead to a crash if the
> values happen to be the wrong value; and if we're not initialising the
> stack space used by this struct, we have no guarantees about what value
> they will end up with.

Okay, I can do that.

> > void bpf_clear_redirect_map(struct bpf_map *map)
> > {
> > - struct bpf_redirect_info *ri;
> > - int cpu;
> > -
> > - for_each_possible_cpu(cpu) {
> > - ri = per_cpu_ptr(&bpf_redirect_info, cpu);
> > - /* Avoid polluting remote cacheline due to writes if
> > - * not needed. Once we pass this test, we need the
> > - * cmpxchg() to make sure it hasn't been changed in
> > - * the meantime by remote CPU.
> > - */
> > - if (unlikely(READ_ONCE(ri->map) == map))
> > - cmpxchg(&ri->map, map, NULL);
> > - }
> > + /* ri->map is assigned in __bpf_xdp_redirect_map() from within a eBPF
> > + * program/ during NAPI callback. It is used during
> > + * xdp_do_generic_redirect_map()/ __xdp_do_redirect_frame() from the
> > + * redirect callback afterwards. ri->map is cleared after usage.
> > + * The path has no explicit RCU read section but the local_bh_disable()
> > + * is also a RCU read section which makes the complete softirq callback
> > + * RCU protected. This in turn makes ri->map RCU protected and it is
> > + * sufficient to wait a grace period to ensure that no "ri->map == map"
> > + * exists. dev_map_free() removes the map from the list and then
> > + * invokes synchronize_rcu() after calling this function.
> > + */
> > }
>
> With the zeroing of the stack variable mentioned above, I agree that
> this is not needed anymore, but I think we should just get rid of the
> function entirely and put a comment in devmap.c instead of the call to
> the (now empty) function.

I wasn't entirely sure if my reasoning is valid. In that case…

> -Toke

Sebastian