Re: [PATCH] xsk: switch xdp_build_skb_from_zc() to napi_alloc_skb()

From: Maciej Fijalkowski

Date: Mon May 18 2026 - 09:29:28 EST

On Mon, May 18, 2026 at 02:57:55PM +0200, Lorenz Brun wrote:
> On Wed, 13 May 2026 at 17:21, Alexander Lobakin
> <aleksander.lobakin@xxxxxxxxx> wrote:
> >
> > From: Lorenz Brun <lorenz@xxxxxxxxxxxx>
> > Date: Tue, 12 May 2026 17:26:56 +0200
> >
> > > xdp_build_skb_from_zc() allocated xdp->frame_sz bytes from the per-cpu
> > > system_page_pool and built the skb head with napi_build_skb(). The
> > > latter places skb_shared_info at the tail of the buffer, but the
> > > helper sized the allocation as if the whole frame_sz were usable for
> > > data. Whenever the packet plus reserved headroom approached frame_sz,
> > > the head memcpy overran shinfo with packet content, corrupting
> > > ->flags (SKBFL_ZEROCOPY_ENABLE) and ->nr_frags, which then drove
> > > skb_copy_ubufs() off the end of frags[] on the RX path:
> > >
> > > UBSAN: array-index-out-of-bounds in include/linux/skbuff.h:2541
> > > index 113 is out of range for type 'skb_frag_t [17]'
> > > skb_copy_ubufs+0x7da/0x960
> > > ip_local_deliver_finish+0xcd/0x110
> > > ice_napi_poll+0xe4/0x2a0 [ice]
> > >
> > > The overrun bytes come from the packet, so an on-wire sender can
> > > corrupt kernel memory remotely whenever the XDP program returns
> > > XDP_PASS.
> > >
> > > Rather than patch the sizing math, switch to the pattern used by other
> > > in-tree AF_XDP zero-copy drivers like mlx5 and i40e which use
> > > napi_alloc_skb() sized to the actual packet plus skb_put_data().
> > > This sizes the head exactly for the data being copied, drops the
> > > system_page_pool local_lock from this path, and removes the
> > > structural mismatch between frame_sz and the skb head buffer. Frags
> > > are allocated with alloc_page() per frag, matching the other drivers.
> >
> > I used napi_build_skb() + system page_pool to enable PP recycling
> > improving XSk XDP_PASS performance a lot.
> > Are you sure there's no other way to approach this?
> >
> > napi_alloc_skb() used in other drivers works, but it's a sorta old
> > approach which is way slower.
> >
> > System page_pools always allocate a full page, why can it create an skb
> > prone to overruns?
> >
> > >
> > > Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > > Cc: stable@xxxxxxxxxxxxxxx
> > > Signed-off-by: Lorenz Brun <lorenz@xxxxxxxxxxxx>
> > Thanks,
> > Olek
>
> Hi Olek
>
> I looked at the code again. While your approach is indeed faster, it
> is only faster for traffic bypassing AF_XDP, which is generally not
> that relevant for performance.
>
> More critically, it currently corrupts kernel memory and panics the
> kernel very quickly when running with frame-size set to 2048, 1500
> MTU, and passing received packets. To be honest, I'm not familiar
> enough with the XSK subsystem to know exactly what specific sizing
> assumption was violated here. By comparison, the approach taken by the
> other drivers is a lot more obviously correct and works perfectly.
>
> If you want to preserve the current approach, I'm perfectly happy with
> that. However, I don't feel comfortable sending patches for it, as I
> don't understand exactly what the expectations of the various data
> blocks are.
>
> AFAIK, reproduction should be fairly easy. You just need to run a TCP
> connection to the receiving node (which gets passed to the kernel)
> while receiving some UDP packets via AF_XDP at the same time. As
> mentioned, it also needs frame-size 2048 to reproduce quickly.
>
> I checked if I could get you an easy reproducer, but xdp-tools is
> quite limited. If you want to keep your approach and can't reproduce
> the panic yourself, let me know and I can see if I can synthesize a
> minimal reproducer.

We now respect the tailroom in UMEM, which is supposed to address the
shinfo overwrite cases. Could you re-test this on your side with the
patchset cited below applied to your tree?

https://lore.kernel.org/bpf/20260402154958.562179-1-maciej.fijalkowski@xxxxxxxxx/
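
For reference, here is a rough userspace sketch of the sizing arithmetic
behind the reported overrun. It is not kernel code. It only models a head
buffer of frame_sz bytes whose tail is taken by skb_shared_info, as
described in the commit message. CACHE_LINE, SHINFO_SIZE and both helper
names are assumptions made for illustration; the real shinfo size depends
on the kernel config and architecture, and the 256-byte headroom in the
note below assumes XDP_PACKET_HEADROOM with no extra user headroom.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the real ones depend on the
 * kernel configuration and architecture. */
#define CACHE_LINE	64u	/* stand-in for SMP_CACHE_BYTES */
#define SHINFO_SIZE	320u	/* assumed sizeof(struct skb_shared_info) */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) / (a) * (a))

/* Bytes left for headroom + data when shinfo sits at the buffer tail. */
size_t usable_head_bytes(size_t frame_sz)
{
	size_t shinfo = ALIGN_UP((size_t)SHINFO_SIZE, (size_t)CACHE_LINE);

	assert(frame_sz >= shinfo);
	return frame_sz - shinfo;
}

/* Bytes by which headroom + len would run into shinfo (0 if it fits). */
size_t shinfo_overrun(size_t frame_sz, size_t headroom, size_t len)
{
	size_t usable = usable_head_bytes(frame_sz);
	size_t need = headroom + len;

	return need > usable ? need - usable : 0;
}
```

Under these assumed numbers, shinfo_overrun(2048, 256, 1514) gives 42 for a
full-sized frame at 1500 MTU, while shinfo_overrun(4096, 256, 1514) gives 0,
which would be consistent with frame-size 2048 being needed to hit the
problem quickly.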

>
> Regards,
> Lorenz