Re: [BUG] KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work

From: Eric Dumazet

Date: Wed Jun 24 2026 - 09:59:50 EST


On Wed, Jun 24, 2026 at 6:42 AM Sam Sun <samsun1006219@xxxxxxxxx> wrote:
>
> On Wed, Jun 24, 2026 at 6:01 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Wed, Jun 24, 2026 at 2:01 AM Yue Sun <samsun1006219@xxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > I hit a reproducible use-after-free in the UDP tunnel NIC offload work item.
> > > The original local crash was reported by KFENCE as:
> > >
> > > KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work
> > >
> > > On current mainline, the C reproducer below triggers the same lifetime bug,
> > > reported by KASAN before KFENCE samples the object:
> > >
> > > BUG: KASAN: slab-use-after-free in __mutex_lock
> > > Workqueue: udp_tunnel_nic udp_tunnel_nic_device_sync_work
> > >
> > > Tested kernel:
> > >
> > > 840ef6c78e6a ("Merge tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs")
> > > Linux 7.1.0-11240-g840ef6c78e6a #31 SMP PREEMPT_DYNAMIC
> > >
> >
> >
> > Thanks for the report.
> >
> > Can you test the following patch?
> >
> > diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c
> > index 9944ed923ddfd10f9adf6ad788c0740daeaf2adb..c5f8d2f9d325de8f4d2247ddaa52e33378851857
> > 100644
> > --- a/net/ipv4/udp_tunnel_nic.c
> > +++ b/net/ipv4/udp_tunnel_nic.c
> > @@ -304,8 +304,8 @@ udp_tunnel_nic_device_sync(struct net_device *dev,
> > struct udp_tunnel_nic *utn)
> > if (!utn->need_sync)
> > return;
> >
> > - queue_work(udp_tunnel_nic_workqueue, &utn->work);
> > utn->work_pending = 1;
> > + queue_work(udp_tunnel_nic_workqueue, &utn->work);
> > }
> >
> > static bool
> > @@ -866,6 +866,11 @@ udp_tunnel_nic_unregister(struct net_device *dev,
> > struct udp_tunnel_nic *utn)
> >
> > udp_tunnel_nic_lock(dev);
> >
> > + if (utn->work_pending) {
> > + udp_tunnel_nic_unlock(dev);
> > + return;
> > + }
> > +
> > /* For a shared table remove this dev from the list of sharing devices
> > * and if there are other devices just detach.
> > */
> > @@ -901,12 +906,6 @@ udp_tunnel_nic_unregister(struct net_device *dev,
> > struct udp_tunnel_nic *utn)
> > udp_tunnel_nic_flush(dev, utn);
> > udp_tunnel_nic_unlock(dev);
> >
> > - /* Wait for the work to be done using the state, netdev core will
> > - * retry unregister until we give up our reference on this device.
> > - */
> > - if (utn->work_pending)
> > - return;
> > -
> > udp_tunnel_nic_free(utn);
> > release_dev:
> > dev->udp_tunnel_nic = NULL;
>
> I tested the patch, but unfortunately the C reproducer still triggers the
> same use-after-free for me.
>
> Tested on top of:
>
> 840ef6c78e6a ("Merge tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs")
>
> I booted the kernel with KASAN/KFENCE enabled and:
>
> panic_on_warn=1 panic_on_oops=1 kfence.sample_interval=1
>
> Then I ran the same C reproducer:
>
> timeout -k 10 360 /root/repro
>
> The VM panicked after about 236 seconds:
>
> [ 236.471119][ T58] BUG: KASAN: slab-use-after-free in __mutex_lock+0x16d0/0x1d80
> [ 236.473404][ T58] Read of size 8 at addr ff11000076a63ea8 by task kworker/u16:3/58
> [ 236.476455][ T58] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> [ 236.476478][ T58] Workqueue: udp_tunnel_nic udp_tunnel_nic_device_sync_work
> [ 236.476787][ T58] __mutex_lock+0x16d0/0x1d80
> [ 236.477020][ T58] udp_tunnel_nic_device_sync_work+0x32/0x9c0
> [ 236.477068][ T58] process_one_work+0x9de/0x1bf0
>
> The allocation/free stacks are still the same shape:
> ```
> Allocated by task 11563:
> __kmalloc_noprof
> udp_tunnel_nic_netdevice_event+0x12d8/0x1e80
> register_netdevice
> nsim_create
> nsim_dev_reload_up
> devlink_reload
>
> Freed by task 11609:
> kfree
> udp_tunnel_nic_netdevice_event+0xc26/0x1e80
> unregister_netdevice_many_notify
> nsim_destroy
> nsim_dev_reload_down
> devlink_reload
>
> Last potentially related work creation:
> queue_work_on
> __udp_tunnel_nic_del_port+0x2af/0x320
> udp_tunnel_notify_del_rx_port
> __geneve_sock_release.part.0
> geneve_stop
>
> Second to last potentially related work creation:
> queue_work_on
> __udp_tunnel_nic_add_port+0x6ec/0xd70
> udp_tunnel_notify_add_rx_port
> geneve_open
> ```
>
> My read of the patch is that it closes the small window where queue_work()
> can publish the work before utn->work_pending is set, and it also prevents
> udp_tunnel_nic_unregister() from flushing/freeing the object when
> work_pending is already set.
>
> However, the test above suggests that work_pending still does not fully
> protect the lifetime of struct udp_tunnel_nic. The crashing work was still
> queued through udp_tunnel_nic_device_sync() at line 308, so the patched path
> was exercised. One suspicious point is that udp_tunnel_nic_device_sync_work()
> clears utn->work_pending at the beginning of the worker, while the same work
> item can still interact with replay/add/del-port state. The reproducer can
> still end up with udp_tunnel_nic_unregister() freeing utn while a
> udp_tunnel_nic_device_sync_work item later runs and dereferences the freed
> utn->lock.
>
> So this patch does not seem to be sufficient for this reproducer.
>

Oh well.

u8 need_sync:1;
u8 need_replay:1;
u8 work_pending:1;

These bitfields are not safe, obviously :/

Time to convert them to atomic bit operations.
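
For readers following along: the three bitfields share one byte, so a store
to any one of them compiles to a read-modify-write of the whole byte. If two
contexts update different flags without holding a common lock, one update can
silently overwrite the other, for example a set of work_pending being lost.
Below is a rough userspace sketch of the atomic alternative using C11
<stdatomic.h>. It is not the kernel patch: struct utn_flags, enum utn_flag
and the utn_*_flag() helpers are hypothetical names made up for illustration.
In the kernel the equivalents would presumably be set_bit(), clear_bit() and
test_and_clear_bit() on an unsigned long.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers mirroring the three bitfields quoted above. */
enum utn_flag {
	UTN_NEED_SYNC,
	UTN_NEED_REPLAY,
	UTN_WORK_PENDING,
};

/* Hypothetical stand-in for the flag storage in struct udp_tunnel_nic. */
struct utn_flags {
	atomic_uint bits;
};

/* Atomically set one flag without disturbing its neighbours. */
static inline void utn_set_flag(struct utn_flags *f, enum utn_flag nr)
{
	atomic_fetch_or(&f->bits, 1u << nr);
}

/* Atomically clear one flag without disturbing its neighbours. */
static inline void utn_clear_flag(struct utn_flags *f, enum utn_flag nr)
{
	atomic_fetch_and(&f->bits, ~(1u << nr));
}

/* Read the current value of one flag. */
static inline bool utn_test_flag(struct utn_flags *f, enum utn_flag nr)
{
	return (atomic_load(&f->bits) & (1u << nr)) != 0;
}

/* Clear one flag and report whether it was set beforehand. */
static inline bool utn_test_and_clear_flag(struct utn_flags *f,
					   enum utn_flag nr)
{
	unsigned int mask = 1u << nr;

	return (atomic_fetch_and(&f->bits, ~mask) & mask) != 0;
}
```

With helpers like these, a concurrent utn_clear_flag(f, UTN_NEED_SYNC) can no
longer undo a utn_set_flag(f, UTN_WORK_PENDING) issued on another CPU. This
only addresses lost flag updates; whether it is enough to fix the lifetime
issue reported above is a separate question.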