Re: [PATCH net] net: dev: Detect dev_hold() after netdev_wait_allrefs()

From: Jann Horn
Date: Fri Jan 28 2022 - 12:58:37 EST


On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > I've run into a bug where dev_hold() was being called after
> > > > netdev_wait_allrefs(). But at that point, the device is already going
> > > > away, and dev_hold() can't stop that anymore.
> > > >
> > > > To make such problems easier to diagnose in the future:
> > > >
> > > > - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether
> > > > the net refcount has been elevated. If this is detected, WARN() and
> > > > leak the object (to prevent worse consequences from a
> > > > use-after-free).
> > > > - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero.
> > > > This signals to the generic refcount infrastructure that any attempt
> > > > to increment the refcount later is a bug.
[...]
> > if (dev->reg_state == NETREG_UNREGISTERING) {
> > ASSERT_RTNL();
> > dev->needs_free_netdev = true;
> > return;
> > }
> >
> > /* Recheck in case someone called dev_hold() between
> > * netdev_wait_allrefs() and here.
> > */
> > if (WARN_ON(netdev_refcnt_read(dev) != 0))
> > return; /* leak memory, otherwise we might get UAF */
> >
> > netif_free_tx_queues(dev);
> > netif_free_rx_queues(dev);
>
> Maybe another solution would be to leverage the recent dev_hold_track().
>
> We could add a dead boolean to 'struct ref_tracker_dir ' (dev->refcnt_tracker)

Hmm... actually, what even are the semantics of dev_hold()?

Normal refcounts have the property that if you hold one reference,
you're always allowed to add another reference. But from what I can
tell, something like this:

struct net_device *dev = dev_get_by_name(net, name);
dev_hold(dev);
dev_put(dev);
dev_put(dev);

would be buggy using the current CONFIG_PCPU_DEV_REFCNT implementation.
Basically, if dev_hold() runs at the same time as
netdev_refcnt_read(), it's a bug because netdev_refcnt_read() is
non-atomic, and we could get the following race:

task B: starts netdev_refcnt_read()
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A, on CPU 0: dev_hold(dev) increments *per_cpu_ptr(dev->pcpu_refcnt, 0)
task A: migrates from CPU 0 to CPU 1
task A, on CPU 1: dev_put(dev) decrements *per_cpu_ptr(dev->pcpu_refcnt, 1)
task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 1)

which would make task B miss one outstanding reference.

(This is why the generic percpu refcounting code in
lib/percpu-refcount.c has logic for switching the refcount to atomic
mode with an RCU grace period.)


If these are the intended semantics for dev_hold(), then I guess your
approach of adding a new boolean flag somewhere is the right one - but
we should be setting that flag *before* waiting for the refcount to
drop to 1. Though maybe it shouldn't be in ref-tracker, since this is
a peculiarity of the hand-rolled netdev refcount...

Are these the intended semantics (and I should rewrite the patch to
also catch dev_hold() racing with netdev_wait_allrefs()), or is this
unintended (and the netdev refcount should be replaced)?

This should probably be documented...