Re: [PATCH v2] netpoll: Fix extra refcount release in netpoll_cleanup()

From: Neil Horman
Date: Mon Mar 28 2016 - 09:18:43 EST


On Fri, Mar 25, 2016 at 03:16:36PM -0400, David Miller wrote:
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Date: Fri, 25 Mar 2016 11:46:39 -0500
>
> > You're right, there is an issue here. I reproduced a problem with a
> > bond device. bond_netpoll_setup() calls __netpoll_setup() directly
> > (not netpoll_setup()). I'll debug it more; just wanted to let you
> > know there *is* a problem with this patch.
>
> I bet that's why the assignment to np->dev and the reference counting
> were separated in the first place :-/
>
> Indeed, commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb:
>
> commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb
> Author: Jiri Pirko <jiri@xxxxxxxxxxx>
> Date: Tue Jul 17 05:22:35 2012 +0000
>
> netpoll: move np->dev and np->dev_name init into __netpoll_setup()
>
> Signed-off-by: Jiri Pirko <jiri@xxxxxxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

We probably just want to balance the setting and clearing of np->dev in
__netpoll_setup, so that any error return (which results in netpoll_setup
dropping the refcount) also resets np->dev to NULL in __netpoll_setup. That
still leaves us having to watch for future imbalances, as you mentioned
previously, Dave, but a potential problem tomorrow seems preferable to an
actual problem today.
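
To make the balancing concrete, here is a rough userspace model of the idea. It
is not kernel code and not a proposed patch: every mock_ name is made up for
illustration, the refcount is a plain int, and the "fail" parameter and the
-ENOMEM value are assumptions that stand in for whatever error the inner setup
might hit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct mock_net_device { int refcnt; };
struct mock_netpoll { struct mock_net_device *dev; };

static void mock_dev_hold(struct mock_net_device *d) { d->refcnt++; }
static void mock_dev_put(struct mock_net_device *d)
{
	assert(d->refcnt > 0);	/* an extra put would trip here */
	d->refcnt--;
}

/* Inner setup: sets np->dev, and clears it again on any error return. */
static int mock_inner_setup(struct mock_netpoll *np,
			    struct mock_net_device *ndev, int fail)
{
	int err;

	np->dev = ndev;
	if (fail) {
		err = -ENOMEM;	/* hypothetical failure */
		goto out;
	}
	return 0;
out:
	np->dev = NULL;		/* balance the assignment above */
	return err;
}

/* Outer setup: owns the reference, drops it when the inner setup fails. */
static int mock_outer_setup(struct mock_netpoll *np,
			    struct mock_net_device *ndev, int fail)
{
	int err;

	mock_dev_hold(ndev);
	err = mock_inner_setup(np, ndev, fail);
	if (err)
		mock_dev_put(ndev);
	return err;
}

/* Cleanup: only drops a reference if np->dev is still set. */
static void mock_cleanup(struct mock_netpoll *np)
{
	if (!np->dev)
		return;
	mock_dev_put(np->dev);
	np->dev = NULL;
}
```

With np->dev cleared on the error path, a cleanup call that follows a failed
setup sees a NULL np->dev and does not drop a second reference.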

Another option would be to move the dev_hold/put into __netpoll_setup, but I
think doing so would require some additional refactoring in netpoll_setup.
Namely, we would still need a dev_hold/put in netpoll_setup to prevent the
device from being removed during the period where we release the rtnl lock in
the if (!netif_running(ndev)) clause. We would have to hold the device, unlock
rtnl, then put the device after re-acquiring rtnl at the end of that if block.
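
The ordering I have in mind for that if block looks roughly like the following
userspace model. Again, this is only a sketch: the mock_ names, the bool used as
the "lock", and the carrier-wait stub are all assumptions for illustration and
do not correspond to real kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the device; not the real definition. */
struct mock_net_device { int refcnt; bool running; };

/* A bool models "rtnl is held"; the asserts catch unbalanced lock calls. */
static bool mock_rtnl_held;
static void mock_rtnl_lock(void)
{
	assert(!mock_rtnl_held);
	mock_rtnl_held = true;
}
static void mock_rtnl_unlock(void)
{
	assert(mock_rtnl_held);
	mock_rtnl_held = false;
}

static void mock_dev_hold(struct mock_net_device *d) { d->refcnt++; }
static void mock_dev_put(struct mock_net_device *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Records the refcount observed while the lock was dropped. */
static int mock_refcnt_seen_unlocked;

/* Stand-in for the carrier wait, which runs without rtnl held. */
static void mock_wait_for_carrier(struct mock_net_device *d)
{
	assert(!mock_rtnl_held);
	mock_refcnt_seen_unlocked = d->refcnt;
	d->running = true;
}

/* Called with rtnl held; returns with rtnl held. */
static void mock_bring_up(struct mock_net_device *d)
{
	if (!d->running) {
		mock_dev_hold(d);	/* pin the device before unlocking */
		mock_rtnl_unlock();
		mock_wait_for_carrier(d);
		mock_rtnl_lock();
		mock_dev_put(d);	/* drop the pin once rtnl is back */
	}
}
```

The point is only that the temporary reference brackets the unlocked window, so
it stays separate from whatever reference __netpoll_setup would then take.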

Neil