Re: [PATCH 1/1] net: Add rtnl_lock for netif_device_attach/detach

From: Ben Hutchings
Date: Tue Apr 22 2014 - 13:27:09 EST


On Wed, 2014-04-16 at 15:08 +0800, Li, Zhen-Hua wrote:
> From: "Li, Zhen-Hua" <zhen-hual@xxxxxx>
>
> As netif_running is called in netif_device_attach/detach. There should be
> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
> and detach being called.
> I checked NIC some drivers, some of them have netif_device_attach/detach
> called between rtnl_lock/unlock, while some drivers do not.
>
> This patch is tring to find a generic way to fix this for all NIC drivers.

I don't think you can generically use the RTNL lock for this.

> Signed-off-by: Li, Zhen-Hua <zhen-hual@xxxxxx>
> ---
> net/core/dev.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5b3042e..795bbc5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
> */
> void netif_device_detach(struct net_device *dev)
> {
> + /**
> + * As netif_running is called , rtnl_lock and unlock are needed to
> + * avoid __LINK_STATE_START bit changes during this function call.
> + */
> + int need_unlock;
> +
> + need_unlock = rtnl_trylock();

It is never correct to use trylock and then continue even if it fails.
I think you're trying to simulate a reentrant mutex but this will fail
if *any* task already has the mutex.

Furthermore it is currently allowed and useful to call these functions
from atomic context (transmit or completion path) where it is not
possible to hold the mutex.

> if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
> netif_running(dev)) {
> netif_tx_stop_all_queues(dev);
> }
> + if (need_unlock)
> + rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_detach);

For netif_device_detach(), I wonder whether it is necessary to check
netif_running(). What are we trying to avoid?

> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
> */
> void netif_device_attach(struct net_device *dev)
> {
> + /**
> + * As netif_running is called , rtnl_lock and unlock are needed to
> + * avoid __LINK_STATE_START bit changes during this function call.
> + */
> + int need_unlock;
> +
> + need_unlock = rtnl_trylock();
> if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
> netif_running(dev)) {
> netif_tx_wake_all_queues(dev);
> __netdev_watchdog_up(dev);
> }
> + if (need_unlock)
> + rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_attach);

I do see a problem if netif_device_detach() races with
dev_deactivate_many(), which is being mitigated but not avoided by the
test of netif_running().

I think a proper solution is going to involve changing
dev_deactivate_many() as well, removing the use of netif_running(), and
possible using cmpxchg() to atomically manipulate multiple bits of
dev->state.

Ben.

--
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein

Attachment: signature.asc
Description: This is a digitally signed message part