Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work_sync()

From: Johannes Berg
Date: Wed Sep 13 2023 - 14:30:59 EST


On Wed, 2023-09-13 at 10:25 -0700, Florian Fainelli wrote:
>
> I would refrain from reverting without giving a fighting chance to the
> author to address it. It seems a bit strange that we do this locking
> dance while it might have been simpler to introduce a
> ftgmac100_reset_unlocked() and ftgmac100_reset() and call both at the
> intended places, something like the completely untested patch attached
> maybe?

Not sure that's right - it probably wants RTNL for some reason, but with
this patch you don't hold it when coming from ftgmac100_adjust_link()?
(If it did, it'd have deadlocked getting here after all, since it
acquires it)

Not sure why it needs RTNL though, that doesn't seem so bad, and holds
some internal locks. Or maybe it doesn't really, only if there's a
phydev and/or mii_bus, so maybe it needs a driver lock? Well, there's a
note about the reset task, that might be it?

static int ftgmac100_stop(struct net_device *netdev)
{
struct ftgmac100 *priv = netdev_priv(netdev);

/* Note about the reset task: We are called with the rtnl lock
* held, so we are synchronized against the core of the reset
* task. We must not try to synchronously cancel it otherwise
* we can deadlock. But since it will test for netif_running()
* which has already been cleared by the net core, we don't
* anything special to do.
*/



But it really kind of feels the locking model in this driver is a bit
off.

johannes