Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work_sync()

From: Guenter Roeck
Date: Wed Sep 13 2023 - 13:45:23 EST

Next message: Chuck Lever III: "Re: [linus:master] [shmem] a2e459555c: aim9.disk_src.ops_per_sec -19.0% regression"
Previous message: Brett Creeley: "[PATCH vfio] pds/vfio: Fix possible sleep while in atomic context"
In reply to: Florian Fainelli: "Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work_sync()"
Next in thread: Johannes Berg: "Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work_sync()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 9/13/23 10:25, Florian Fainelli wrote:

On 9/13/23 08:59, Guenter Roeck wrote:

On 9/13/23 07:41, Johannes Berg wrote:

Hi Guenter,

This patch results in the attached lockdep splat when running the
ast2600-evb emulation in qemu with aspeed_g5_defconfig and lock debugging
enabled. Reverting this patch fixes the problem.

Umm ... That's only true if you think the problem is the lockdep splat,
rather than the actual potential deadlock?!

It was hard for me to say because the workqueue lock doesn't exist
in the first place if lockdep debugging is not enabled.

[    9.809960] ======================================================
[    9.810053] WARNING: possible circular locking dependency detected
[    9.810196] 6.6.0-rc1-00004-g6faca50f629f #1 Tainted: G                 N

I don't have this exact tree, but on 6.6-rc1,

Meh, I just included a couple of bug fixes not yet available in 6.6-rc1.

[    9.810327] ------------------------------------------------------
[    9.810406] ip/357 is trying to acquire lock:
[    9.810501] 83af6c40 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: __flush_work+0x40/0x550
[    9.811052]
[    9.811052] but task is already holding lock:
[    9.811133] 81639924 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x124/0x514
[    9.811264]
[    9.811264] which lock already depends on the new lock.
[    9.811264]
[    9.811361]
[    9.811361] the existing dependency chain (in reverse order) is:
[    9.811466]
[    9.811466] -> #1 (rtnl_mutex){+.+.}-{3:3}:
[    9.811616]        lock_acquire+0xfc/0x368
[    9.811717]        __mutex_lock+0x90/0xf00
[    9.811782]        mutex_lock_nested+0x24/0x2c
[    9.811845]        ftgmac100_reset+0x1c/0x1dc

This does indeed take the RTNL:

static void ftgmac100_reset(struct ftgmac100 *priv)
{
         struct net_device *netdev = priv->netdev;
         int err;

         netdev_dbg(netdev, "Resetting NIC...\n");

         /* Lock the world */
         rtnl_lock();

and is called from

[    9.811907]        ftgmac100_adjust_link+0xc0/0x13c
[    9.811972]        phy_link_change+0x30/0x5c
[    9.812035]        phy_check_link_status+0x9c/0x11c
[    9.812100]        phy_state_machine+0x1c0/0x2c0

this work (phy_state_machine is the function), which

[    9.812405] -> #0 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}:
[    9.812531]        check_prev_add+0x128/0x15ec
[    9.812594]        __lock_acquire+0x16ec/0x20cc
[    9.812656]        lock_acquire+0xfc/0x368
[    9.812712]        __flush_work+0x70/0x550
[    9.812769]        __cancel_work_timer+0x1e4/0x264
[    9.812833]        phy_stop+0x78/0x128

is cancelled by phy_stop() in phy_stop_machine():

void phy_stop_machine(struct phy_device *phydev)
{
         cancel_delayed_work_sync(&phydev->state_queue);

but of course that's called by the driver under RTNL:

[    9.812889]        ftgmac100_stop+0x5c/0xac
[    9.812949]        __dev_close_many+0xb8/0x140

(__dev_close_many requires RTNL)

So you have a potential deadlock in this driver. Yes, workqueue items
and RTNL are basically incompatible. Don't do that. Now this bug was
_probably_ added by commit 1baf2e50e48f ("drivers/net/ftgmac100: fix
DHCP potential failure with systemd") which added a call to
ftgmac100_reset() in ftgmac100_adjust_link() which is the thing called
from the PHY state machine in the first place.

Should that be reverted? I don't know ... maybe it can be fixed
differently.

But anyway ... as far as lockdep/workqueue stuff is concerned I'd
definitely call it a win rather than a bug! Yay for making lockdep
useful - it found a deadlock situation for you! :-) No need to blame
lockdep for that :P

So you are saying that anything running in a workqueue must not
acquire rtnl_lock because cancel_[delayed_]work_sync() may be called
under rtnl_lock.

Fair point, but is that documented somewhere ? If not, how is anyone
supposed to know ? If it is not documented, I might we well argue that
cancel_[delayed_]work_sync() should not be called with rtnl_lock held
because some worker might hold that lock.

FWIW, it would be nice if the lockdep code would generate some other
message in this situation. Complaining about a deadlock involving a
lock that doesn't exist if lock debugging isn't enabled is not really
helpful and, yes, may result in reporters to falsely assume that this
lock is responsible for the potential deadlock.

Reverting 1baf2e50e48f does fix the problem as well.

I would refrain from reverting without giving a fighting chance to the author to address it. It seems a bit strange that we do this locking dance while it might have been simpler to introduce a ftgmac100_reset_unlocked() and ftgmac100_reset() and call both at the intended places, something like the completely untested patch attached maybe?

I agree. As it turns out, there are lots of "workqueue items" in the
kernel calling rtnl_lock(), and at least some of them are canceled
with cancel_delayed_work_sync(). So there has to be some additional
qualifying factor, such as that it is safe to acquire rtnl_lock() in a
worker as long as its cancel function is not called with the lock held.

I don't know if the attached patch would work because I don't know the
impact of executing ftgmac100_reset() without holding the locks.
Also, after all, the problem isnt't the lock itself, but the fact that
cancel_delayed_work_sync() may be called on this specific worker with
rtnl_lock held.

Guenter

Next message: Chuck Lever III: "Re: [linus:master] [shmem] a2e459555c: aim9.disk_src.ops_per_sec -19.0% regression"
Previous message: Brett Creeley: "[PATCH vfio] pds/vfio: Fix possible sleep while in atomic context"
In reply to: Florian Fainelli: "Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work_sync()"
Next in thread: Johannes Berg: "Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work_sync()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]