Re: [PATCH v4 0/2] ceph_check_delayed_caps() softlockup
From: Jeff Layton
Date: Wed Aug 04 2021 - 11:52:51 EST
On Tue, 2021-07-06 at 14:52 +0100, Luis Henriques wrote:
> * changes since v3:
> - always round the delay with round_jiffies_relative() in function
> schedule_delayed() (patch 0001)
>
> This is an attempt to fix the softlockup on the delayed_work workqueue. As
> stated in patch 0002:
>
> Function ceph_check_delayed_caps() is called from the mdsc->delayed_work
> workqueue and it can be kept looping for quite some time if caps keep being
> added back to the mdsc->cap_delay_list. This may result in the watchdog
> tainting the kernel with the softlockup flag.
>
> v2 of this fix modifies the approach by time-bounding the loop in this
> function, so that any caps added to the list *after* the loop starts will
> be postponed to the next wq run.
>
> An extra change in 0001 (suggested by Jeff) allows scheduling runs for
> periods shorter than the default 5 second period. This way,
> delayed_work() can have the next run scheduled for the
> ci->i_hold_caps_max of the next list element instead of 5 seconds away.
>
> This patchset should fix the issue reported here [1], although a quick
> search for "ceph_check_delayed_caps" in the tracker returns a few more
> bugs, possibly duplicates.
>
> [1] https://tracker.ceph.com/issues/46284
>
> Luis Henriques (2):
>   ceph: allow schedule_delayed() callers to set delay for workqueue
>   ceph: reduce contention in ceph_check_delayed_caps()
>
>  fs/ceph/caps.c       | 17 ++++++++++++++++-
>  fs/ceph/mds_client.c | 25 ++++++++++++++++---------
>  fs/ceph/super.h      |  2 +-
>  3 files changed, 33 insertions(+), 11 deletions(-)
>
FWIW, we've had some more reports of this, so I think we should get this
into mainline and stable soon. I'm going to squash these two patches
together as it should (hopefully) make it simpler for stable backports.

Thanks,
--
Jeff Layton <jlayton@xxxxxxxxxx>
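
For readers following the thread, the time-bounding idea from the quoted cover letter can be illustrated with a small userspace sketch. This is not the kernel patch: struct entry, struct delay_list, enqueue(), run_once() and the simulated clock "now" are all hypothetical names made up for the illustration, and a ring buffer stands in for mdsc->cap_delay_list. It only shows the two behaviours described above: entries queued after the loop starts are left for the next run, and the next run can be scheduled sooner than the default period.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8

/* Hypothetical stand-in for an entry on the delay list. */
struct entry {
	unsigned long queued_at;	/* simulated tick when it was queued */
	unsigned long due_at;		/* simulated tick when it is due */
};

/* Tiny ring buffer playing the role of the delay list. */
struct delay_list {
	struct entry slot[QLEN];
	size_t head, count;
};

static unsigned long now;	/* simulated clock, in ticks */

/* Append an entry that becomes due 'delay' ticks from now. */
static void enqueue(struct delay_list *l, unsigned long delay)
{
	struct entry *e;

	assert(l->count < QLEN);
	e = &l->slot[(l->head + l->count) % QLEN];
	e->queued_at = now;
	e->due_at = now + delay;
	l->count++;
}

/*
 * Handle only the entries that were queued before this run started.
 * Returns the number of ticks until the next run: the time until the
 * first postponed entry is due, or default_delay if the list drained.
 */
static unsigned long run_once(struct delay_list *l,
			      unsigned long default_delay, int *handled)
{
	unsigned long loop_start = now;

	*handled = 0;
	while (l->count > 0) {
		struct entry *e = &l->slot[l->head];

		/* Queued after the loop began: leave it for the next run. */
		if (e->queued_at > loop_start)
			return e->due_at > now ? e->due_at - now : 1;

		l->head = (l->head + 1) % QLEN;
		l->count--;
		(*handled)++;

		now++;		/* handling one entry takes one tick */
		enqueue(l, 3);	/* worst case: it is queued again */
	}
	return default_delay;
}
```

Starting from three due entries at tick 0 and a default delay of 5, run_once() handles exactly those three, leaves the three re-queued entries on the list, and returns 1. Without the queued_at check, the same input would loop forever, which is the userspace analogue of the softlockup.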