Re: [PATCH] ceph: add timeout protection to ceph_mdsc_sync() path
From: Alex Markuze
Date: Thu Feb 19 2026 - 04:38:17 EST
I tend to agree with Slava here. I don't support any timeout as a
solution before we have an actual RCA.
On Wed, Feb 18, 2026 at 10:04 PM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
>
> On Wed, 2026-02-18 at 21:57 +0200, Ionut Nechita (Wind River) wrote:
> > Hi Slava,
> >
> > Thanks for testing and reproducing this with generic/013.
> >
> > Looking at the stack trace you shared:
> >
> > ceph_mdsc_sync+0x4b4 -> wait_for_completion(&req->r_safe_completion)
> > ceph_sync_fs
> > sync_filesystem
> > __x64_sys_syncfs
> >
> > This is the same pattern we see in the original report - the sync path
> > blocks indefinitely on wait_for_completion() with no timeout. In your
> > case it's ceph_mdsc_sync() hanging on r_safe_completion, which is
> > exactly what patch 2/3 ("ceph: add timeout protection to
> > ceph_mdsc_sync() path") addresses.
> >
> > The root cause may differ from the original IPv6/EADDRNOTAVAIL scenario,
> > but the symptom and the fix are the same - these wait_for_completion()
> > calls in the sync path need timeout protection regardless of what causes
> > the underlying delay.
> >
> > All three patches are now also on LKML:
> >
> > 1/3 - libceph: handle EADDRNOTAVAIL more gracefully (v2)
> > 2/3 - ceph: add timeout protection to ceph_mdsc_sync() path
> > 3/3 - ceph: add timeout protection to ceph_osdc_sync() path
> >
> > I've also added more details and debug information to the Ceph tracker
> > issue at https://tracker.ceph.com/issues/74897 - it might help with
> > your investigation.
>
> Frankly speaking, I don't see the blocked thread when I add debug output.
> It looks like a race condition. I am no longer sure that adding a timeout
> is the proper fix. We probably have an underlying issue that needs to be
> fixed, and a timeout looks like a workaround rather than a fix. I don't
> think I have the IPv6/EADDRNOTAVAIL case on my side.
>
> Thanks,
> Slava.