Re: [PATCH 1/2] ceph: free page array when ceph_submit_write() fails
From: Ilya Dryomov
Date: Wed Feb 11 2026 - 09:53:21 EST
On Wed, Feb 11, 2026 at 3:10 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Mon, Jan 26, 2026 at 3:27 AM Sam Edwards <cfsworks@xxxxxxxxx> wrote:
> >
> > If `locked_pages` is zero, the page array must not be allocated:
> > ceph_process_folio_batch() uses `locked_pages` to decide when to
> > allocate `pages`, and redundant allocations trigger
> > ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> > writeback stall) or even a kernel panic. Consequently, the main loop in
> > ceph_writepages_start() assumes that the lifetime of `pages` is confined
> > to a single iteration.
> >
> > The ceph_submit_write() function claims ownership of the page array on
> > success (it is later freed when the write concludes). But failures only
> > redirty/unlock the pages and fail to free the array, making the failure
> > case in ceph_submit_write() fatal.
> >
> > Free the page array (and reset locked_pages) in ceph_submit_write()'s
> > error-handling 'if' block so that the caller's invariant (that the array
> > does not remain in ceph_wbc) is maintained unconditionally, making
> > failures in ceph_submit_write() recoverable as originally intended.
> >
> > Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Sam Edwards <CFSworks@xxxxxxxxx>
> > ---
> > fs/ceph/addr.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 63b75d214210..c3e0b5b429ea 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1470,6 +1470,14 @@ int ceph_submit_write(struct address_space *mapping,
> > unlock_page(page);
> > }
> >
> > + if (ceph_wbc->from_pool) {
> > + mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> > + ceph_wbc->from_pool = false;
> > + } else
> > + kfree(ceph_wbc->pages);
> > + ceph_wbc->pages = NULL;
> > + ceph_wbc->locked_pages = 0;
>
> Hi Sam,
>
> While I don't see anything wrong with the patch per se, I can't help
> but question the existence of this entire branch along with the meaning
> of the error.
>
> ceph_writepages_start() is the only caller of ceph_submit_write() and
> it already calls ceph_inc_osd_stopping_blocker() at the top where the
> error can be handled naturally -- nothing needs to be unlocked or freed
> at that point. Since mdsc->stopping_blockers is just a counter, all
> calls made by ceph_submit_write() invocations in a loop would be
> "contained" within that ceph_writepages_start() call. The only benefit
> achieved is potentially faster response to the MDS client moving to
> CEPH_MDSC_STOPPING_FLUSHING state, but it's rather dubious because
> sneaking in/having to wait for some more OSD requests isn't really the
> end of the world.
>
> Rather than patching the error path, I wonder if instead of calling
> ceph_inc_osd_stopping_blocker() the counter could just be incremented
> unconditionally, with the check for CEPH_MDSC_STOPPING_FLUSHING bypassed
> there? This could be wrapped into a new helper that could also assert
> that the counter is already elevated before the increment.
Something along the lines of

    void __ceph_inc_osd_stopping_blocker(struct ceph_mds_client *mdsc)
    {
            BUG_ON(!atomic_inc_not_zero(&mdsc->stopping_blockers));
    }

and switching to __ceph_inc_osd_stopping_blocker() in place of
ceph_inc_osd_stopping_blocker() in ceph_submit_write().
Thanks,
Ilya