Re: [PATCH] gfs2: fix hung task in gfs2_jhead_process_page
From: Deepanshu Kartikey
Date: Tue Mar 24 2026 - 20:07:18 EST
On Tue, Mar 24, 2026 at 8:16 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> I have my doubts that this is the right fix. If you look at the entire
> function, it assumes that the folio was already created and added to
> the page cache. The error should surely be detected earlier, not by
> this function.
>
Hi Matthew,
Thank you for the review. After further analysis, we found that the
bug is triggered by a corrupted (or maliciously crafted) GFS2
filesystem image whose journal extent list contains gaps between
extents:
extent 1: lblock=0, dblock=1000, blocks=2
extent 2: lblock=100, dblock=2000, blocks=2
gfs2_find_jhead() only grabs pages for blocks it visits:
extent 1: filemap_grab_folio(page 0) -> success
extent 2: filemap_grab_folio(page 12) -> success
Pages 1-11 are never grabbed. But the cleanup loop at out: advances
blocks_read sequentially through ALL page indices including the gaps:
while (blocks_read < block)         /* block == 101 at this point */
    blocks_read = 0 -> process page 0   ✓ grabbed
    blocks_read = 8 -> process page 1   ✗ never grabbed
                       filemap_get_folio() -> ERR_PTR(-ENOENT)
                       folio_wait_locked(ERR_PTR) -> hangs forever
We considered fixing this by tracking blocks_grabbed in
gfs2_find_jhead() and limiting the cleanup loop to only process
pages that were actually grabbed:
while (blocks_read < blocks_grabbed) { ... }
However, this does not work: blocks_read advances sequentially, one
page at a time, so even if we track the last successfully grabbed
block, the cleanup loop still walks through page indices that fall in
the gaps between extents and were never grabbed.
I believe the correct fix is to handle ERR_PTR in
gfs2_jhead_process_page(), since that is the only place that can
distinguish between a page that was grabbed but not yet read versus
a page that was never grabbed at all due to gaps between extents.
But I may be missing something — could you suggest a better
approach?
Thank you for your guidance.
Deepanshu