Re: [PATCH] nilfs2: prevent double insertion of b_assoc_buffers in dirty buffer lookup

From: Ryusuke Konishi

Date: Tue Jun 30 2026 - 13:55:38 EST

Hi Wuyankun,

On Thu, Jun 25, 2026 at 7:43 PM Ryusuke Konishi wrote:
>
> On Thu, Jun 25, 2026 at 2:27 PM wuyankun wrote:
> >
> > syzbot reported list corruption caused by double list_add_tail() on
> > bh->b_assoc_buffers in nilfs_lookup_dirty_data_buffers().
> >
> > A buffer_head can still be dirty and not under async write while already
> > linked on another association list. Add list state checks before enqueueing
> > in both data and node dirty buffer scanners to avoid re-adding already
> > linked nodes.
> >
> > Reported-by: syzbot+c37bed40868932d790e9@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Link: https://syzkaller.appspot.com/bug?extid=c37bed40868932d790e9
> > Signed-off-by: wuyankun <wuyankun@xxxxxxxxxxxxx>
> > ---
> > fs/nilfs2/segment.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> > index 1491a4d4b1e1..09202d155903 100644
> > --- a/fs/nilfs2/segment.c
> > +++ b/fs/nilfs2/segment.c
> > @@ -741,6 +741,8 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
> > do {
> > if (!buffer_dirty(bh) || buffer_async_write(bh))
> > continue;
> > + if (!list_empty(&bh->b_assoc_buffers))
> > + continue;
> > get_bh(bh);
> > list_add_tail(&bh->b_assoc_buffers, listp);
> > ndirties++;
> > @@ -779,7 +781,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
> > bh = head = folio_buffers(fbatch.folios[i]);
> > do {
> > if (buffer_dirty(bh) &&
> > - !buffer_async_write(bh)) {
> > + !buffer_async_write(bh) &&
> > + list_empty(&bh->b_assoc_buffers)) {
> > get_bh(bh);
> > list_add_tail(&bh->b_assoc_buffers,
> > listp);
> > --
> > 2.20.1
> >
>
> Thank you for the patch.
>
> This patch adds a fix to forcibly avoid double registration of a
> buffer head to the list at the list manipulation level.
>
> However, we need to clarify why that happened in the first place, and
> what caused the state inconsistency that led to this double
> registration.
>
> The search routine for dirty buffers performs a gang lookup for dirty
> folios, registering them to the list in ascending order of their
> offsets in the page cache.
>
> Furthermore, the construction and destruction of this list take place
> inside the log writer, which operates exclusively. Whether the write
> succeeds or fails, the list is released by nilfs_release_buffers()
> (called via nilfs_destroy_logs()). Therefore, double registration
> normally does not occur, meaning there must be an oversight somewhere.
>
> As this issue arises from calls made via the GC ioctl
> (nilfs_ioctl_clean_segments()), the underlying implementation flaw
> likely lies there.
>
> Since it seems reproducible with the syzbot reproducer, I will also
> trace what is happening to verify if this fix direction is
> appropriate.

Regarding the cause of this issue, the page index of the folio
containing the problematic buffer head was 18446744073709551615 (=
ULONG_MAX on 64-bit architectures).
As a result, filemap_get_folios_tag(), called from
nilfs_lookup_dirty_data_buffers(), repeatedly finds the folio at index
ULONG_MAX, causing the duplicate processing of dirty buffers.

The root cause is that a page/folio with an index of ULONG_MAX has
been inserted into the page cache via the GC ioctl.

Specifically, nilfs_ioctl_move_blocks() processes the blocks to be
moved by GC based on the nilfs_vdesc structures passed as ioctl
arguments, reads them into the page cache via
nilfs_ioctl_move_inode_block(), and marks them dirty.

At this point, there is no range check for vdesc->vd_offset, which
determines the page index.
Consequently, the artificial request generated by syzbot causes a
folio to be inserted at the ULONG_MAX page index.

Therefore, the fix should not be to hide the issue by introducing a
duplicate check for the buffer list in
nilfs_lookup_dirty_data_buffers(). Instead, please define a local
variable in nilfs_ioctl_move_inode_block():

__u64 limit_offset = (__u64)inode->i_sb->s_maxbytes >> inode->i_blkbits;

and insert an error check just before calling
nilfs_gccache_submit_read_data() when vd_flags == 0, returning -EINVAL
if vd_offset is greater than or equal to this limit_offset.

This will resolve the syzbot issue, but a similar check will likely be
required for nilfs_gccache_submit_read_node(), which is called when
vd_flags != 0. If that part is not obvious, it might be better to
separate it.

I will hand this back to you for now, but I can also take over
creating the fix patch myself, so please let me know if you would like
me to do so.

Thanks,
Ryusuke Konishi