Re: [PATCH] xfs: fix BUG_ON in xfs_getbmap()

From: yebin (H)
Date: Mon Mar 27 2023 - 21:34:06 EST




On 2023/3/27 23:15, Darrick J. Wong wrote:
[add Christoph to cc since he added/last touched this assert, I think]

On Mon, Mar 27, 2023 at 10:02:18PM +0800, Ye Bin wrote:
From: Ye Bin <yebin10@xxxxxxxxxx>

There's issue as follows:
XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
Why not get rid of the assertion? It's not like it changes the course
of the code flow -- userspace still gets told there's a delalloc extent.
Thank you for your reply.
I think it's incorrect to return the delalloc extent to the user in this case. Because
users expect to obtain none delalloc extent information. If there is a delalloc extent
found at this time, there is a problem with the functionality. I even think that here
we should return an error to the userspace instead of return an incorrect result to
the userspace .
Or, if the assert does serve some purpose, then do we need to take
the mmaplock for cow fork reporting too?
Let me analyze whether it is necessary to take the mmaplock for cow fork reporting.
--D

------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
RIP: 0010:assfail+0x96/0xa0
RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
FS: 00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
xfs_getbmap+0x1a5b/0x1e40
xfs_ioc_getbmap+0x1fd/0x5b0
xfs_file_ioctl+0x2cb/0x1d50
__x64_sys_ioctl+0x197/0x210
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue may happen as follows:
ThreadA ThreadB
do_shared_fault
__do_fault
xfs_filemap_fault
__xfs_filemap_fault
filemap_fault
xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
xfs_getbmap
xfs_ilock(ip, XFS_IOLOCK_SHARED);
filemap_write_and_wait
do_page_mkwrite
xfs_filemap_page_mkwrite
__xfs_filemap_fault
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
iomap_page_mkwrite
...
xfs_buffered_write_iomap_begin
xfs_bmapi_reserve_delalloc -> Allocate delay extent
xfs_ilock_data_map_shared(ip)
xfs_getbmap_report_one
ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
-> trigger BUG_ON

As xfs_filemap_page_mkwrite() only hold XFS_MMAPLOCK_SHARED lock, there's
small window mkwrite can produce delay extent after file write in xfs_getbmap().
To solve above issue, hold XFS_MMAPLOCK_EXCL lock when do xfs_getbmap(),
to prevent write operations by do_page_mkwrite().
During doing __xfs_filemap_fault() we can't hold IOLOCK lock, as it's may lead
to ABBA dealock with xfs_file_write_iter().It's very easy to reproduce when
do fsstress, lockdep will detect deadlock.

Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
---
fs/xfs/xfs_bmap_util.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index a09dd2606479..f23771a0cc8d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -463,11 +463,13 @@ xfs_getbmap(
max_len = XFS_ISIZE(ip);
break;
case XFS_DATA_FORK:
+ lock = XFS_MMAPLOCK_EXCL;
+ xfs_ilock(ip, lock);
if (!(iflags & BMV_IF_DELALLOC) &&
(ip->i_delayed_blks || XFS_ISIZE(ip) > ip->i_disk_size)) {
error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
if (error)
- goto out_unlock_iolock;
+ goto out_unlock_ilock;
/*
* Even after flushing the inode, there can still be
@@ -486,7 +488,7 @@ xfs_getbmap(
else
max_len = XFS_ISIZE(ip);
- lock = xfs_ilock_data_map_shared(ip);
+ lock |= xfs_ilock_data_map_shared(ip);
break;
}
--
2.31.1

.