On 8/9/21 6:08 PM, Gang He wrote:
Hi Joseph and All,
The deadlock is caused by self-locking on one node.
There is a three-node cluster (mounted at /mnt/shared), and the user runs the reflink command repeatedly to clone the file into the same directory,
e.g.
reflink "/mnt/shared/test" \
"/mnt/shared/.snapshots/test.`date +%m%d%H%M%S`.`hostname`"
After a while, the reflink process on each node hangs and the file system cannot be listed.
The problematic reflink process is blocked by itself; for example, the reflink process hung on node ghe-sle15sp2-nd2:
kernel: task:reflink state:D stack: 0 pid:16992 ppid: 4530
kernel: Call Trace:
kernel: __schedule+0x2fd/0x750
kernel: ? try_to_wake_up+0x17b/0x4e0
kernel: schedule+0x2f/0xa0
kernel: schedule_timeout+0x1cc/0x310
kernel: ? __wake_up_common+0x74/0x120
kernel: wait_for_completion+0xba/0x140
kernel: ? wake_up_q+0xa0/0xa0
kernel: __ocfs2_cluster_lock.isra.41+0x3b5/0x820 [ocfs2]
kernel: ? ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
kernel: ocfs2_inode_lock_full_nested+0x1fc/0x960 [ocfs2]
kernel: ocfs2_init_security_and_acl+0xbe/0x1d0 [ocfs2]
kernel: ocfs2_reflink+0x436/0x4c0 [ocfs2]
kernel: ? ocfs2_reflink_ioctl+0x2ca/0x360 [ocfs2]
kernel: ocfs2_reflink_ioctl+0x2ca/0x360 [ocfs2]
kernel: ocfs2_ioctl+0x25e/0x670 [ocfs2]
kernel: do_vfs_ioctl+0xa0/0x680
kernel: ksys_ioctl+0x70/0x80
In fact, the destination directory (.snapshots) inode dlm lock had been acquired by ghe-sle15sp2-nd2; then a bast message arrived from the other nodes asking ghe-sle15sp2-nd2 to downconvert the lock, but the operation failed, and the kernel messages look like:
kernel: (ocfs2dc-AA35DD9,2560,3):ocfs2_downconvert_lock:3660 ERROR: DLM error -16 while calling ocfs2_dlm_lock on resource M0000000000000000046e0200000000
kernel: (ocfs2dc-AA35DD9,2560,3):ocfs2_unblock_lock:3904 ERROR: status = -16
kernel: (ocfs2dc-AA35DD9,2560,3):ocfs2_process_blocked_lock:4303 ERROR: status = -16
Then, when the reflink process tries to acquire this directory inode dlm lock, the process is blocked. The dlm lock resource in memory looks like:
l_name = "M0000000000000000046e0200000000",
l_ro_holders = 0,
l_ex_holders = 0,
l_level = 5 '\005',
l_requested = 0 '\000',
l_blocking = 5 '\005',
l_type = 0 '\000',
l_action = 0 '\000',
l_unlock_action = 0 '\000',
l_pending_gen = 645948,
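If I read the dlm lock mode constants correctly (include/uapi/linux/dlmconstants.h: NL=0, CR=1, CW=2, PR=3, PW=4, EX=5), the dump says the lock is still granted at EX, an EX request from another node is blocking it, and there are no local holders. A small decode helper, just as a sketch with the values copied from the dump above:

#!/bin/bash
# decode_lockres.sh - map the l_level/l_blocking numbers from the lockres
# dump above to dlm lock mode names (values from include/uapi/linux/dlmconstants.h).
dlm_mode_name() {
    case "$1" in
        0) echo "NL (null)" ;;
        1) echo "CR (concurrent read)" ;;
        2) echo "CW (concurrent write)" ;;
        3) echo "PR (protected read)" ;;
        4) echo "PW (protected write)" ;;
        5) echo "EX (exclusive)" ;;
        *) echo "unknown ($1)" ;;
    esac
}

echo "granted level : $(dlm_mode_name 5)"   # l_level = 5
echo "blocking level: $(dlm_mode_name 5)"   # l_blocking = 5
echo "holders       : ro=0 ex=0"            # l_ro_holders / l_ex_holders
# EX granted, EX blocking, no local holders: the lockres needs a downconvert,
# but the downconvert already failed with -16 (-EBUSY), so any new local
# request just queues behind it.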
So far, I do not know what makes the dlm lock function fail, and it also looks like we do not handle this failure case in the dlmglue layer, but I can always reproduce this hang with my test script, e.g.
loop=1
while ((loop++)); do
    for i in `seq 1 100`; do
        reflink "/mnt/shared/test" "/mnt/shared/.snapshots/test.${loop}.${i}.`date +%m%d%H%M%S`.`hostname`"
    done
    usleep 500000
    rm -f /mnt/shared/.snapshots/testnode1.qcow2.*.`hostname`
done
My patch changes ocfs2_reflink() so that it does not acquire the destination directory inode dlm lock multiple times, which prevents this hang from happening again. The code change can also improve reflink performance in this case.
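To illustrate the idea only (this is a simplified userspace sketch with made-up names, flock(1) standing in for the directory inode cluster lock, not the actual ocfs2 patch): take the destination directory lock once and run the sub-steps under it, instead of letting every helper acquire and release it again.

#!/bin/bash
# lock_once.sh - sketch of the "lock once, run all sub-steps, unlock once"
# pattern; all function names here are hypothetical placeholders.

DIR_LOCK=/tmp/dest_dir.lock

add_entry()     { : ; }   # add the new name to the directory
init_security() { : ; }   # set security xattrs
init_acl()      { : ; }   # set default/access ACLs

# Before: each step grabbed the lock by itself, i.e. several acquire/release
# round trips per reflink.
reflink_before() {
    ( flock -x 9; add_entry )     9>"$DIR_LOCK"
    ( flock -x 9; init_security ) 9>"$DIR_LOCK"
    ( flock -x 9; init_acl )      9>"$DIR_LOCK"
}

# After: one acquisition, all steps run under it, one release when the
# subshell exits.
reflink_after() {
    (
        flock -x 9
        add_entry
        init_security
        init_acl
    ) 9>"$DIR_LOCK"
}

reflink_after

Fewer lock/unlock round trips per reflink should also be where the performance improvement mentioned above comes from.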
Thanks
Gang
'status = -16' implies DLM_CANCELGRANT.
Do you use stack user instead of o2cb? If yes, can you try o2cb with
your reproducer?
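If it helps, the active stack should be readable from sysfs (assuming the cluster_stack attribute is present on your kernel), e.g.:

cat /sys/fs/ocfs2/cluster_stack   # prints "o2cb" for the classic stack, or the userspace stack name otherwise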
Thanks,
Joseph