Re: mmotm 2016-08-02-15-53 uploaded

From: Vitaly Mayatskikh
Date: Tue Oct 10 2017 - 22:02:04 EST


On Tue, 10 Oct 2017 20:48:34 -0400,
Changwei Ge wrote:
>
> Hi Andrew and Vitaly,
>
> I do agree that patch ee8f7fcbe638 ("ocfs2/dlm: continue to purge
> recovery lockres when recovery master goes down", 2016-08-02) introduced
> an issue. It makes DLM recovery can't pick up a new master for an
> existed lock resource whose owner died seconds ago.
>
> But this patch truly solves another issue.
> So I think we can't just revert this patch but to give a fix to it.

I'm not saying we must revert the patch, I just stating it broke one of use cases :)

Here's a tad more diagnosis:

System is x86-64 Fedora 25 (I think) with 4.14-rc4.

Cluster config:

# cat /etc/ocfs2/cluster.conf
cluster:
node_count = 2
name = cyccfs

node:
number = 1
cluster = cyccfs
ip_port = 7777
ip_address = 192.168.122.93
name = node-A

node:
number = 2
cluster = cyccfs
ip_port = 7777
ip_address = 192.168.122.96
name = node-B

Test case (run on both nodes):

# cat ocfs
#!/bin/bash

modprobe ocfs2_stack_o2cb
modprobe ocfs2_dlmfs
SYSTEMCTL_SKIP_REDIRECT=1 service o2cb start
SYSTEMCTL_SKIP_REDIRECT=1 service ocfs2 start
mount /dev/sdd /ocfs

touch /ocfs/file
while true; do
dd if=/dev/urandom of=/ocfs/file bs=1 count=100 conv=notrunc oflag=append
dd if=/ocfs/file of=/dev/null bs=1
done


Then send hard reboot to either node - in this case it's A.

Please ignore inconsistent dmesg timestamps - cut&paste from multiple runs.

Node B runs, notices A passed away, loops with error -107 until A reappears, then currently used lock becomes unusable (file behind the lock too):

[ 68.875788] ocfs2: Registered cluster interface o2cb
[ 68.911161] OCFS2 User DLM kernel interface loaded
[ 69.053471] o2hb: Heartbeat mode set to local
[ 73.316460] o2net: Connected to node node-A (num 1) at 192.168.122.93:7777
[ 77.750103] o2dlm: Joining domain 753F2064B4124242A46F1715538F8618
[ 77.750103] (
[ 77.753092] 1
[ 77.754297] 2
[ 77.755553] ) 2 nodes
[ 77.811583] ocfs2: Mounting device (8,48) on (node 2, slot 1) with ordered data mode.
[ 140.768968] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 shutdown, state 7
[ 140.769968] o2net: No longer connected to node node-A (num 1) at 192.168.122.93:7777
[ 140.777303] (dd,2323,1):dlm_send_remote_convert_request:420 ERROR: Error -112 when sending message 504 (key 0x5339c6e3) to node 1
[ 140.782710] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 145.991182] (dd,2323,1):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 145.997284] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 151.111187] (dd,2323,1):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 151.120235] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 156.231307] (dd,2323,1):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 156.237260] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 161.351199] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 161.362485] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 166.983245] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 166.989088] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 170.975235] o2net: No connection established with node 1 after 30.0 seconds, check network and cluster configuration.
[ 170.981640] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 shutdown, state 7
[ 173.025099] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 shutdown, state 7
[ 175.073443] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 shutdown, state 7
[ 175.713830] o2cb: o2dlm has evicted node 1 from domain 753F2064B4124242A46F1715538F8618
[ 175.713851] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 175.713851] o2dlm: Waiting on the death of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 175.847251] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
...
[ 176.743289] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 176.799237] o2dlm: Waiting on the recovery of node 1 in domain 753F2064B4124242A46F1715538F8618
[ 176.855271] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
...
[ 180.023231] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 180.135277] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 180.191321] o2dlm: Begin recovery on domain 753F2064B4124242A46F1715538F8618 for node 1
[ 180.197953] o2dlm: Node 2 (me) is the Recovery Master for the dead node 1 in domain 753F2064B4124242A46F1715538F8618
[ 180.202519] o2dlm: End recovery on domain 753F2064B4124242A46F1715538F8618
[ 180.247333] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 180.367312] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
...
[ 187.815221] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 187.874086] ocfs2: Begin replay journal (node 1, slot 0) on device (8,48)
[ 187.883851] ocfs2: End replay journal (node 1, slot 0) on device (8,48)
[ 187.910708] ocfs2: Beginning quota recovery on device (8,48) for slot 0
[ 187.927200] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 187.938099] ocfs2: Finishing quota recovery on device (8,48) for slot 0
[ 188.039262] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
...
[ 263.535262] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 263.647244] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 263.759228] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x5339c6e3) to node 1
[ 263.780479] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 shutdown, state 7
[ 265.826450] o2net: Connected to node node-A (num 1) at 192.168.122.93:7777
[ 265.832021] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -92 when sending message 504 (key 0x5339c6e3) to node 1
...
[ 269.121285] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -92 when sending message 504 (key 0x5339c6e3) to node 1
[ 269.233402] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -92 when sending message 504 (key 0x5339c6e3) to node 1
[ 269.345747] (dd,2323,0):dlm_send_remote_convert_request:420 ERROR: Error -92 when sending message 504 (key 0x5339c6e3) to node 1
[ 269.384183] o2dlm: Node 1 joins domain 753F2064B4124242A46F1715538F8618
[ 269.384183] (
[ 269.387089] 1
[ 269.388762] 2
[ 269.390137] ) 2 nodes
[ 269.459165] (dd,2323,0):dlm_send_remote_convert_request:416 ERROR: dlm status = DLM_IVLOCKID
[ 269.465496] (dd,2323,0):dlmconvert_remote:343 ERROR: dlm status = DLM_IVLOCKID
[ 269.470268] (dd,2323,0):__ocfs2_cluster_lock:1549 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource W000000000000000000f0050de1f090
[ 269.476655] (dd,2323,0):ocfs2_rw_lock:1740 ERROR: status = -22
[ 269.479963] (dd,2323,0):ocfs2_file_write_iter:2244 ERROR: status = -22



node A says it does not know such lock:

[ 308.107780] (kworker/u8:7,101,0):dlm_convert_lock_handler:513 ERROR: did not find lock to convert on grant queue! cookie=2:9
[ 308.116505] lockres: W000000000000000000f0050de1f090, owner=1, state=0
[ 308.121113] last used: 0, refcnt: 3, on purge list: no
[ 308.123443] on dirty list: no, on reco list: no, migrating pending: no
[ 308.129062] inflight locks: 0, asts reserved: 0
[ 308.131417] refmap nodes: [
[ 308.131417] ], inflight=0
[ 308.139445] granted queue:
[ 308.141671] type=5, conv=-1, node=1, cookie=1:10, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
[ 308.149155] converting queue:
[ 308.150387] blocked queue:

Trying to mount -o remount on B, it dies:

[ 166.940431] (mount.ocfs2,2546,3):ocfs2_unlock_ast:1181 ERROR: Dlm passes error -22 for lock N0000000000000009, unlock_action 2
[ 166.948393] (mount.ocfs2,2546,3):dlmunlock:688 ERROR: dlm status = DLM_IVLOCKID
[ 166.952468] (mount.ocfs2,2546,3):ocfs2_drop_lock:3237 ERROR: DLM error -22 while calling ocfs2_dlm_unlock on resource N00000000000000090000f005
[ 166.959220] (mount.ocfs2,2546,3):ocfs2_drop_lock:3238 ERROR: lockres flags: 194
[ 166.963247] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[ 166.964247] IP: _raw_spin_lock+0xc/0x30
[ 166.964247] PGD 0 P4D 0
[ 166.964247] Oops: 0002 [#1] SMP
[ 166.964247] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ghash_clmulni_intel i2c_piix4 crc32c_intel serio_raw 8139cp
[ 166.964247] CPU: 3 PID: 2546 Comm: mount.ocfs2 Not tainted 4.14.0-rc4+ #57
[ 166.964247] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 166.964247] task: ffff880136073fc0 task.stack: ffffc90000c74000
[ 166.964247] RIP: 0010:_raw_spin_lock+0xc/0x30
[ 166.964247] RSP: 0018:ffffc90000c77c80 EFLAGS: 00010246
[ 166.964247] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 166.964247] RDX: 0000000000000001 RSI: ffff88013fd8cbd8 RDI: 00000000000000b0
[ 166.964247] RBP: ffffc90000c77c90 R08: 0000000000000000 R09: 000000000000058d
[ 166.964247] R10: ffffc90000c77cc0 R11: 0000000081ee5501 R12: ffffc90000c77cd8
[ 166.964247] R13: ffff880136c9e660 R14: ffff880136c9e618 R15: 0000000000000246
[ 166.964247] FS: 00007f2d8a3e3dc0(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[ 166.964247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 166.964247] CR2: 00000000000000b0 CR3: 0000000137a7e000 CR4: 00000000000006e0
[ 166.964247] Call Trace:
[ 166.964247] ? dlm_print_one_lock+0x1a/0x30 [ocfs2_dlm]
[ 166.964247] o2cb_dump_lksb+0x12/0x20 [ocfs2_stack_o2cb]
[ 166.964247] ocfs2_dlm_dump_lksb+0x17/0x20 [ocfs2_stackglue]
[ 166.964247] ocfs2_drop_lock.isra.33.part.34+0x319/0x360 [ocfs2]
[ 166.964247] ocfs2_simple_drop_lockres+0x2d/0x40 [ocfs2]
[ 166.964247] ocfs2_dentry_lock_put+0x50/0x70 [ocfs2]
[ 166.964247] ocfs2_dentry_iput+0x3a/0xf0 [ocfs2]
[ 166.964247] dentry_unlink_inode+0xa7/0x150
[ 166.964247] __dentry_kill+0xc1/0x150
[ 166.964247] shrink_dentry_list+0x117/0x280
[ 166.964247] shrink_dcache_sb+0xc5/0x140
[ 166.964247] do_remount_sb+0x4d/0x1e0
[ 166.964247] do_mount+0x74c/0xcb0
[ 166.964247] ? copy_mount_options+0x2c/0x220
[ 166.964247] SyS_mount+0x83/0xd0
[ 166.964247] entry_SYSCALL_64_fastpath+0x13/0x94
[ 166.964247] RIP: 0033:0x7f2d89b0e6fa
[ 166.964247] RSP: 002b:00007ffe70f9ea38 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 166.964247] RAX: ffffffffffffffda RBX: 00007ffe70f9da11 RCX: 00007f2d89b0e6fa
[ 166.964247] RDX: 000055dd5b4d06ae RSI: 000055dd5b85e120 RDI: 000055dd5b85e3d0
[ 166.964247] RBP: 0000000000000000 R08: 000055dd5b85e810 R09: 0000000000000001
[ 166.964247] R10: 0000000000200020 R11: 0000000000000202 R12: 0000000000001001
[ 166.964247] R13: 000055dd5b85e101 R14: 00007ffe70f9da10 R15: 0000000000000000
[ 166.964247] Code: 7d 08 e8 88 4f a0 ff 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 dd b8 01 00 00 00 c3 90 66 66 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 50 7d a4 ff 5d
[ 166.964247] RIP: _raw_spin_lock+0xc/0x30 RSP: ffffc90000c77c80
[ 166.964247] CR2: 00000000000000b0


Few minutes later A dies as well:

[ 83.229138] (kworker/u8:8,102,0):dlm_unlock_lock_handler:514 ERROR: failed to find lock to unlock! cookie=1:9
[ 113.631250] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 has been idle for 30.403 secs.
[ 143.839155] o2net: Connection to node node-A (num 1) at 192.168.122.93:7777 has been idle for 30.193 secs.
[ 146.466137] o2net: No longer connected to node node-A (num 1) at 192.168.122.93:7777
[ 146.473923] o2cb: o2dlm has evicted node 1 from domain 753F2064B4124242A46F1715538F8618
[ 147.935147] o2dlm: Begin recovery on domain 753F2064B4124242A46F1715538F8618 for node 1
[ 147.941285] o2dlm: Node 2 (me) is the Recovery Master for the dead node 1 in domain 753F2064B4124242A46F1715538F8618
[ 147.947347] o2dlm: End recovery on domain 753F2064B4124242A46F1715538F8618
[ 158.592082] ------------[ cut here ]------------
[ 158.594722] kernel BUG at fs/ocfs2/reservations.c:507!
[ 158.597507] invalid opcode: 0000 [#1] SMP
[ 158.598507] Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ghash_clmulni_intel i2c_piix4 crc32c_intel serio_raw 8139cp
[ 158.598507] CPU: 2 PID: 2485 Comm: dd Not tainted 4.14.0-rc4+ #57
[ 158.598507] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 158.598507] task: ffff880136f03300 task.stack: ffffc9000116c000
[ 158.598507] RIP: 0010:__ocfs2_resv_find_window+0x3f9/0x440 [ocfs2]
[ 158.598507] RSP: 0018:ffffc9000116f720 EFLAGS: 00010246
[ 158.598507] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
[ 158.598507] RDX: 0000000000000000 RSI: ffffc9000116f748 RDI: 0000000000000000
[ 158.598507] RBP: ffffc9000116f778 R08: 0000000000000000 R09: ffffc9000116f744
[ 158.598507] R10: 0000000000000001 R11: ffff8801318a84c0 R12: 0000000000000000
[ 158.598507] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88012da04400
[ 158.598507] FS: 00007f869b3d8700(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
[ 158.598507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 158.598507] CR2: 0000563858481000 CR3: 0000000135f0a000 CR4: 00000000000006e0
[ 158.598507] Call Trace:
[ 158.598507] ocfs2_resmap_resv_bits+0xad/0x360 [ocfs2]
[ 158.598507] ocfs2_claim_local_alloc_bits+0xbe/0x490 [ocfs2]
[ 158.598507] __ocfs2_claim_clusters+0x136/0x250 [ocfs2]
[ 158.598507] ocfs2_add_clusters_in_btree+0xad/0x430 [ocfs2]
[ 158.598507] ? ocfs2_inode_cache_lock+0x12/0x20 [ocfs2]
[ 158.598507] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
[ 158.598507] ? get_page_from_freelist+0x3ef/0xb90
[ 158.598507] ocfs2_add_inode_data+0x4b/0x60 [ocfs2]
[ 158.598507] ? ocfs2_allocate_extend_trans+0x160/0x160 [ocfs2]
[ 158.598507] ocfs2_write_cluster.isra.35+0x46f/0x760 [ocfs2]
[ 158.598507] ? __add_to_page_cache_locked+0x142/0x1d0
[ 158.598507] ? lru_cache_add+0x3a/0x80
[ 158.598507] ? add_to_page_cache_lru+0x6a/0xd0
[ 158.598507] ocfs2_write_begin_nolock+0x1381/0x17b0 [ocfs2]
[ 158.598507] ? ocfs2_inode_lock_full_nested+0x362/0x880 [ocfs2]
[ 158.598507] ? ocfs2_allocate_extend_trans+0x160/0x160 [ocfs2]
[ 158.598507] ocfs2_write_begin+0xd5/0x1b0 [ocfs2]
[ 158.598507] ? security_inode_need_killpriv+0x33/0x50
[ 158.598507] generic_perform_write+0xaa/0x180
[ 158.598507] __generic_file_write_iter+0x14f/0x1c0
[ 158.598507] ? ocfs2_inode_unlock+0xf4/0x110 [ocfs2]
[ 158.598507] ocfs2_file_write_iter+0x3dd/0xa40 [ocfs2]
[ 158.598507] ? memzero_explicit+0x12/0x20
[ 158.598507] __vfs_write+0xd2/0x130
[ 158.598507] vfs_write+0xb5/0x1a0
[ 158.598507] SyS_write+0x46/0xa0
[ 158.598507] entry_SYSCALL_64_fastpath+0x13/0x94
[ 158.598507] RIP: 0033:0x7f869aef8c30
[ 158.598507] RSP: 002b:00007ffc659ea698 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 158.752095] RAX: ffffffffffffffda RBX: 00007f869b1c1b38 RCX: 00007f869aef8c30
[ 158.752095] RDX: 0000000000000001 RSI: 0000563858481000 RDI: 0000000000000001
[ 158.752095] RBP: 00007f869b1c1ae0 R08: 001167e91233fc63 R09: 00007ffc659f3080
[ 158.752095] R10: 0000563858480030 R11: 0000000000000246 R12: 00007f869b1c1b38
[ 158.752095] R13: 0000000000002010 R14: 7fffffffffffffff R15: 0000000000002710
[ 158.770296] Code: ff ff 45 8b 75 18 41 8b 55 1c e9 e1 fc ff ff 45 8b 75 18 41 8b 55 1c c7 45 b0 00 00 00 00 e9 cd fc ff ff 85 c0 0f 85 50 ff ff ff <0f> 0b 48 b8 00 00 00 00 00 00 00 10 48 8d 7d d0 45 89 c1 48 89
[ 158.770296] RIP: __ocfs2_resv_find_window+0x3f9/0x440 [ocfs2] RSP: ffffc9000116f720

All above does not happen with the mentioned commit reverted.

Thanks!
--
wbr, Vitaly