RCU nocb list not reclaiming causing OOM

From: David Chen
Date: Fri Jul 20 2018 - 19:22:46 EST


Hi Paul,

We hit an RCU issue on a 4.9.37 kernel. One of the nocb_follower lists grows too
large and is not getting reclaimed, causing the system to OOM.

Printing the culprit rcu_sched_data:

nocb_q_count = {
  counter = 32369635
},
nocb_follower_head = 0xffff88ae901c0a00,
nocb_follower_tail = 0xffff88af1538b8d8,
nocb_kthread = 0xffff88b06d290000,

As you can see here, the nocb_follower_head is not empty, so in theory, the
nocb_kthread shouldn't go to sleep. However, if we dump the stack of the kthread:

crash> bt 0xffff88b06d290000
PID: 21 TASK: ffff88b06d290000 CPU: 3 COMMAND: "rcuos/1"
#0 [ffffafc9020b7dc0] __schedule at ffffffff8d8789dc
#1 [ffffafc9020b7e38] schedule at ffffffff8d878e76
#2 [ffffafc9020b7e50] rcu_nocb_kthread at ffffffff8d112337
#3 [ffffafc9020b7ec8] kthread at ffffffff8d0c6ce7
#4 [ffffafc9020b7f50] ret_from_fork at ffffffff8d87d755

And if we dis the address at ffffffff8d112337:

/usr/src/debug/kernel-4.9.37/linux-4.9.37-29.nutanix.07142017.el7.centos.x86_64/kernel/rcu/tree_plugin.h: 2106
0xffffffff8d11232d <rcu_nocb_kthread+381>: test %rax,%rax
0xffffffff8d112330 <rcu_nocb_kthread+384>: jne 0xffffffff8d112355 <rcu_nocb_kthread+421>
0xffffffff8d112332 <rcu_nocb_kthread+386>: callq 0xffffffff8d878e40 <schedule>
0xffffffff8d112337 <rcu_nocb_kthread+391>: lea -0x40(%rbp),%rsi

So the kthread is blocked in swait_event_interruptible in nocb_follower_wait.
This contradicts the fact that nocb_follower_head was not empty. So I
wonder if this is caused by the lack of a memory barrier in the place shown below.
If the store of NULL to the head takes effect after the xchg, it will overwrite
the head set by the leader. This causes the kthread to sleep on the next
iteration, and the leader won't wake it up because the tail doesn't point to
the head.

Please tell me what you think.

Thanks,
David

diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h linux-4.9.37/kernel/rcu/tree_plugin.h
--- linux-4.9.37.orig/kernel/rcu/tree_plugin.h 2017-07-12 06:42:41.000000000 -0700
+++ linux-4.9.37/kernel/rcu/tree_plugin.h 2018-07-20 15:25:57.311206343 -0700
@@ -2149,6 +2149,7 @@
BUG_ON(!list);
trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeNonEmpty");
WRITE_ONCE(rdp->nocb_follower_head, NULL);
+ smp_mb();
tail = xchg(&rdp->nocb_follower_tail, &rdp->nocb_follower_head);

/* Each pass through the following loop invokes a callback. */