dcache endless loop in d_invalidate

From: Martin Schwidefsky
Date: Tue Oct 16 2018 - 07:15:39 EST


Hi Al,

I am currently looking into a customer dump and found what looks like
an issue in the dcache code. And I think the following commit of yours
has something to do with it:

commit fe91522a7ba82ca1a51b07e19954b3825e4aaa22
Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Sat May 3 00:02:25 2014 -0400

don't remove from shrink list in select_collect()

If we find something already on a shrink list, just increment
data->found and do nothing else. Loops in shrink_dcache_parent() and
check_submounts_and_drop() will do the right thing - everything we
did put into our list will be evicted and if there had been nothing,
but data->found got non-zero, well, we have somebody else shrinking
those guys; just try again.

Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

The dump I got is based on kernel v4.4 but the affected dcache functions
look identical to the upstream version. Here is what I found in the dump:

A lot of "rcu_sched kthread starved for <xxx> jiffies!" messages
Only one CPU, currently running process "run-crons" task 0x65a8008
It just called check_and_drop from d_walk, full backchain:

PSW.addr check_and_drop at 30a0e8
%r14 d_walk at 308202
#0 [35b87b88] d_invalidate at 3096e8
#1 [35b87bd8] proc_flush_task at 37190c
#2 [35b87c58] release_task at 13f202
#3 [35b87cc8] wait_task_zombie at 13fc36
#4 [35b87d50] wait_consider_task at 140150
#5 [35b87dc0] do_wait at 1403de
#6 [35b87e18] sys_wait4 at 14181e
#7 [35b87ea8] system_call at 659ec4

The task's runtime is
sum_exec_runtime 26813717162347 # nsec = 26813 seconds,
utime = 3991252 # cputime = 974 seconds,
stime = 99132516783832 # cputime = 24202 seconds,
Task 0x65a8008 has TIF_NEED_RESCHED set

d_walk() just called check_and_drop() via the finish() function pointer,
check_and_drop() will return and d_walk() will return as well.
Looks like an endless loop in d_invalidate().

The (struct dentry *) dentry in d_invalidate() is at 0x3cb15858
The struct detach_data data in d_invalidate() is at 0x35b87c28

dentry tree starting @ 0x3cb15858 has two entries in d_subdirs:
0x3cb15858 d_name.name: "11898"
0xb940d3d8 d_name.name: "cmdline"
0xb940dd98 d_name.name: "status"

crash> px *(struct dentry *) 0x3cb15858 | grep d_flags
d_flags = 0x2000cc,

crash> px *(struct dentry *) 0xb940d3d8 | grep d_flags
d_flags = 0x48048c, # DCACHE_SHRINK_LIST is set

crash> px *(struct dentry *) 0xb940dd98 | grep d_flags
d_flags = 0x48048c, # DCACHE_SHRINK_LIST is set
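
For anyone who wants to re-check the flag decoding above, here is a small
user-space sketch. The mask value 0x00000400 is an assumption recalled from
include/linux/dcache.h of v4.4-era kernels and should be verified against the
tree the dump was taken from; on_shrink_list() is a hypothetical helper name,
not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: value recalled from include/linux/dcache.h (v4.4 era).
 * Verify against the kernel source matching the dump.
 */
#define DCACHE_SHRINK_LIST 0x00000400u

/* Hypothetical helper: true if the d_flags value has DCACHE_SHRINK_LIST set. */
static bool on_shrink_list(uint32_t d_flags)
{
	return (d_flags & DCACHE_SHRINK_LIST) != 0;
}
```

With that mask, the two children (d_flags 0x48048c) test as being on a shrink
list, while the parent "11898" (d_flags 0x2000cc) does not.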

crash> px *(struct detach_data *) 0x35b87c28
$29 = {
select = {
start = 0x3cb15858,
dispose = {
next = 0x35b87c30,
prev = 0x35b87c30
},
found = 0x2
},
mountpoint = 0x0
}
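
The dispose list in that dump is empty: next and prev both point back at the
list head itself. A minimal sketch of that check on raw dump values follows.
It assumes 8-byte pointers, so that the dispose member sits 8 bytes after
start, at 0x35b87c28 + 8 = 0x35b87c30. The type list_head_image and the helper
list_image_is_empty() are made-up names for illustration; the test itself
mirrors what list_empty() does in the kernel (head->next == head).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical image of a struct list_head as read from a dump. */
struct list_head_image {
	uint64_t next;
	uint64_t prev;
};

/*
 * A list is empty when the head's next pointer refers to the head itself.
 * head_addr is the address of the list head inside the dumped structure.
 */
static bool list_image_is_empty(uint64_t head_addr,
				const struct list_head_image *head)
{
	return head->next == head_addr;
}
```

Feeding it the values above (head at 0x35b87c30, next = prev = 0x35b87c30)
reports an empty list even though found is 2.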

select_collect() called from detach_and_collect() will increment
data.select.found in the struct detach_data @ 0x35b87c28 but will not
add any dentries to the dispose list. The shrink_dentry_list() call in
d_invalidate() will do nothing as the dispose list is empty. The two
dentries 0xb940d3d8 and 0xb940dd98 are still there. After d_walk() returns,
d_invalidate() finds data.mountpoint == NULL and data.select.found == 2,
so it will start the loop again without having made any progress.
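
The no-progress loop described above can be modelled in user space. This is
only a toy sketch, not the kernel code: toy_dentry, toy_select, toy_walk() and
toy_invalidate() are made-up names, the reference count check that the real
select_collect() performs is left out, and the loop is capped at max_passes so
that it terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a child dentry. */
struct toy_dentry {
	bool on_shrink_list;	/* models DCACHE_SHRINK_LIST being set */
	bool evicted;		/* models the dentry being gone */
};

/* Toy stand-in for the found counter and the dispose list length. */
struct toy_select {
	int found;
	int collected;
};

/* One walk over the children, following the behaviour described above. */
static void toy_walk(struct toy_dentry *kids, size_t n, struct toy_select *data)
{
	for (size_t i = 0; i < n; i++) {
		if (kids[i].evicted)
			continue;
		data->found++;
		if (kids[i].on_shrink_list)
			continue;	/* left for "somebody else" to shrink */
		data->collected++;
		kids[i].evicted = true;	/* stands in for shrink_dentry_list() */
	}
}

/*
 * Models the retry loop: returns the pass on which nothing was found,
 * or -1 if max_passes walks were done without reaching that state.
 */
static int toy_invalidate(struct toy_dentry *kids, size_t n, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		struct toy_select data = { 0, 0 };

		toy_walk(kids, n, &data);
		if (data.found == 0)
			return pass;
	}
	return -1;
}
```

With two children that are both on a shrink list and nobody else running to
evict them, toy_invalidate() reaches the cap for any max_passes; with two
children that are not on a shrink list it finishes on the second pass.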

As this is a single CPU system without kernel preemption there is nobody
else that will do the shrinking of those dcache entries.

In short, this if-statement in select_collect:

	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
		data->found++;
	}

with the assumption that "somebody else" will do the shrinking seems broken.

Do you agree?

--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.