Re: mm: LTP/memcg testcase regression induced by 8cd7c588decf..66ce520bb7c2 series

From: Mike Galbraith
Date: Tue Nov 23 2021 - 06:06:26 EST


On Tue, 2021-11-23 at 09:13 +0000, Mel Gorman wrote:
> On Sun, Nov 21, 2021 at 11:57:20AM +0100, Mike Galbraith wrote:
> > Greetings,
> >
> > FYI, something in this series causes LTP controllers::memcg_regression
> > testcase to hang forever.  Verified via brute force revert of the lot.
> >
> > After letting box moan for 4.5 hours, I poked ^C repeatedly, but runltp
> > didn't exit/recover gracefully, and ps hung, so I nuked the box.  All
> > memcg_test_1 instances were stuck in reclaim_throttle().
> >
>
> I'll see can I reproduce this but do you know offhand what the test is
> doing and what the expected outcome is? A possibility is that this is a
> test that is driving the machine near OOM (or at least memcg OOM) and
> getting throttled instead of getting killed.

Here's the hanging test, test_4.

testcases/bin/memcg_regression_test.sh:
test_4()
{
	./memcg_test_4.sh

	check_kernel_bug
	if [ $? -eq 1 ]; then
		tst_resm TPASS "no kernel bug was found"
	fi

	# test_4.sh might be killed by oom, so do clean up here
	killall -9 memcg_test_4 2> /dev/null
	killall -9 memcg_test_4.sh 2> /dev/null

	# if test_4.sh gets killed, it won't clean cgroup it created
	rmdir memcg/0 2> /dev/null

	swapon -a
}

testcases/bin/memcg_test_4.sh:
# attach current task to memcg/0/
mkdir memcg/0
echo $$ > memcg/0/tasks

./memcg_test_4 &
pid=$!
sleep 1

# let $pid allocate 100M memory
/bin/kill -SIGUSR1 $pid
sleep 1

# shrink memory, and then 80M will be swapped
echo 40M > memcg/0/memory.limit_in_bytes

# turn off swap, and swapoff will be killed
swapoff -a
sleep 1
echo $pid > memcg/tasks 2> /dev/null
echo $$ > memcg/tasks 2> /dev/null

# now remove the cgroup
rmdir memcg/0
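
For confirming which tasks are parked in reclaim_throttle() without going through ps, a rough sketch along these lines might help. It is not part of LTP. The function name tasks_in_wchan and its optional second argument are made up for illustration, and it assumes /proc/<pid>/wchan reports symbol names on the kernel in question (that needs kallsyms and sufficient privileges; otherwise the file may just read "0"). I have not tried it against this particular hang.

```shell
#!/bin/sh
# Hypothetical helper, not from LTP: print the PID of every task whose
# kernel wait channel equals SYMBOL.
# Usage: tasks_in_wchan SYMBOL [PROC_ROOT]
# PROC_ROOT defaults to /proc; it is only a parameter so the function can
# be exercised against a fake directory tree.
tasks_in_wchan()
{
	sym=$1
	root=${2:-/proc}

	for f in "$root"/[0-9]*/wchan; do
		# skip an unmatched glob, or a task that exited in the meantime
		[ -r "$f" ] || continue

		# wchan has no trailing newline, so read returns non-zero even
		# though it filled in $w; ignore the status
		read -r w < "$f" || :

		if [ "$w" = "$sym" ]; then
			pid=${f%/wchan}
			echo "${pid##*/}"
		fi
	done
}
```

Run as root with "tasks_in_wchan reclaim_throttle" while the test is wedged; an empty result would suggest the tasks are blocked somewhere else, or that wchan is not reporting symbols on that box.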