Power9 NV linux-next random process hang

From: Qian Cai
Date: Tue Jan 05 2021 - 17:37:47 EST


.config: https://cailca.coding.net/public/linux/mm/git/files/master/powerpc.config

Today's linux-next started to generate random process hangs quite easily.
Yesterday's build seemed to work fine. Sometimes the process stack looks corrupted
while the process is running at 100% CPU, and gdb shows it has just entered a
subroutine where I really can't see why it would hang.

[ 6732.309621][T11627] task:ranbug state:R running task stack:24176 pid: 2893 ppid: 2867 flags:0x00040000
[ 6732.309779][T11627] Call Trace:
[ 6732.309826][T11627] [c00000006166fa30] [c00000006166fb60] 0xc00000006166fb60 (unreliable)

Also, running the LTP syscalls tests ended up hanging with lots of zombie processes. Any ideas?

root 2023 0.0 0.0 0 0 ? Zs 14:10 0:00 [login] <defunct>
root 52052 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [recv01] <defunct>
root 52054 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [recvfrom01] <defunct>
root 52056 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [recvmsg01] <defunct>
root 52155 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [rt_sigtimedwait] <defunct>
root 52305 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [semctl01] <defunct>
root 52362 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [send01] <defunct>
root 52386 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>
root 52387 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>
root 52388 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>
root 52389 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>
root 52390 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>
root 52392 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04_64] <defunct>
root 52393 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04_64] <defunct>
root 52394 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04_64] <defunct>
root 52395 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04_64] <defunct>
root 52396 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04_64] <defunct>
root 52398 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile05] <defunct>
root 52400 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile05_64] <defunct>
root 52415 0.0 0.0 0 0 pts/0 Z 15:04 0:00 [sendmsg01] <defunct>
root 53470 0.0 0.0 0 0 pts/0 Z 15:04 0:00 [sendto01] <defunct>
root 53763 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53764 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53765 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53766 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53767 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53768 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53769 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53770 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53771 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53772 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53773 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53774 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53775 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53776 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53777 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53778 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53779 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53780 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
root 53782 0.0 0.0 0 0 pts/0 Z 15:06 0:00 [setrlimit01] <defunct>
nobody 54290 0.0 0.0 0 0 pts/0 Z 15:07 0:00 [sysctl03] <defunct>
root 56813 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56814 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56815 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56816 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56817 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56818 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56819 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56820 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56821 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56822 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56823 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56825 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56826 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56827 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56828 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56829 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56830 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56831 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56832 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56833 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56834 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56835 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56836 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>
root 56838 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid04] <defunct>
sshd 58675 0.0 0.0 0 0 ? Z 17:21 0:00 [sshd] <defunct>
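
For anyone triaging a listing like the one above, here is a minimal sketch that tallies defunct entries per command. It assumes `ps aux`-style column order (STAT in the eighth field), and the `count_zombies` helper name is hypothetical, not part of LTP or procps.

```python
import re
from collections import Counter

# Matches the bracketed command name on a "<defunct>" line of ps output.
DEFUNCT_RE = re.compile(r"\[([^\]]+)\]\s+<defunct>\s*$")


def count_zombies(ps_lines):
    """Return a Counter mapping command name -> number of defunct entries.

    Assumes `ps aux` column order: USER PID %CPU %MEM VSZ RSS TTY STAT ...
    """
    counts = Counter()
    for line in ps_lines:
        fields = line.split()
        # The eighth column is STAT; zombie states start with "Z" (e.g. "Z", "Zs").
        if len(fields) < 8 or not fields[7].startswith("Z"):
            continue
        match = DEFUNCT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the listing above, used as sample input.
sample = [
    "root 2023 0.0 0.0 0 0 ? Zs 14:10 0:00 [login] <defunct>",
    "root 52386 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>",
    "root 52387 0.0 0.0 0 0 pts/0 Z 15:03 0:00 [sendfile04] <defunct>",
    "root 56813 0.0 0.0 0 0 pts/0 Z 16:09 0:00 [waitpid03] <defunct>",
]

if __name__ == "__main__":
    for name, n in count_zombies(sample).most_common():
        print(f"{n:4d} {name}")
```

Grouping by command makes it easier to see which LTP tests leave the most unreaped children behind, which may help narrow down where the parent stopped calling wait.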