Stuck looping on list_empty(list) in free_pcppages_bulk()
From: Sultan Alsawaf
Date: Mon Aug 30 2021 - 19:12:58 EST
Hi,
I was recently given the following CPU stall splat and asked to look into it:
----------------8<----------------
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: 44-...0: (1 GPs behind) idle=77e/1/0x4000000000000000 softirq=28480656/28480657 fqs=279939
(detected by 62, t=1140032 jiffies, g=224165577, q=8883218)
Sending NMI from CPU 62 to CPUs 44:
NMI backtrace for cpu 44
CPU: 44 PID: 83957 Comm: perl Tainted: G L 5.8.18-100.fc31.x86_64 #1
RIP: 0010:free_pcppages_bulk+0x63/0x2a0
RSP: 0018:ffffb3078698fb60 EFLAGS: 00000086
RAX: ffff8b647db30390 RBX: ffffee5fcab67f48 RCX: ffffee5f30c79980
RDX: 0000000000c31e66 RSI: 0000000000000007 RDI: 0000000000000007
RBP: 0000000000000000 R08: ffffb3078698fb80 R09: ffffb3078698fb80
R10: 00000000002ffa93 R11: 0000000000000000 R12: ffff8b647db30390
R13: 0000000000000000 R14: ffff8b647db30380 R15: 000000006a340084
FS: 0000000000000000(0000) GS:ffff8b647db00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff2b5200fa8 CR3: 00000034ada0a000 CR4: 0000000000340ee0
Call Trace:
free_unref_page_list+0x113/0x1a0
release_pages+0x3ad/0x450
tlb_flush_mmu+0x36/0x160
unmap_page_range+0xab6/0xee0
unmap_vmas+0x6a/0xd0
exit_mmap+0x97/0x170
mmput+0x61/0x140
do_exit+0x306/0xb80
? syscall_trace_enter+0x160/0x290
do_group_exit+0x3a/0xa0
__x64_sys_exit_group+0x14/0x20
do_syscall_64+0x4d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---------------->8----------------
I apologize in advance for reporting a bug on an EOL kernel. I don't see any
changes as of 5.14 that could address something like this, so I'm emailing in
case whatever happened here may be a bug affecting newer kernels.
With gdb, it appears that the CPU got stuck in the list_empty(list) loop inside
free_pcppages_bulk():
----------------8<----------------
do {
batch_free++;
if (++migratetype == MIGRATE_PCPTYPES)
migratetype = 0;
list = &pcp->lists[migratetype];
} while (list_empty(list));
---------------->8----------------
Although this code snippet is slightly different in 5.14, it's still ultimately
the same. Side note: I noticed that the way `migratetype` is incremented causes
`&pcp->lists[1]` to get looked at first rather than `&pcp->lists[0]`, since
`migratetype` will start out at 1. This quirk is still present in 5.14, though
the variable in question is now called `pindex`.
With some more gdb digging, I found that the `count` variable was stored in %ESI
at the time of the stall. According to register dump in the splat, %ESI was 7.
It looks like, for some reason, the pcp count was 7 higher than the number of
pages actually present in the pcp lists.
I tried to find some way that this could happen, but the only thing I could
think of was that maybe an allocation had both __GFP_RECLAIMABLE and
__GFP_MOVABLE set in its gfp mask, in which case the rmqueue() call in
get_page_from_freelist() would pass in a migratetype equal to MIGRATE_PCPTYPES
and then pages could be added to an out-of-bounds pcp list while still
incrementing the overall pcp count. This seems pretty unlikely though. As
another side note, it looks like there's nothing stopping this from occurring;
there's only a VM_WARN_ON() in gfp_migratetype() that checks if both bits are
set.
Any ideas on what may have happened here, or a link to a commit that may have
fixed this issue in newer kernels, would be much appreciated.
Thanks,
Sultan