Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter

From: Gao Xiang
Date: Thu Nov 28 2024 - 05:46:22 EST




On 2024/11/28 18:00, Max Kellermann wrote:
> On Thu, Nov 21, 2024 at 2:18 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>> Just saw this. I guess your _recent_ 6.11.9 bug is actually
>> related to EROFS since EROFS uses readahead_expand(). I think
>> your recent report was introduced by a recent backport fix
>> commit 9e2f9d34dd12 ("erofs: handle overlapped pclusters out of crafted images properly")
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.11.9&id=9cfa199bcbbbba31cbf97b2786f44f4464f3f29a
>>
>> bio can be NULL after this patch and causes
>> unbalanced psi_memstall_{enter,leave}(). It can be fixed as
>> (the diff below could be damaged due to my email client):
>
> With your patch, the PSI warning (from Suren's debugging patch) fired
> again last night. Which means there may be other instances of this bug
> left.

OK, but honestly that is somewhat strange: on the
EROFS side, readahead_expand() can only be used in
.readahead() context, and only with the original
readahead_control.

I don't have any more clues without a better understanding
of PSI memstall. From my limited POV, the callers
of psi_memstall_{enter,leave}() all seem good.
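
To illustrate what an unbalanced psi_memstall_{enter,leave}() pair
looks like, here is a minimal userspace sketch. It is not the EROFS
code and not the kernel's PSI implementation: struct task_model,
struct bio_model, model_enter(), model_leave() and both submit_*()
helpers are hypothetical names made up for the illustration. It only
assumes the shape described earlier in this thread, where the leave
call is skipped when bio is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task stall state (not task_struct). */
struct task_model {
	bool in_memstall;
};

/* Hypothetical stand-in for a bio; only its NULL-ness matters here. */
struct bio_model {
	int unused;
};

/* Simplified models of the enter/leave pair: set and clear one flag. */
static void model_enter(struct task_model *t)
{
	t->in_memstall = true;
}

static void model_leave(struct task_model *t)
{
	t->in_memstall = false;
}

/*
 * Unbalanced variant: the leave call lives inside the "bio != NULL"
 * branch, so a NULL bio leaves the stall state set forever.
 */
static void submit_unbalanced(struct task_model *t, struct bio_model *bio)
{
	bool memstall = false;

	model_enter(t);
	memstall = true;

	if (bio) {
		/* the bio would be submitted here */
		if (memstall)
			model_leave(t);
	}
}

/*
 * Balanced variant: the leave call no longer depends on bio, so every
 * enter is matched by a leave on all paths.
 */
static void submit_balanced(struct task_model *t, struct bio_model *bio)
{
	bool memstall = false;

	model_enter(t);
	memstall = true;

	if (bio) {
		/* the bio would be submitted here */
	}
	if (memstall)
		model_leave(t);
}
```

In this model, calling submit_unbalanced() with a NULL bio leaves
in_memstall set when the function returns, which is the kind of state
a "was never cleared" check would complain about later; the balanced
variant clears it on both paths.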

Thanks,
Gao Xiang


------------[ cut here ]------------
Stall from readahead_expand+0xca/0x1d0 was never cleared
WARNING: CPU: 133 PID: 91645 at kernel/sched/psi.c:989 psi_task_switch+0x126/0x230
Modules linked in:
CPU: 133 UID: 3221274747 PID: 91645 Comm: php-cgi8.1 Tainted: G W 6.11.10-cm4all2-es+ #267
Tainted: [W]=WARN
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
RIP: 0010:psi_task_switch+0x126/0x230
Code: f6 75 e6 41 f6 44 24 18 80 74 36 41 f6 84 24 d0 08 00 00 02 74 2b 49 8b b4 24 d8 08 00 00 48 c7 c7 20 c8 8d a8 e8 fa 1f f9 ff <0f> 0b 41 f6 44 24 18 80 74 0d 41 f6 84 24 d0 08 00 00 02 74 02 0f
RSP: 0018:ffff96be9c28b9a8 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000085 RCX: 0000000000000027
RDX: ffff8997b995c8c8 RSI: 0000000000000001 RDI: ffff8997b995c8c0
RBP: 000000000000001c R08: 00000000ffff7fff R09: 0000000000000058
R10: 00000000ffff7fff R11: ffff899abd2a1000 R12: ffff891db3b85c00
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8997b9940000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f26d7aba480 CR3: 000000c07c61a006 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x93/0x140
? psi_task_switch+0x126/0x230
? report_bug+0x174/0x1a0
? handle_bug+0x53/0x90
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x16/0x20
? psi_task_switch+0x126/0x230
? psi_task_switch+0x126/0x230
__schedule+0x980/0x10f0
do_task_dead+0x3e/0x40
do_exit+0x6ed/0x970
do_group_exit+0x2c/0x80
__x64_sys_exit_group+0x14/0x20
x64_sys_call+0x15aa/0x17b0
do_syscall_64+0x64/0x100
? srso_alias_return_thunk+0x5/0xfbef5
? get_page_from_freelist+0x60e/0x1140
? cgroup_rstat_updated+0x88/0x210
? srso_alias_return_thunk+0x5/0xfbef5
? __mod_memcg_lruvec_state+0x91/0x140
? srso_alias_return_thunk+0x5/0xfbef5
? __lruvec_stat_mod_folio+0x80/0xd0
? srso_alias_return_thunk+0x5/0xfbef5
? folio_add_file_rmap_ptes+0x37/0xb0
? srso_alias_return_thunk+0x5/0xfbef5
? set_pte_range+0xb7/0x280
? srso_alias_return_thunk+0x5/0xfbef5
? next_uptodate_folio+0x83/0x270
? srso_alias_return_thunk+0x5/0xfbef5
? filemap_map_pages+0x4a2/0x590
? srso_alias_return_thunk+0x5/0xfbef5
? do_fault+0x291/0x4d0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? __handle_mm_fault+0x31c/0x1060
? srso_alias_return_thunk+0x5/0xfbef5
? __count_memcg_events+0x53/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? handle_mm_fault+0xb6/0x280
? srso_alias_return_thunk+0x5/0xfbef5
? do_user_addr_fault+0x386/0x610
? srso_alias_return_thunk+0x5/0xfbef5
? exc_page_fault+0x6f/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f26dad48349
Code: Unable to access opcode bytes at 0x7f26dad4831f.
RSP: 002b:00007ffcd05a7848 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f26dae429e0 RCX: 00007f26dad48349
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: fffffffffffffd48 R09: 000055c238e82190
R10: 00007f26d8a781a8 R11: 0000000000000246 R12: 00007f26dae429e0
R13: 00007f26dae482e0 R14: 000000000000001e R15: 00007f26dae482c8
</TASK>
---[ end trace 0000000000000000 ]---