CONFIG_ORC_UNWINDER=y breaks get_wchan()?

From: Vito Caputo
Date: Tue Sep 21 2021 - 15:32:52 EST


Hi Josh (and CC:lkml),

I've recently transitioned to an Arch system which has
CONFIG_ORC_UNWINDER=y in the default kernel. My window manager
integrates process monitoring showing the wchans of processes, making
it very apparent when wchan breaks.

Glancing at the kernel code to see what's involved in get_wchan() for
x86, it appears to assume there are frame pointers on the stack. I
don't see any mention of ORC_UNWINDER in the get_wchan() code, which
seems like an oversight when ORC_UNWINDER=y gets rid of them.

I had originally assumed this was just a Kconfig problem and asked
lkml about it (hearing crickets back) [0], but have since learned of
ORC_UNWINDER's existence via the Arch kernel maintainer.

Is this an oversight of the ORC_UNWINDER implementation? It's
arguably a regression to completely break wchans for tools like `ps -o
wchan` and `top`, or my window manager and its separate monitoring
utility. Presumably there are other tools out there sampling wchans
for monitoring as well. There's also an internal use of get_wchan() in
kernel/sched/fair.c for sleep profiling.

When monitoring at a high sample rate (60 Hz) during something churny
like a parallel kernel or systemd build, I've occasionally seen a
spurious non-zero sample come out of /proc/[pid]/wchan containing a
hexadecimal address like 0xffffa9ebc181bcf8. This all smells broken.
Is get_wchan() occasionally spitting out random junk here that
kallsyms can't resolve because get_wchan() is completely ignorant of
ORC_UNWINDER's effects?
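
For anyone wanting to reproduce the observation from user space, here
is a minimal sketch of a sampler helper. It is not taken from any
existing tool: the names classify_wchan(), read_wchan() and
enum wchan_kind are made up for illustration, and the classification
rules are an assumption based only on the samples described above
("0" for no wchan, a symbol name when resolved, and the occasional
"0x..." hex string).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of a single /proc/<pid>/wchan sample. */
enum wchan_kind {
	WCHAN_NONE,	/* "0" or empty: no wait channel reported */
	WCHAN_SYMBOL,	/* anything else: treated as a symbol name */
	WCHAN_RAW_ADDR,	/* "0x" followed only by hex digits */
};

/* Classify the text of one sample (trailing newline already removed). */
enum wchan_kind classify_wchan(const char *s)
{
	size_t i;

	if (s[0] == '\0' || strcmp(s, "0") == 0)
		return WCHAN_NONE;

	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X') && s[2] != '\0') {
		for (i = 2; s[i] != '\0'; i++)
			if (!isxdigit((unsigned char)s[i]))
				return WCHAN_SYMBOL;
		return WCHAN_RAW_ADDR;
	}

	return WCHAN_SYMBOL;
}

/*
 * Read one sample for the given pid string (e.g. "1234" or "self")
 * into buf.  Returns 0 on success, -1 on any failure.
 */
int read_wchan(const char *pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;
	size_t n;
	int ret;

	if (len == 0)
		return -1;

	ret = snprintf(path, sizeof(path), "/proc/%s/wchan", pid);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	n = fread(buf, 1, len - 1, f);
	fclose(f);

	buf[n] = '\0';
	buf[strcspn(buf, "\n")] = '\0';	/* strip a trailing newline, if any */
	return 0;
}
```

Calling read_wchan() in a loop at the sample rate of interest and
counting how often classify_wchan() returns WCHAN_RAW_ADDR versus
WCHAN_NONE should make the spurious samples easy to spot. A task's
wchan can legitimately change between samples, so this only shows
what the kernel reported, not why.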

My time to spend on this is currently very limited, but I'd like to at
least make the relevant parties aware if they're not already... Maybe
I should just file something in bugzilla.

Thanks,
Vito Caputo


[0] https://lore.kernel.org/lkml/20210914012612.vwlowt5wsojmyfzr@xxxxxxxxxxxxxxxxxxxxxxxx/