Re: [RFC PATCH] sched/core: Enhanced debug logs in do_task_dead()

From: Zhongqiu Han
Date: Wed Dec 11 2024 - 05:46:28 EST


On 12/10/2024 10:18 PM, Peter Zijlstra wrote:
On Tue, Dec 10, 2024 at 09:45:13PM +0800, Zhongqiu Han wrote:
If BUG() is a NOP, dump the problematic stack for debugging purposes.

Signed-off-by: Zhongqiu Han <quic_zhonhan@xxxxxxxxxxx>
---
If BUG() is a NOP, it should make sense for debugging purposes. However,
just arising the patch with RFC, because at least for now, I haven't found
a definition of BUG() as NOP in various architectures. Thanks~

Yeah, this don't make sense. If you want a stack-trace you shouldn't
have killed BUG.

And yeah, having done a quick peek, I don't see how you can kill BUG
these days other than explicitly modyfing your source, in which case you
get to keep the pieces.

Hi Peter,
Thanks a lot for your review and reply!

My initial thought was that if BUG() is a NOP, we should dump the stack
for debugging purposes before executing cpu_relax(), because according
to the current do_task_dead() code comments, BUG() might be a NOP in
some cases.
void __noreturn do_task_dead(void)
{
/* Causes final put_task_struct in finish_task_switch(): */
set_special_state(TASK_DEAD);

/* Tell freezer to ignore us: */
current->flags |= PF_NOFREEZE;

__schedule(SM_NONE);
BUG();

/* Avoid "noreturn function does return" - but don't continue if
BUG() is a NOP: */-------------------> This comments
for (;;)
cpu_relax();
}

For example, in the code _before_ 7/4/2014 commit a4b5d580e078 ('bug:
Make BUG() always stop the machine')., BUG() is indeed a NOP if (
!CONFIG_BUG && !HAVE_ARCH_BUG)

#else /* !CONFIG_BUG */
#ifndef HAVE_ARCH_BUG
-#define BUG() do {} while (0)
+#define BUG() do {} while (1)
#endif



However, since 10/4/2024 commit 5284984a4fba ('bug: Fix no-return
statement warning with !CONFIG_BUG'), with !CONFIG_BUG, the default
BUG() is implemented to call unreachable() to ensure it does not return,
and if all architecture-specific implementations of BUG() also do not
return, I now think my thought should be meaningless.

commit 5284984a4fba ('bug: Fix no-return-statement warning with !CONFIG_BUG')
#else /* !CONFIG_BUG */
#ifndef HAVE_ARCH_BUG
-#define BUG() do {} while (1)
+#define BUG() do { \
+ do {} while (1); \
+ unreachable(); \
+} while (0)
#endif


By the way, do you think it is reasonable to remove the cpu_relax code
by evaluating whether BUG() does not return in all architectures and
configurations? Additionally, _if_ some architecture-specific
implementations of BUG() do not have the capability to output the thread
stack, should we consider using the panic() function as a replacement?

Thanks for your time.



--
Thx and BRs,
Zhongqiu Han