Re: [PATCH RFC] panic: Avoid extra noisy messages due to stopped cpus

From: Peter Zijlstra
Date: Thu Oct 11 2018 - 05:35:15 EST


On Thu, Oct 11, 2018 at 03:17:48PM +0800, Feng Tang wrote:
> Sometimes when debugging a kernel panic, we see many extra noisy error
> messages after the expected end:
>
> [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> [ 35.749975] ------------[ cut here ]------------
>
> These messages may overflow the screen (framebuffer) and make debugging
> much more difficult.

*blink* you're actually using the framebuffer for debugging?! Why the heck
are you doing that?

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index c93fcfd..b703862 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -520,6 +520,7 @@ void stop_this_cpu(void *dummy)
>  	 * Remove this CPU:
>  	 */
>  	set_cpu_online(smp_processor_id(), false);
> +	set_cpu_active(smp_processor_id(), false);
>  	disable_local_APIC();
>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>

WTH is stop_this_cpu() and how do we even get here with active still
set?

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7fc4a37..cf41b7b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9034,7 +9034,7 @@ static inline int find_new_ilb(void)
>  {
>  	int ilb = cpumask_first(nohz.idle_cpus_mask);
>
> -	if (ilb < nr_cpu_ids && idle_cpu(ilb))
> +	if (ilb < nr_cpu_ids && idle_cpu(ilb) && cpu_online(ilb))
>  		return ilb;
>
>  	return nr_cpu_ids;


Similarly, this is the result of taking the CPU away without going through
the normal path. You're doing something dodgy.
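
To make the stale-mask problem concrete, here is a toy userspace sketch, not
kernel code. Every toy_* name and the 4-CPU bitmask layout are made up for
illustration, and the idle_cpu() check from the real find_new_ilb() is left
out. It only shows that clearing the online bit by hand, without the rest of
the teardown, leaves the CPU visible in the nohz idle mask, which is why the
quoted hunk needs the extra cpu_online() test.

```c
#include <assert.h>

#define TOY_NR_CPUS 4u	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for the kernel's cpumasks: one bit per CPU. */
struct toy_masks {
	unsigned int online;	/* bit n set: CPU n is marked online */
	unsigned int nohz_idle;	/* bit n set: CPU n is in the nohz idle mask */
};

/* Index of the lowest set bit, or TOY_NR_CPUS if the mask is empty. */
static unsigned int toy_first(unsigned int mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return TOY_NR_CPUS;
}

/*
 * Models a CPU being taken away outside the normal path: only the
 * online bit is cleared, the nohz idle mask is left stale.
 */
static void toy_stop_cpu(struct toy_masks *m, unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	m->online &= ~(1u << cpu);
}

/* Picks a balancer CPU from the idle mask alone. */
static unsigned int toy_find_ilb(const struct toy_masks *m)
{
	return toy_first(m->nohz_idle);
}

/* Same, but rejects a candidate that is no longer online. */
static unsigned int toy_find_ilb_checked(const struct toy_masks *m)
{
	unsigned int ilb = toy_first(m->nohz_idle);

	if (ilb < TOY_NR_CPUS && (m->online & (1u << ilb)))
		return ilb;
	return TOY_NR_CPUS;
}
```

With CPU 1 idle and then stopped via toy_stop_cpu(), toy_find_ilb() still
hands back CPU 1 while toy_find_ilb_checked() returns TOY_NR_CPUS. The extra
check papers over the stale mask; it does not explain why the mask went
stale in the first place.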