Re: [PATCH 5/5] timers: Tag (hr)timer softirq as hotplug safe

From: Joel Fernandes
Date: Fri Sep 15 2023 - 21:39:54 EST


On Tue, Sep 12, 2023 at 6:44 AM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> Specific stress involving frequent CPU-hotplug operations, such as
> running rcutorture for example, may trigger the following message:
>
> "NOHZ tick-stop error: local softirq work is pending, handler #02!!!"
>
> This happens in the CPU-down hotplug process, after
> CPUHP_AP_SMPBOOT_THREADS whose teardown callback parks ksoftirqd, and
> before the target CPU shuts down through CPUHP_AP_IDLE_DEAD. In this
> fragile intermediate state, softirqs waiting for threaded handling may
> be forever ignored and eventually reported by the idle task as in the
> above example.
>
> However some vectors are known to be safe as long as the corresponding
> subsystems have teardown callbacks handling the migration of their
> events. The above error message reports pending timers softirq although
> this vector can be considered as hotplug safe because the
> CPUHP_TIMERS_PREPARE teardown callback performs the necessary migration
> of timers after the death of the CPU. Hrtimers also have a similar
> hotplug handling.
>
> Therefore this error message, as far as (hr-)timers are concerned, can
> be considered spurious and the relevant softirq vectors can be marked as
> hotplug safe.

We could:
Cc: stable@xxxxxxxxxxxxxxx

Since hell is breaking loose a bit because of:
https://lore.kernel.org/all/20230831133214.XF2yjiEb@xxxxxxxxxxxxx/T/

Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>

thanks,

- Joel


> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> ---
> include/linux/interrupt.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a92bce40b04b..4a1dc88ddbff 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -569,8 +569,12 @@ enum
> * 2) rcu_report_dead() reports the final quiescent states.
> *
> * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
> + *
> + * _ (HR)TIMER_SOFTIRQ: (hr)timers_dead_cpu() migrates the queue
> */
> -#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ))
> +#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(TIMER_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ) |\
> + BIT(HRTIMER_SOFTIRQ) | BIT(RCU_SOFTIRQ))
> +
>

Perhaps

> /* map softirq index to softirq name. update 'softirq_to_name' in
> * kernel/softirq.c when adding a new softirq.
> --
> 2.34.1
>