Re: [PATCH] kernel: cpu: Handle hotplug failure for state CPUHP_AP_IDLE_DEAD

From: Thomas Gleixner
Date: Thu Sep 06 2018 - 04:10:14 EST


On Wed, 5 Sep 2018, Prakruthi Deepak Heragu wrote:

> Once the tear down hotplug handler is run, cpu is dead and enters
> into CPUHP_AP_IDLE_DEAD state. Any callbacks that fail in the state
> machine with state < CPUHP_AP_IDLE must be treated as fatal as this
> could result in timer not being migrated away from dead cpu and run
> into issues like work queue lock ups, sched_clock timer wrapping to
> zero as sched_clock_poll which is in the hrtimer base of cpu being
> hotplugged does not get migrated.

BUG_ON() is the last resort when there is no other way out. And there is no
reason to treat such a failure as fatal unconditionally.

Why would any of those callbacks fail at all? And if that ever happens, then
we really can be smarter than just giving up.

Thanks,

tglx