[PATCH] tick, broadcast: Prevent false alarm when force mask contains offline cpus

From: Preeti U Murthy
Date: Wed Mar 26 2014 - 00:01:08 EST


Its possible that the tick_broadcast_force_mask contains cpus which are not
in cpu_online_mask when a broadcast tick occurs. This could happen under the
following circumstance assuming CPU1 is among the CPUs waiting for broadcast.

CPU0 CPU1

Run CPU_DOWN_PREPARE notifiers

Start stop_machine Gets woken up by IPI to run
stop_machine, sets itself in
tick_broadcast_force_mask if the
time of broadcast interrupt is around
the same time as this IPI.

Start stop_machine
set_cpu_online(cpu1, false)
End stop_machine End stop_machine

Broadcast interrupt
Finds that cpu1 in
tick_broadcast_force_mask is offline
and triggers the WARN_ON in
tick_handle_oneshot_broadcast()

Clears all broadcast masks
in CPU_DEAD stage.

This WARN_ON was added to capture scenarios where the broadcast mask, be it
oneshot/pending/force_mask contain offline cpus whose tick devices have been
removed. But here is a case where we trigger the warn on in a valid scenario.

One could argue that the scenario is invalid and ought to be warned against
because ideally the broadcast masks need to be cleared of the cpus about to
go offine before clearing them in the online_mask so that we dont hit these
scenarios.

This would mean clearing the masks in CPU_DOWN_PREPARE stage. But
it is quite possible that this stage itself will fail and cpu hotplug will
not go through. We would then end up in a situation where the cpu has not gone
offline, and continues to wait for the broadcast interrupt like before.
However it is cleared in the broadcast masks and this interrupt will never
be delivered. Hence clearing of masks is best kept off until we are sure that
the cpu is dead, i.e. in the CPU_DEAD stage.

Hence simply ensure that the tick_broadcast_force_mask is a subset of the
online cpus to take care of rare occurences such as above. Moreover this is
not a harmful scenario where the cpu is in the mask but its tick device was
shutdown. The WARN_ON will then continue to capture cases where we could
possibly cause a kernel crash.

Signed-off-by: Preeti U Murthy <preeti@xxxxxxxxxxxxxxxxxx>
---

kernel/time/tick-broadcast.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 63c7b2d..30b8731 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -606,7 +606,12 @@ again:
*/
cpumask_clear_cpu(smp_processor_id(), tick_broadcast_pending_mask);

- /* Take care of enforced broadcast requests */
+ /* Take care of enforced broadcast requests. We could have offline
+ * cpus in the tick_broadcast_force_mask. Thats ok, we got the interrupt
+ * before we could clear the mask.
+ */
+ cpumask_and(tick_broadcast_force_mask,
+ tick_broadcast_force_mask, cpu_online_mask);
cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
cpumask_clear(tick_broadcast_force_mask);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/