Re: BUG_ON(!cpus_equal(cpumask, tmp));

From: Andy Whitcroft
Date: Wed Mar 31 2004 - 19:36:20 EST

I think we all agree that the reason that we need Hari's change
is that the underlying cpu shutdown code is not clearing out all
references to the cpu from the VM system; and that in an ideal
world we should fix that. But this is v2.6 and we need something
which fixes the issue. Hari's patch is a very good starting point,
as it closes the window to almost nothing. But there is still a
small window in which we can end up hung during shutdown if the
timing is wrong as cpu_online_map can change whilst we are in
the process of sending an IPI. I think we also agree that we
don't want to add any additional active locking to VM hot path
for the remote tlb invalidate if at all possible.

In a previous email I outlined a possible solution in which we
change shutdown to guarentee that if you are in the middle of
sending an IPI all cpus online at the start of that IPI will
respond, even if they are in the process of going offline.

Below is a patch to 2.6.5-rc3 which implements this idea, subtly
modifying Hari's solution and incorporating Martins changes.
I have added copious comments as we are essentially open-coding
RCU semantics.

I have only tested this on i386 and only shutdown (!) tested it.
If there is anyone who can reliably reproduce this issue could
try this patch I would appreciate it.

Review/Comments always appreciated.


smp.c | 47 +++++++++++++++++++++++++++++++++++++----------
1 files changed, 37 insertions(+), 10 deletions(-)

diff -X /home/apw/lib/vdiff.excl -rupN reference/arch/i386/kernel/smp.c current/arch/i386/kernel/smp.c
--- reference/arch/i386/kernel/smp.c 2004-03-11 20:47:12.000000000 +0000
+++ current/arch/i386/kernel/smp.c 2004-04-01 02:01:11.000000000 +0100
@@ -355,8 +355,6 @@ static void flush_tlb_others(cpumask_t c

- cpus_and(tmp, cpumask, cpu_online_map);
- BUG_ON(!cpus_equal(cpumask, tmp));
BUG_ON(cpu_isset(smp_processor_id(), cpumask));

@@ -367,16 +365,24 @@ static void flush_tlb_others(cpumask_t c
* detected by the NMI watchdog.
+ /* Subtle, mask the request mask with the currently online cpu's.
+ * Sample this under the lock; cpus in the the middle of going
+ * offline will wait until there is noone in this critical section
+ * before disabling IPI handling. */
+ cpus_and(tmp, cpumask, cpu_online_map);
+ if(cpus_empty(tmp))
+ return;

flush_mm = mm;
flush_va = va;
- atomic_set_mask(cpumask, &flush_cpumask);
+ atomic_set_mask(tmp, &flush_cpumask);
int k;
unsigned long *flush_mask = (unsigned long *)&flush_cpumask;
- unsigned long *cpu_mask = (unsigned long *)&cpumask;
+ unsigned long *cpu_mask = (unsigned long *)&tmp;
for (k = 0; k < BITS_TO_LONGS(NR_CPUS); ++k)
atomic_set_mask(cpu_mask[k], &flush_mask[k]);
@@ -385,7 +391,7 @@ static void flush_tlb_others(cpumask_t c
* We have to send the IPI only to
* CPUs affected.
- send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
+ send_IPI_mask(tmp, INVALIDATE_TLB_VECTOR);

while (!cpus_empty(flush_cpumask))
/* nothing. lockup detection does not belong here */
@@ -514,11 +520,8 @@ int smp_call_function (void (*func) (voi
struct call_data_struct data;
- int cpus = num_online_cpus()-1;
- if (!cpus)
- return 0;
+ int cpus;
data.func = func; = info;
atomic_set(&data.started, 0);
@@ -527,6 +530,15 @@ int smp_call_function (void (*func) (voi
atomic_set(&data.finished, 0);

+ /* Subtle, get the current number of online cpus.
+ * Sample this under the lock; cpus in the the middle of going
+ * offline will wait until there is noone in this critical section
+ * before disabling IPI handling. */
+ cpus = num_online_cpus()-1;
+ if (!cpus)
+ goto out_unlock;
call_data = &data;

@@ -540,6 +552,7 @@ int smp_call_function (void (*func) (voi
if (wait)
while (atomic_read(&data.finished) != cpus)

return 0;
@@ -551,6 +564,20 @@ static void stop_this_cpu (void * dummy)
* Remove this CPU:
cpu_clear(smp_processor_id(), cpu_online_map);
+ /* Subtle, IPI users assume that they will be able to get IPI's
+ * though to the cpus listed in cpu_online_map. To ensure this
+ * we add the requirement that they check cpu_online_map within
+ * the IPI critical sections. Here we remove ourselves from the
+ * map, then ensure that all other cpus have left the relevant
+ * critical sections since the change. We do this by aquiring
+ * the relevant section locks, if we have them none else is in
+ * them. Once this is done we can go offline. */
+ spin_lock(&tlbstate_lock);
+ spin_unlock(&tlbstate_lock);
+ spin_lock(&tlbstate_lock);
+ spin_unlock(&tlbstate_lock);
if (cpu_data[smp_processor_id()].hlt_works_ok)

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at