Re: BUG_ON(!cpus_equal(cpumask, tmp));
From: Hariprasad Nellitheertha
Date: Tue Mar 30 2004 - 08:30:09 EST
We faced this problem starting 2.6.3 while working on kexec.
The problem is because we now initialize cpu_vm_mask for init_mm with
CPU_MASK_ALL (from 2.6.3 onwards) which makes all bits in cpumask 1 (on SMP).
Hence BUG_ON(!cpus_equal(cpumask,tmp) fails. The change to set
cpu_vm_mask to CPU_MASK_ALL was done to remove tlb flush optimizations
I had posted a patch for this in the earlier thread. Reposting the same
here. This patch removes the assertion and uses "tmp" instead of cpumask.
Otherwise, we will end up sending IPIs to offline CPUs as well.
diff -Naur linux-2.6.3-before/arch/i386/kernel/smp.c linux-2.6.3/arch/i386/kernel/smp.c
--- linux-2.6.3-before/arch/i386/kernel/smp.c 2004-02-18 09:27:15.000000000 +0530
+++ linux-2.6.3/arch/i386/kernel/smp.c 2004-03-04 14:16:43.000000000 +0530
@@ -356,7 +356,8 @@
cpus_and(tmp, cpumask, cpu_online_map);
- BUG_ON(!cpus_equal(cpumask, tmp));
@@ -371,12 +372,12 @@
flush_mm = mm;
flush_va = va;
#if NR_CPUS <= BITS_PER_LONG
- atomic_set_mask(cpumask, &flush_cpumask);
+ atomic_set_mask(tmp, &flush_cpumask);
unsigned long *flush_mask = (unsigned long *)&flush_cpumask;
- unsigned long *cpu_mask = (unsigned long *)&cpumask;
+ unsigned long *cpu_mask = (unsigned long *)&tmp;
for (k = 0; k < BITS_TO_LONGS(NR_CPUS); ++k)
@@ -385,7 +386,7 @@
* We have to send the IPI only to
* CPUs affected.
- send_IPI_mask(cpumask, INVALIDATE_TLB_VECTOR);
+ send_IPI_mask(tmp, INVALIDATE_TLB_VECTOR);
/* nothing. lockup detection does not belong here */
On Mon, Mar 29, 2004 at 04:25:55PM -0800, Andrew Morton wrote:
> Andrew Morton <akpm@xxxxxxxx> wrote:
> > "Martin J. Bligh" <mbligh@xxxxxxxxxxx> wrote:
> > >
> > > The bug is on "BUG_ON(!cpus_equal(cpumask, tmp));" in flush_tlb_others
> > > This was from 2.6.1-rc1-mjb1, and seems to be a race on shutdown
> > > (prints after "Power Down"), but I've no reason to believe it's specific
> > > to the patchset I have - it's not an area I'm touching, I think.
> > >
> > > I presume we've got a race between taking CPUs offline and the
> > > tlbflush code ... tlb_flush_mm reads the value from mm->cpu_vm_mask,
> > > and then presumably some other cpu changes cpu_online_map before it
> > > gets to calling flush_tlb_others ... does that sound about right?
> > Looks like it, yes. I don't think there's a sane way of fixing that - we'd
> > need the flush_tlb_others() caller to hold some lock which keeps the cpu
> > add/remove code away.
> > I'd propose removing the assertion?
> If the going-away CPU can still take IPIs we're OK, I think. If it cannot,
> we have a problem. Can you do a s/BUG_ON/WARN_ON/ and run with that for a
> while? Check that the warnings come out and the machine doesn't go crump?
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
Linux Technology Center
India Software Labs
IBM India, Bangalore
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/