BUG in smp_flush_tlb()

Claus-Justus Heine (claus@momo.math.rwth-aachen.de)
25 Jul 1998 19:24:48 +0200


I think I have found a bug in smp_flush_tlb().

Symptoms:
~~~~~~~~~
Between ten and thirty

"stuck on smp_invalidate_needed ..."

messages appear after the second CPU has been fired off.

Reason:
~~~~~~~
smp_flush_tlb() sets the

smp_invalidate_needed

mask to "cpu_present_map". It then calls
smp_message_pass(MSG_ALL_BUT_SELF, ) to propagate the flush_tlb()
request to all other CPUs and then calls local_flush_tlb().

The problem is that smp_flush_tlb() doesn't clear the bit belonging to
the local CPU itself in "smp_invalidate_needed". As soon as the other
CPUs have been fired off, this is no longer a problem, because
smp_message_pass() then clears the bit.

BUT: smp_flush_tlb() is called two or three times before the other
CPUs have been fired off. In this case smp_message_pass() is a no-op,
so it does not clear the local CPU's bit in smp_invalidate_needed either.
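To make the sequence above concrete, here is a small user-space model in C. It is a hypothetical sketch, not kernel code: every model_* name is invented for illustration, and smp_message_pass() is reduced to the behaviour described above (a no-op before the other CPUs run; afterwards the sender's own bit is cleared and each remote CPU clears its bit when it handles the message). The model only looks at the local CPU's bit, which is the one this report is about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 2

static unsigned long model_present_map = (1UL << MODEL_NR_CPUS) - 1; /* CPUs 0 and 1 */
static unsigned long model_invalidate_needed; /* stands in for smp_invalidate_needed */
static bool model_others_started;             /* have the other CPUs been fired off? */
static int model_local_flushes;               /* counts local_flush_tlb() calls */

static void model_clear_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_invalidate_needed &= ~(1UL << cpu);
}

/* Models smp_message_pass(MSG_ALL_BUT_SELF, ...) as the report describes it:
 * a no-op until the other CPUs run; afterwards the sender's bit is cleared
 * and every remote CPU clears its own bit when it handles the message. */
static void model_message_pass_all_but_self(int self)
{
    if (!model_others_started)
        return;
    model_clear_cpu(self);
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpu != self && (model_present_map & (1UL << cpu)))
            model_clear_cpu(cpu);
}

/* Unpatched flow: set the mask, message the other CPUs, flush locally. */
static void model_flush_tlb_unpatched(int self)
{
    model_invalidate_needed = model_present_map;
    model_message_pass_all_but_self(self);
    model_local_flushes++;
}

/* True while `cpu`'s bit is still set, i.e. the situation that leads to
 * "stuck on smp_invalidate_needed ..." messages. */
static bool model_bit_pending(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return (model_invalidate_needed >> cpu) & 1UL;
}
```

Calling model_flush_tlb_unpatched(0) while model_others_started is false leaves bit 0 set; after setting model_others_started to true, the same call leaves the mask empty.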

This causes some "stuck on smp_invalidate_needed ..." messages. The
problem eventually "fixes" itself when either of the following things
happens:

a) somebody calls smp_flush_tlb() or otherwise sends an
MSG_INVALIDATE_TLB message

b) the first CPU enters an irq context while another CPU is already
executing an interrupt, in which case get_irqlock() and
wait_on_irq() will eventually call check_smp_invalidate().
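Recovery path b) can be modelled the same way. Again this is a hypothetical user-space sketch with invented model_* names; it assumes, as the report implies, that check_smp_invalidate() clears the calling CPU's bit in smp_invalidate_needed and flushes that CPU's TLB when the bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of recovery path b); not kernel code. */
#define MODEL_NR_CPUS 2

static unsigned long model_invalidate_needed; /* stands in for smp_invalidate_needed */
static int model_local_flushes;               /* counts local_flush_tlb() calls */

static bool model_bit_pending(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return (model_invalidate_needed >> cpu) & 1UL;
}

/* Assumed check_smp_invalidate()-style behaviour: if this CPU's bit is set,
 * clear it and flush the local TLB; otherwise do nothing. */
static void model_check_invalidate(int cpu)
{
    if (model_bit_pending(cpu)) {
        model_invalidate_needed &= ~(1UL << cpu);
        model_local_flushes++;
    }
}
```

Starting from a mask with only bit 0 left over, model_check_invalidate(1) changes nothing, while model_check_invalidate(0) clears the bit and performs one local flush; calling it again is a no-op.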

The patch below fixes the problem by simply clearing the bit belonging
to the local processor in smp_flush_tlb().

Cheers

Claus

########################################################################
--- linux-2.1/arch/i386/kernel/smp.c.old Sat Jul 25 18:58:50 1998
+++ linux-2.1/arch/i386/kernel/smp.c Sat Jul 25 19:00:18 1998
@@ -1395,6 +1395,7 @@
 	 */
 
 	local_flush_tlb();
+	clear_bit(smp_processor_id(), &smp_invalidate_needed);
 
 	__restore_flags(flags);

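For comparison, a user-space model of the patched logic. As before, this is a hypothetical sketch with invented model_* names, not kernel code, and smp_message_pass() is reduced to the behaviour described in the report. The only change from the unpatched flow is the final clear of the local CPU's bit, mirroring the added clear_bit() line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 2

static unsigned long model_present_map = (1UL << MODEL_NR_CPUS) - 1; /* CPUs 0 and 1 */
static unsigned long model_invalidate_needed; /* stands in for smp_invalidate_needed */
static bool model_others_started;             /* have the other CPUs been fired off? */
static int model_local_flushes;               /* counts local_flush_tlb() calls */

static void model_clear_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_invalidate_needed &= ~(1UL << cpu);
}

/* No-op before the other CPUs run; afterwards clears the sender's bit and
 * lets each remote CPU clear its own bit (as described in the report). */
static void model_message_pass_all_but_self(int self)
{
    if (!model_others_started)
        return;
    model_clear_cpu(self);
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpu != self && (model_present_map & (1UL << cpu)))
            model_clear_cpu(cpu);
}

/* Patched flow: same as before, plus clearing our own bit after the
 * local flush. */
static void model_flush_tlb_patched(int self)
{
    model_invalidate_needed = model_present_map;
    model_message_pass_all_but_self(self);
    model_local_flushes++;
    model_clear_cpu(self); /* the one-line fix */
}
```

Because clearing a bit is idempotent, the extra clear is harmless once the other CPUs are running and the message path has already cleared it. The model only tracks the local CPU's bit; it makes no claim about how the real kernel handles bits of CPUs that have not been started yet.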
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html