Re: [patch] IDE problems on SMP, fixed? (fwd)

Linus Torvalds
Wed, 29 Jul 1998 22:57:42 -0700 (PDT)

On Wed, 29 Jul 1998, MOLNAR Ingo wrote:
> the lockup happens somewhere after we enter smp_apic_timer_interrupt(). I
> suspect it's the irq_enter() within smp_local_timer_interrupt().

irq_enter() would certainly be a top suspect, yes. It's one of the few
things that can easily wait forever if something goes wrong.

[ This btw shows a layering thing - we _should_ do the irq_enter() inside
the smp_apic_timer_interrupt() function before actually calling the
smp_local_timer_interrupt() thing - because smp_local_timer_interrupt()
is also called from the "old-fashioned" timer irq handler when we didn't
get the SMP IO-APIC stuff set up correctly, and in that case we'd do an
"irq_enter()" twice - which is harmless but still not the right thing to
do conceptually ]

Hanging in irq_enter() _tends_ to mean that the local CPU has done a
global interrupt disable, and then enabled interrupts locally. BOOM. That
would certainly cause lockups, although I don't see why this would be new
behaviour: that would have been a lock-up problem for a long time now.

This is fairly easy to check on: you can make "__sti" check that we don't
have the global IRQ-lock enabled by doing something like

#define GETEIP() ({ unsigned long eip; \
asm volatile("movl $1f,%0\n1:":"=g" (eip)); \
eip; })

#define __sti() do { \
if (global_irq_holder == smp_processor_id()) \
printk("__sti at %08lx\n", GETEIP()); \
asm volatile("sti": : :"memory"); \
} while (0)

which should catch any cases where we illegally enable interrupts while
still holding the interrupt lock.

[ And on to other things ]

Ingo: wrt the new locking code.. If you (or somebody else) can shoot any
holes in this, holler.

[ Btw, I added code to the big kernel lock that checks that the lock is
always acquired with interrupts enabled (locally or globally). I've run a
kernel that would panic if interrupts were ever disabled upon trying to
access the global kernel lock, and it's been up so far, under both heavy
load and me trying to find some other way to crash it. My sanity tests
have not found a single place where we try to acquire the lock in an
interrupt context (irq or bh) or with interrupts disabled, so everything
looks fine and checks out so far. ]

The other new thing with the new kernel lock code is that the "lock_depth"
variable is no longer accessed atomically. The old code used to do atomic
increment and decrement operations, but as the lock_depth is entirely
local to one specific thread, it is never accessed from multiple CPUs at
the same time, and the only way the process can move from one CPU to
another is by doing a context switch (two of them, in fact), and the
context switch will force a synchronization point through using the
spinlocks. As such, doing an atomic access looked like a waste of time to
me.

So as far as I can tell, the new code is not only a lot simpler and more
elegant, it's also "obviously correct". But if you see hangs with 2.1.112
that you didn't see earlier, it's still one of the few things that
changed, so I'd appreciate another pair of eyes looking at the "obviously
correct" code ;)

