Re: [Fwd: problem involving wait_on_bh]

Rogier Wolff (R.E.Wolff@BitWizard.nl)
Sun, 19 Dec 1999 15:26:06 +0100 (MET)


Alan Cox wrote:

[hanging machines... ]
> and my guess would be it doing a
> disable_irq() while holding a lock used in the irq handler

My experience with locks and SMP is that you hang the machine if you
forget to unlock a lock (e.g. because of an error exit) and an
interrupt handler then tries to acquire that lock.

On a non-SMP machine, a kernel compiled for SMP locks up
immediately. SMP machines may continue to run on one CPU for a while
before locking up completely (i.e. when the remaining CPU hits the
same spinlock).
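
To make the "forgotten unlock on an error exit" case concrete, here is a
small user-space sketch. It is not kernel code: demo_lock, demo_trylock,
demo_unlock, buggy_update and fixed_update are hypothetical names, and a
C11 atomic_flag stands in for the spinlock. A trylock is used so the
leak can be observed without actually hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spinlock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static bool demo_trylock(void) { return !atomic_flag_test_and_set(&demo_lock); }
static void demo_unlock(void)  { atomic_flag_clear(&demo_lock); }

/* Buggy: the error exit returns with the lock still held. */
static int buggy_update(int value)
{
    if (!demo_trylock())
        return -2;          /* lock busy; a real spin_lock would spin here */
    if (value < 0)
        return -1;          /* BUG: forgot demo_unlock() */
    /* ... the real work would go here ... */
    demo_unlock();
    return 0;
}

/* Fixed: every exit path after the acquire goes through the unlock. */
static int fixed_update(int value)
{
    int ret = 0;

    if (!demo_trylock())
        return -2;
    if (value < 0) {
        ret = -1;
        goto out;
    }
    /* ... the real work would go here ... */
out:
    demo_unlock();
    return ret;
}
```

After buggy_update(-1) has taken its error exit, the next caller finds
the lock busy forever. With a spinning lock and an interrupt handler as
that next caller, this is the hang described above.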

I've seen an Alpha machine report "stuck spinlock" or something like
that. I did some staring at the spin_lock function in
asm/spinlock.h this week and found it impossible to follow. Today I
tried again and got a bit further :-)... Now this is blindingly fast,
performance-critical code, so it is acceptable for it to be a bit
hard to follow.

#define spin_lock_string \
"\n1:\t" \
"lock ; btsl $0,%0\n\t" \
"jc 2f\n" \
".section .text.lock,\"ax\"\n" \
"2:\t" \
"testb $1,%0\n\t" \
"jne 2b\n\t" \
"jmp 1b\n" \
".previous"

This creates two pieces of code: one "inline" at the point of use, and
one "out of the way", in the .text.lock section.

inline:

1: lock;btsl $0, lock
jc 2f

out-of-line:

2: testb $1, lock
jne 2b
jmp 1b
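
For readers who find the assembly hard to follow, the same control flow
can be sketched in portable C11. This is only an illustration under
assumptions: sketch_lock_t, sketch_lock and sketch_unlock are made-up
names, and atomic_fetch_or on bit 0 stands in for "lock ; btsl $0".

```c
#include <stdatomic.h>

/* Hypothetical lock word: bit 0 set means "held", as in the btsl version. */
typedef struct { atomic_uint word; } sketch_lock_t;

static void sketch_lock(sketch_lock_t *l)
{
    for (;;) {
        /* "1:" atomically set bit 0; the old value says whether it was free */
        if ((atomic_fetch_or(&l->word, 1u) & 1u) == 0)
            return;                 /* bit was clear: we own the lock */

        /* "2:" out-of-line part: plain reads until bit 0 clears */
        while (atomic_load(&l->word) & 1u)
            ;

        /* "jmp 1b": go back and retry the atomic set */
    }
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_store(&l->word, 0u);
}
```

The inner loop only reads, so a waiting CPU does not keep issuing locked
bus operations while the lock is held.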

Now, how about changing the out-of-line part to:

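2: mov $0, %ecx          # start the spin counter at zero
3: inc %ecx
je 4f                    # counter wrapped to zero: give up waiting
testb $1, lock
jne 3b                   # still locked: keep spinning
jmp 1b                   # looks free: retry the atomic set
4: push lock
call _breaking           # report and break the stuck lock
add $4, %esp             # drop the pushed argument again
jmp 1b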

OK, it's a bit longer, it takes about two or three times longer to
notice the "unlocked" state, and it uses an extra register. But this
one causes the lock to be broken with a call to the "breaking"
function, instead of hanging the machine. This would make debugging
locking/hanging issues a lot easier: You'd get a message stating which
lock got messed up....
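
In C the idea might look roughly like the sketch below. Everything here
is an assumption made for illustration: dbg_lock_t, dbg_lock, the
max_spins parameter and dbg_breaking (a stand-in for the proposed
"breaking" function) are hypothetical, and the code is user-space C11
rather than anything from the kernel.

```c
#include <stdatomic.h>
#include <stdio.h>

typedef struct { atomic_uint word; } dbg_lock_t;   /* hypothetical type */

static unsigned long dbg_breaks;   /* how many locks were forcibly broken */

/* Stand-in for the "breaking" function: report the lock, then free it. */
static void dbg_breaking(dbg_lock_t *l)
{
    fprintf(stderr, "stuck spinlock at %p, breaking it\n", (void *)l);
    dbg_breaks++;
    atomic_store(&l->word, 0u);
}

static void dbg_lock(dbg_lock_t *l, unsigned long max_spins)
{
    for (;;) {
        unsigned long n = 0;        /* plays the role of ecx */

        if ((atomic_fetch_or(&l->word, 1u) & 1u) == 0)
            return;                 /* got the lock */

        /* Spin read-only, but give up after max_spins iterations. */
        while (atomic_load(&l->word) & 1u) {
            if (++n >= max_spins) {
                dbg_breaking(l);    /* break the lock instead of hanging */
                break;
            }
        }
    }
}
```

Taking the same lock twice from one thread would normally hang; here the
second dbg_lock() call prints which lock was stuck and then proceeds.
Breaking a lock leaves whatever it protected in an unknown state, so
this is a debugging aid only.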

Oh, ecx counts to 4G, which at 400MHz and one clock per iteration is
about 10 seconds. If the loop runs much slower than one clock per
iteration, the delay may become unacceptable, so we'll have to preload
ecx with a different value. (How about loading _bogomips there, and
decrementing? -> 1, 2 or 3 seconds delay before the lock is broken!)
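
The arithmetic behind those numbers, as a small sketch. The helper names
spin_seconds and preload_for are hypothetical, and the cycles-per-
iteration figure is a guess that would have to be measured on real
hardware.

```c
#include <stdint.h>

/* Time needed to spin `iterations` times at a given clock rate. */
static double spin_seconds(uint64_t iterations, double clock_hz,
                           double cycles_per_iter)
{
    return (double)iterations * cycles_per_iter / clock_hz;
}

/* Counter preload that gives roughly `seconds` of spinning (32-bit cap). */
static uint32_t preload_for(double seconds, double clock_hz,
                            double cycles_per_iter)
{
    double n = seconds * clock_hz / cycles_per_iter;

    return n >= 4294967295.0 ? UINT32_MAX : (uint32_t)n;
}
```

spin_seconds(1ULL << 32, 400e6, 1.0) comes out at a little under 11
seconds. If the loop really takes 3 clocks per iteration, the same count
takes three times as long, which is why a preload value tied to the
CPU's speed looks attractive.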

I've written 6502, 8080, 8086, 68000, sparc and "move" assembly, and
it's been a while. So, I'll be mixing everything up. If really
necessary, I'll be able to get something working, but I'd appreciate
help from others in getting this into "working" order.

Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
 "I didn't say it was your fault. I said I was going to blame it on you."
