On my Tecra 730CDT (Pentium 150, 80 MB RAM, Red Hat 2.x), I've added
some routines to a 2.0.27 kernel that use a *kernel* semaphore.
With a stock 2.0.27 kernel, is it a known bug for a kernel semaphore
to get into a state where COUNT is 1, WAITING is 0, and one or more
processes are still on its WAIT queue?
I'm getting exactly this, and I'm wondering whether I'm somehow
overwriting parts of my semaphore structure, or whether there's a bug
in the kernel that's not mine.
If this is a known bug, does the new semaphore code I find in 2.1.36+
fix this? (It seems to be fixing SMP problems, not this one...)
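
To pin down the state being asked about, it can be written as a small
predicate. This is only a userspace sketch: struct sem_snapshot and
sem_looks_wedged() are made-up names, and the three fields simply
mirror the COUNT, WAITING, and WAIT queue values named above rather
than the kernel's real structure layout. A process that has just been
woken may still be on the queue until it runs, so a check like this
would only mean something at a quiet point with no wakeup in flight.

```c
#include <assert.h>

/* Hypothetical snapshot of the three values named in the report. */
struct sem_snapshot {
    int count;    /* COUNT: 1 means the semaphore is free */
    int waiting;  /* WAITING, as reported */
    int queued;   /* number of processes on the WAIT queue */
};

/*
 * Returns 1 if the snapshot matches the suspect state: the semaphore
 * is free and WAITING is 0, yet something is still on the WAIT queue.
 */
static int sem_looks_wedged(const struct sem_snapshot *s)
{
    assert(s->queued >= 0);  /* a queue length cannot be negative */
    return s->count > 0 && s->waiting == 0 && s->queued > 0;
}
```

A snapshot of { 1, 0, 2 } matches the report (two sleepers on a free
semaphore), while { 1, 0, 0 } is an ordinary idle semaphore.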
-- Perry Wagle (wagle@cse.ogi.edu)
PS -- My testing setup has four user-level Tk script processes doing
ioctl()s to a char device driver that calls my routines. I can tell
when one of those processes blocks because its button just sits
there, depressed. Normally, the whole thing works, but occasionally,
and not reliably reproducibly (i.e., if I write a sequence down, then
it no longer triggers the bug), I found I could get 1 or 2 processes
blocked on the semaphore, and that an UP had been ignored.
Interestingly, I could knock the blocked processes Q1 and Q2 loose with
two more processes, X and Y. X does a DOWN, doesn't block. Y does a
DOWN, and blocks. X does an UP, and Q1 wakes up. Q1 does an UP, and
Q2 wakes up. Q2 does an UP, and Y wakes up. Finally, Y does an UP,
and the semaphore is back to how it should be.
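
The recovery sequence above can be replayed against a toy model. To be
clear about the assumptions: this is NOT the 2.0.27 semaphore code.
It is a userspace hand-off semaphore (an UP passes the semaphore
straight to the oldest sleeper), picked only because it is about the
simplest model that reproduces the wake order described. All names
(toy_sem, toy_down, toy_up, toy_replay) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WAITERS 8

/* Toy semaphore: a free count plus a FIFO of sleeping process names. */
struct toy_sem {
    int count;                       /* 1 = free, 0 = held */
    int nqueued;                     /* number of sleepers */
    const char *queue[MAX_WAITERS];  /* sleepers, oldest first */
};

/* DOWN: take the semaphore if free, else queue the caller.
 * Returns 1 if the caller would block, 0 if it got the semaphore. */
static int toy_down(struct toy_sem *s, const char *who)
{
    if (s->count > 0) {
        s->count--;
        return 0;
    }
    assert(s->nqueued < MAX_WAITERS);
    s->queue[s->nqueued++] = who;
    return 1;
}

/* UP: hand the semaphore to the oldest sleeper, or free it if the
 * queue is empty. Returns the name of the process woken, or NULL. */
static const char *toy_up(struct toy_sem *s)
{
    const char *woken;

    if (s->nqueued == 0) {
        s->count++;
        return NULL;
    }
    woken = s->queue[0];
    memmove(&s->queue[0], &s->queue[1],
            (size_t)(s->nqueued - 1) * sizeof s->queue[0]);
    s->nqueued--;
    return woken;
}

/* Start from the wedged state and replay the X/Y sequence,
 * recording who each UP wakes. */
static void toy_replay(struct toy_sem *s, const char *woken[4])
{
    /* Wedged state from the report: free, yet Q1 and Q2 asleep. */
    s->count = 1;
    s->nqueued = 2;
    s->queue[0] = "Q1";
    s->queue[1] = "Q2";

    toy_down(s, "X");      /* X gets the semaphore, doesn't block */
    toy_down(s, "Y");      /* Y queues behind Q1 and Q2 */
    woken[0] = toy_up(s);  /* X's UP wakes Q1 */
    woken[1] = toy_up(s);  /* Q1's UP wakes Q2 */
    woken[2] = toy_up(s);  /* Q2's UP wakes Y */
    woken[3] = toy_up(s);  /* Y's UP finds nobody waiting */
}
```

In this model the four UPs wake Q1, Q2, Y, and nobody, in that order,
and the semaphore ends with count 1 and an empty queue. That matches
the observed behaviour, but it says nothing about how the real
semaphore got into the wedged state in the first place.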