Re: Debugging kernel semaphore contention and priority inversion
From: Keith Mannthey
Date: Thu Aug 18 2005 - 12:39:50 EST
On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@xxxxxxxxx> wrote:
> > From: Keith Mannthey [mailto:kmannth@xxxxxxxxx]
> > Sent: Wednesday, August 17, 2005 5:33 PM
> >
> > On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@xxxxxxxxx> wrote:
> > > Is there a way to know which task has a particular
> > > (struct semaphore *) down()ed, leading to another task's
> > > down() blocking on it?
> >
> > I would add a field to struct semaphore that tracks the
> > current process.
> > In your various ups and downs, have that field track the
> > "current" process.
>
> Yeah, I thought about that. Unfortunately, it doesn't meet my need for
> not Heisenberg'ing the system. I can't instrument the struct semaphore
> {} in a running system.
What kernel are you using?
Can you do some form of a crash dump (maybe some diskdump thing)?
It is hard to debug without instrumentation of some kind. You are
most likely going to have to rebuild/change your current kernel to
sort this issue out.
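
For reference, here is a rough userspace sketch of the owner-tracking
idea quoted above. It is only an illustration under stated assumptions:
a POSIX sem_t stands in for the kernel's struct semaphore, a pthread_t
stands in for the "current" task pointer, and the names dbg_semaphore,
dbg_down, dbg_up and dbg_sem_owner_is are made up for this example. A
real kernel patch would look different and would need a rebuild.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical wrapper: a semaphore plus a record of who took it last. */
struct dbg_semaphore {
    sem_t sem;
    pthread_t owner;   /* meaningful only while has_owner is nonzero */
    int has_owner;
};

/* Initialise as a binary semaphore (count 1) with no recorded owner. */
static int dbg_sem_init(struct dbg_semaphore *s)
{
    s->has_owner = 0;
    return sem_init(&s->sem, 0, 1);
}

/* Acquire the semaphore, then record the caller as the holder. */
static int dbg_down(struct dbg_semaphore *s)
{
    int rc;

    do {
        rc = sem_wait(&s->sem);   /* retry if interrupted by a signal */
    } while (rc == -1 && errno == EINTR);

    if (rc == 0) {
        /* There is a small window between acquiring and recording. */
        s->owner = pthread_self();
        s->has_owner = 1;
    }
    return rc;
}

/* Clear the recorded holder, then release the semaphore. */
static int dbg_up(struct dbg_semaphore *s)
{
    s->has_owner = 0;
    return sem_post(&s->sem);
}

/* Debug-only check: is 'who' the recorded holder right now?
 * Reads the fields without locking, so treat the answer as a hint. */
static int dbg_sem_owner_is(const struct dbg_semaphore *s, pthread_t who)
{
    return s->has_owner && pthread_equal(s->owner, who);
}
```

With something like this in place, whatever dumps the semaphore can
report the recorded holder instead of only the count. The cost is the
one already raised in this thread: it changes the code under test.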
> > This way, when you dump the semaphore, you can see what task
> > is holding it. Have the module dump the semaphore and you can
> > id the task.
> >
> > > It would be helpful to get a kernel stacktrace for the culprit too.
> >
> > Have you tried sysrq t? See the Documentation/sysrq.txt file.
>
> This is a headless system.
How do you know you are spinning on some inode semaphore? If the
system is headless, how do you know you are dealing with a
priority inversion issue? Maybe the system has panicked, or it is
something else entirely. It seems to me you might be jumping to
conclusions.
> >
> > How stuck is the system?
> >
> > Keith
>
> Very. Only pingable, but can't login via telnet/ssh/anything. Reason is
> the same reason the low priority mystery task is unable to run and
> release the held semaphore.
From the present state you have described, you would be unable to
load a module or interact with the box in any way. It is really hard to
debug a kernel without a console. As others have suggested, a serial
console or netconsole would help a bunch.
Good luck!
Keith