RE: Debugging kernel semaphore contention and priority inversion
From: Steven Rostedt
Date: Thu Aug 18 2005 - 10:17:41 EST
On Thu, 2005-08-18 at 08:50 -0600, Davda, Bhavesh P (Bhavesh) wrote:
> > >This is a headless system.
> >
> > You could try netconsole.
>
> Haven't heard of it before. Will look into it. But I doubt it will help
> pinpoint the semaphore holder, if all I can do is sysrq stuff.
Or does this system have a serial? This is just as good (if not better)
than netconsole, since it is simpler, and netconsole still needs to use
the IP stack. Although, in most cases netconsole works for me. But with
a serial, you can also send commands. Netconsole is better if you need
to send lots of data, and need a higher speed data transfer rate.
I usually set up a minicom attached to the target computer, and on the
target run "cat /dev/ttyS0 > /dev/null &" just to open a port. This is
needed, since the reading of the serial will only be done if something
is actually reading from it. On the kernel command line you will also
need to add, console=ttyS0,115200N8 (change the baud to whatever, since
here I use 115200). From minicom, you can send a sysrq-t with C-a f t.
The C-a f sends a break, and the t tells the kernel this is a sysrq-t.
Read Documentation/serial-console.txt for more information.
For netconsole read: Documentation/networking/netconsole.txt
>
> >
> > > >
> > > > How stuck is the system?
> > > >
> > > > Keith
> > >
> > >Very. Only pingable, but can't login via
> > telnet/ssh/anything. Reason is
> > >the same reason the low priority mystery task is unable to run and
> > >release the held semaphore.
> >
> > (hmm. I'm obviously missing some original context here)
> >
> > Sounds like there must be another player who is RT prio + spinning.
> >
> > -Mike
>
> Very good! Yes, I left out that piece of detail in my original posting.
> There is a real low priority (4) SCHED_FIFO (hence still higher than any
> SCHED_OTHER) task spinning. But it is not the semaphore holder. I am
> trying to identify which kernel thread (because that's most likely)
> running at SCHED_OTHER real low priority (too nice) is holding the
> semaphore, locking out a priority 50 SCHED_FIFO task in its sys_write()
> as a result.
Also have a look at Ingo Molnar's RT patch. It takes care of priority
inversion (with priority inheritance) and also has lots of other nifty
features to debug semaphores and locks.
You can find it here:
http://people.redhat.com/mingo/realtime-preempt/
-- Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/