Re: jack, PREEMPT_DESKTOP, delayed interrupts?

From: Ingo Molnar
Date: Wed Aug 31 2005 - 02:35:08 EST

Next message: Tony Lindgren: "Re: Dynamic tick for 2.6.14 - what's the plan?"
Previous message: Jens Axboe: "Re: [PATCH] blk queue io tracing support"
In reply to: Fernando Lopez-Lezcano: "Re: jack, PREEMPT_DESKTOP, delayed interrupts?"
Next in thread: Jack O'Quin: "Re: [Jackit-devel] Re: jack, PREEMPT_DESKTOP, delayed interrupts?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Fernando Lopez-Lezcano <nando@xxxxxxxxxxxxxxxxxx> wrote:

> Do a "tar cvf usr.tar /usr" just to read/write a lot to disk (this
> within the same SATA disk). Watch memory being used in a system
> monitor applet up to 100%. After a while, hard to say how long (maybe
> 10/15 minutes?) the system eventually can get into a state where Jack
> starts printing messages of the type "delay of 3856.000 usecs exceeds
> estimated spare time of 2653.000; restart ..." (if I understand
> correctly this means interrupts are being delayed on their way to
> Jack, or at least Jack thinks they are arriving too late), along with
> some less frequent xun notices.
>
> Now the strange thing is that this condition seems to be persistent.
> Nothing I do after it starts to happen seems to halt those messages.
> Including stopping Jack and starting it again, and even (tried it
> once) stopping the alsa sound driver and loading it again. Nothing out
> of the ordinary in dmesg or /var/log/messages. I would guess that
> something "breaks" inside the kernel with regards to interrupt
> handling and/or whatever Jack uses to measure time inside the kernel?
> Interrupts are prioritized correctly (rtc, then audio and jack runs at
> lower realtime priority than the audio interrupts), everything else
> looks fine.

are the messages unstoppable, even if the system is completely idle? And
you get this only with the SMP kernel, correct?

i dont know what's going on, but here are a couple of ideas to debug
this further:

- could you check whether there are any SCHED_FIFO tasks that shouldnt
be there:

ps -meo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm | grep FF

we had bugs in the past that caused RT priorities to 'leak' on SMP
systems.

- please enable all latency tracing options like Lee suggested, and do
"echo 0 > /proc/sys/kernel/preempt_max_latency" to get some traces.

- if everything fails and automatic latency tracing does not show
anything out of ordinary, then you could try to do "user-triggered
tracing" of jackd's critical path. This is more laborous to do, but
should pinpoint the latency reason in a pretty sure way.

user-triggered tracing is done via adding those special gettimeofday
calls to jackd and recompiling jackd. I've attached an older hack
against jackd below, you might need to merge it to recent jackd. NOTE:
this patch will only work if you are getting xrun messages from ALSA.
It has to be reworked if your latencies are not actual xruns.

then, once the patched jackd is running, you can enable user-triggered
tracing via:

echo 1 > /proc/sys/kernel/trace_user_triggered

then you should start seeing jackd's latency path measured, not the
kernel's internal critical sections. [ Tracing overhead tip: for
user-triggered tracing it is enough to have CONFIG_WAKEUP_TIMING and
CONFIG_LATENCY_TRACE enabled, the other debugging options can be
turned off, to minimize overhead. ]

Ingo

diff -dupPNr jackit.0/drivers/alsa/alsa_driver.c jackit.1/drivers/alsa/alsa_driver.c
--- jackit.0/drivers/alsa/alsa_driver.c 2004-11-26 17:24:24.000000000 +0000
+++ jackit.1/drivers/alsa/alsa_driver.c 2004-11-30 13:41:24.521791456 +0000
@@ -1077,13 +1077,16 @@ alsa_driver_xrun_recovery (alsa_driver_t
&& driver->process_count > XRUN_REPORT_DELAY) {
struct timeval now, diff, tstamp;
driver->xrun_count++;
+ gettimeofday(0,0);
gettimeofday(&now, 0);
snd_pcm_status_get_trigger_tstamp(status, &tstamp);
timersub(&now, &tstamp, &diff);
*delayed_usecs = diff.tv_sec * 1000000.0 + diff.tv_usec;
+#if 0
fprintf(stderr, "\n\n**** alsa_pcm: xrun of at least %.3f "
"msecs\n\n",
*delayed_usecs / 1000.0);
+#endif
}

if (alsa_driver_stop (driver) ||
@@ -1185,6 +1188,12 @@ alsa_driver_wait (alsa_driver_t *driver,
nfds++;
}

+ { // Trigger off trace every other poll...
+ static unsigned int xruntrace_count = 0;
+ if ((xruntrace_count++ % 1) == 0)
+ gettimeofday(0,1);
+ }
+
poll_enter = jack_get_microseconds ();

if (poll_enter > driver->poll_next) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Tony Lindgren: "Re: Dynamic tick for 2.6.14 - what's the plan?"
Previous message: Jens Axboe: "Re: [PATCH] blk queue io tracing support"
In reply to: Fernando Lopez-Lezcano: "Re: jack, PREEMPT_DESKTOP, delayed interrupts?"
Next in thread: Jack O'Quin: "Re: [Jackit-devel] Re: jack, PREEMPT_DESKTOP, delayed interrupts?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]