Re: realtime-preempt-2.6.12-final-V0.7.51-11 glitches [no more]

From: Ingo Molnar
Date: Fri Jul 08 2005 - 04:58:39 EST



* Rui Nuno Capela <rncbc@xxxxxxxxx> wrote:

> OTOH, I'll take this chance to show you something that is annoying me
> for quite some time. Just look to the attached chart where I've marked
> the spot with an arrow and a question mark. Its just one example of a
> strange behavior/phenomenon while running the jack_test4.2 test on
> PREEMPT_RT kernels: the CPU usage, which stays normally around 50%,
> suddenly jumps to 60% steady, starting at random points in time but
> always some time after the test has been started. Note that this
> randomness surely adds to the the slight differences found on the
> above results.

how long does this condition persist? Firstly, please upgrade to the
-51-16 kernel, previous kernels had a condition where interrupt storms
(or repeat interrupts) could occur. (Your irqs/sec values dont suggest
such a condition, but it could still occur.)

Then could you enable profiling (CONFIG_PROFILING=y and profile=1 boot
parameter), and create a script like this to capture a kernel profile
for a fixed amount of time:

#!/bin/bash

readprofile -r # reset profile
sleep 10
readprofile -n -m /home/mingo/System.map | sort -n

and start it manually when the anomaly triggers. Also start it during a
'normal' period of the test. The output should give us a rough idea of
what is happening. This type of profiling is very low-overhead so it
wont disturb the condition.

Note that you can increase the frequency and the quality of profiling by
enabling the NMI watchdog (LOCAL_APIC in the .config and nmi_watchdog=2
boot option), in the -RT kernel it will automatically switch the
profiling tick to occur from NMI context. Such tracing will also show
overhead occuring in irqs-off functions.

a more intrusive method would be to capture a trace of the anomalous
workload. For that you'll need to enable WAKEUP_TIMING+LATENCY_TRACING
(this is the lowest-overhead tracing variant), and run the attached
trace-it utility during the anomaly:

trace-it > anomaly.log

and during the normal period as well:

trace-it > normal.log

note that to capture a significant amount of system activity you might
need to increase MAX_TRACE in kernel/latency.c (e.g. 8-fold should be
enough to capture a second worth of system activity). If you generate
such large traces then please bzip2 them and send them to me privately.
You might also need to increase the usleep() period in trace-it.c, to
capture longer periods of time.

note that the enabling of tracing could also make the anomalous
condition go away, so it might not be the right method.

Ingo

/*
* Copyright (C) 1999, Ingo Molnar <mingo@xxxxxxxxxx>
*/

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <linux/unistd.h>

int main (int argc, char **argv)
{
int ret;

if (getuid() != 0) {
fprintf(stderr, "needs to run as root.\n");
exit(1);
}
ret = system("cat /proc/sys/kernel/mcount_enabled >/dev/null 2>/dev/null");
if (ret) {
fprintf(stderr, "CONFIG_LATENCY_TRACING not enabled?\n");
exit(1);
}
system("echo 1 > /proc/sys/kernel/trace_all_cpus");
system("echo 1 > /proc/sys/kernel/trace_enabled");
system("echo 0 > /proc/sys/kernel/trace_freerunning");
system("echo 0 > /proc/sys/kernel/trace_print_at_crash");
system("echo 1 > /proc/sys/kernel/trace_user_triggered");
system("echo 0 > /proc/sys/kernel/trace_verbose");
system("echo 0 > /proc/sys/kernel/preempt_max_latency");
system("echo 0 > /proc/sys/kernel/preempt_thresh");
system("[ -e /proc/sys/kernel/wakeup_timing ] && echo 0 > /proc/sys/kernel/wakeup_timing");
system("echo 1 > /proc/sys/kernel/mcount_enabled");

gettimeofday(0, 1);
usleep(100000);
gettimeofday(0, 0);

system("cat /proc/latency_trace");

return 0;
}


Attachment: trace-it
Description: Binary data