Hi to all,
as I already mentioned on the list, we had problems in our
applications with kernel 2.2.x that we did not have with 2.0.3x. Here
is a brief description of our application:
Application background:
-----------------------
The application is made up of 15 processes that interact through
shared memories and sockets. Each process computes for a few
milliseconds and then pass its data to the next one (a pipeline
computation scheme). When a process is done with its computations, it
calls pause() to release the CPU, it resumes its computations when its
timer interrupt wakes it up every 10 milliseconds, if the computations
were not completed, we log this as an overrun.
The problem:
------------
With Kernel 2.2.x SMP, the processes often make overruns whereas with
2.0.36 SMP with the same settings, they don't. Although, the amount of
computations done in a given period of time (10 to 20 seconds or so)
seems to be the same (or very close at least). So we suspected that
the problem might come from the timers or the timer signal delivery.
When the most critical processes (5 of them) are given a FIFO priority
and if I use the Real Time Clock (/dev/rtc) at 8192 Hz (or any high
"enough" frequencies), things improve, there are much less overruns
for the very same computation.
Also, one of my co-worker noticed that his driver was not able to
reliably send signals (SIGUSR1, SIGUSR2) to 2 user space processes
running with FIFO priority waiting in pause() for the signal. The
signals were not lost, but delivered with a delay.
So we investigated in sched.c and realized that reschedule_idle only
tries to reschedule the current task on the other CPU only if it has
the scheduling policy SCHED_OTHER. By allowing FIFO priority task to
reach reschedule_idle_slow() and be re-scheduled on the other CPU, our
applications performance greatly improved: with FIFO priority for our
critical process, no overrun are observed (without stimulating the
kernel with RTC) and my co-worker's user space processes catch all the
signals sent by his driver.
Here is the small patch:
------------------------
175,178c175,181
< if (p->policy != SCHED_OTHER || p->counter > current->counter + 3) {
< current->need_resched = 1;
< return;
< }
--- > if ((p->policy != SCHED_FIFO) || (current->policy != SCHED_FIFO)) { > > if (p->policy != SCHED_OTHER || p->counter > current->counter + 3) { > current->need_resched = 1; > return; > } > }
For us, it did greatly improved things. We did not observed side-effects (X11, tcp-ip, sound or other). And it only affects FIFO priority tasks. What do you think about this patch ?
Comments, idea ?
Kind regards and thanks for your help/time, Claude
-- Claude Gamache, CAE Electronique Ltee, 8585 Cote-de-Liesse Saint-Laurent, Quebec, Canada H4T 1G6 Email: cgamache@cae.ca Tel.: (514) 341-2000 x3194, Fax: (514) 734-5612- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/