sys_nanosleep() implementation

Riku Saikkonen (rjs@isil.lloke.dna.fi)
Thu, 2 Jan 1997 17:16:59 +0200


I have a question about the implementation of nanosleep(2) in the kernel.

From a comment in linux-2.0.27/kernel/sched.c (in sys_nanosleep()):
/*
* Short delay requests up to 2 ms will be handled with
* high precision by a busy wait for all real-time processes.
*/
And the code proceeds to use udelay() if the requested delay is <=2 ms and
current->policy != SCHED_OTHER. Otherwise it calls schedule() to sleep.
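
To make the condition concrete, here is a small user-space model of the
decision as I described it above. It is only a sketch, not the kernel source:
would_busy_wait() and BUSY_WAIT_LIMIT_NS are names I made up for
illustration, and the only inputs are the 2 ms limit and the scheduling
policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <time.h>

/* 2 ms limit taken from the sched.c comment, expressed in nanoseconds. */
#define BUSY_WAIT_LIMIT_NS 2000000L

/*
 * Hypothetical helper (not a kernel function): returns 1 if a request
 * would be handled by busy-waiting, 0 if it would sleep via schedule().
 */
static int would_busy_wait(int policy, const struct timespec *req)
{
    assert(req != NULL);
    return req->tv_sec == 0
        && req->tv_nsec <= BUSY_WAIT_LIMIT_NS
        && policy != SCHED_OTHER;
}
```

With this model a 1 ms request from a SCHED_FIFO process busy-waits, while
the same request from an ordinary SCHED_OTHER process goes to sleep.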

Why use udelay() only for real-time processes? For many applications (e.g.
controlling most slower digital devices), it is enough to wait _at least_
some number of nanoseconds: it doesn't matter if a context switch makes the
delay longer, as long as it is never shorter.

Currently such a process has to either call nanosleep() and accept a delay of
10 ms, which is the resolution of nanosleep()'s sleeping on i386, or
effectively reimplement udelay() using, say, the BogoMIPS value from
/proc/cpuinfo. The third option is real-time scheduling, but that is overkill
for many applications that aren't critical about maximum delays (and, as I
understand it, real-time scheduling is dangerous, because a real-time
scheduled process can hang the system).

Is there any reason why this check for current->policy != SCHED_OTHER is
there?

If you want a `patch' for this, remove the check "current->policy !=
SCHED_OTHER" in linux-2.0.27/kernel/sched.c line 1406. :)

Then, a suggestion for a bigger improvement to the implementation of
nanosleep(): I think it would be a good thing if nanosleep() could guarantee
sleeping at least the specified amount (unless interrupted, of course), for
any amount that the calling process specifies. This could be done by first
rounding the amount down to the resolution of schedule() (i.e. 10 ms for
i386, 1 ms for Alpha, according to the nanosleep(2) man page I have),
sleeping for that amount, and then busy-waiting for the remainder, if any.
This strategy would remove the rather arbitrary 2 ms limit for busy-waiting
vs. sleeping.
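
The strategy can be sketched in user space roughly as follows. This is not a
kernel patch: sleep_at_least() and elapsed_ns() are hypothetical helpers, the
tick length is passed in by the caller, and I assume a POSIX
clock_gettime(CLOCK_MONOTONIC) is available as the time source for the
busy-wait, instead of a calibrated loop like udelay().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds elapsed between *start and *now. */
static long long elapsed_ns(const struct timespec *start,
                            const struct timespec *now)
{
    return (long long)(now->tv_sec - start->tv_sec) * NSEC_PER_SEC
         + (now->tv_nsec - start->tv_nsec);
}

/*
 * Hypothetical helper: delay for at least total_ns nanoseconds.
 * tick_ns is the assumed scheduler resolution (e.g. 10000000 for 10 ms).
 * Returns 0 on success, -1 if a call failed or the sleep was interrupted.
 */
static int sleep_at_least(long long total_ns, long long tick_ns)
{
    struct timespec start, now, coarse;
    long long whole;

    assert(total_ns >= 0 && tick_ns > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;

    /* Round the request down to whole ticks and sleep for that part. */
    whole = (total_ns / tick_ns) * tick_ns;
    if (whole > 0) {
        coarse.tv_sec = (time_t)(whole / NSEC_PER_SEC);
        coarse.tv_nsec = (long)(whole % NSEC_PER_SEC);
        if (nanosleep(&coarse, NULL) != 0)
            return -1;
    }

    /* Busy-wait until the full interval has really elapsed. */
    do {
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
    } while (elapsed_ns(&start, &now) < total_ns);

    return 0;
}
```

For example, sleep_at_least(12500000, 10000000) would sleep for one 10 ms
tick and then spin for whatever is left of the 12.5 ms. Because the final
loop checks the clock against the start time, an oversleep in the first step
only shortens the busy-wait; it never makes the total delay too short.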

Also, would it be a good idea to replace udelay() with nanodelay() (or
ndelay() or whatever)? Currently the resolution is limited to 1 us by the
(long) integer argument to udelay(). I think processors are starting to be
fast enough that they can busy-wait for more precise intervals than a
microsecond...

Comments?

--
-=- Rjs -=- rjs@spider.compart.fi, rjs@lloke.dna.fi