Re: schedule_timeout() semantics/usage?

Andrea Arcangeli (andrea@suse.de)
Thu, 21 Oct 1999 01:13:00 +0200 (CEST)


On Wed, 20 Oct 1999, David Hinds wrote:

>On Wed, Oct 20, 1999 at 11:38:25PM +0200, Andrea Arcangeli wrote:
>> On Wed, 20 Oct 1999, David Hinds wrote:
>>
>> > do {
>> > current->state = TASK_INTERRUPTIBLE;
>> > timeout = schedule_timeout(timeout);
>> > } while (timeout);
>> > current->state = TASK_RUNNING;
>>
>> Yes, that will work fine and it's the right thing to do, except the fact
>> you don't care about signals in the above fragment and so you should use
>> TASK_UNINTERRUPTIBLE instead (it will avoid some not necessary
>> schedule()).
>
>Ok... then this raises a few questions:
>
>- I'm not clear on the semantics of current->state. In various places
> in the kernel, it is set to TASK_INTERRUPTIBLE, then may or may not
> be set back to TASK_RUNNING. Is not setting it back to TASK_RUNNING

NOTE: when you return from schedule_timeout() you are _always_ in
TASK_RUNNING state.

So your example was perfectly right, but I forgot to tell you that the
'current->state = TASK_RUNNING' was not necessary. It won't hurt, of
course, but it's not needed.

So the right way to wait unconditionally for 5 sec, _if_ you are not
registered in any waitqueue and so can't be woken up by any kernel
event other than a signal, is:

__set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(5*HZ);

So if an init_module() looks like this:

int init_module(void)
{
	__set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(5*HZ);
	...
	<code>
}

then you can be sure that init_module() will wait the full 5 sec before
executing <code>, because the process running the early code in
init_module() can't be registered in any waitqueue (the module syscall
does not register itself in one before calling the init_module
callback).

If you are registered to get wakeups from some kernel event and you
want to block unconditionally for 5 sec, then you have to do this
instead:

unsigned long timeout = 5*HZ;

do {
	__set_current_state(TASK_UNINTERRUPTIBLE);
	timeout = schedule_timeout(timeout);
} while (timeout);
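
To see why the loop matters, here is a minimal userspace sketch (not
kernel code). struct model_task, model_schedule_timeout(), sleep_once(),
sleep_full() and MODEL_HZ are hypothetical names invented for this
illustration. The model assumes only the two properties described in
this mail: the call returns the jiffies that were left when the task
woke up, and the task is back in TASK_RUNNING on return.

```c
#include <assert.h>

/* Userspace model for illustration only; none of this is a kernel API. */
#define MODEL_HZ 100	/* assumed tick rate for the example */

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

struct model_task {
	enum task_state state;	/* stands in for current->state */
	long jiffies;		/* simulated clock */
	long next_wakeup;	/* absolute time of one pending wakeup, -1 if none */
};

/*
 * Model of schedule_timeout(): sleep until the timeout expires or a
 * pending wakeup fires, return the jiffies that were left, and leave
 * the task in TASK_RUNNING.
 */
long model_schedule_timeout(struct model_task *t, long timeout)
{
	long expire = t->jiffies + timeout;
	long wake = expire;

	assert(timeout >= 0);
	assert(t->state != TASK_RUNNING);	/* caller must set a sleeping state */

	if (t->next_wakeup >= t->jiffies && t->next_wakeup < expire) {
		wake = t->next_wakeup;
		t->next_wakeup = -1;		/* the wakeup is consumed */
	}
	t->jiffies = wake;
	t->state = TASK_RUNNING;
	return expire - wake;
}

/* Single call: may return early. Returns the jiffies actually slept. */
long sleep_once(struct model_task *t, long timeout)
{
	long start = t->jiffies;

	t->state = TASK_UNINTERRUPTIBLE;
	model_schedule_timeout(t, timeout);
	return t->jiffies - start;
}

/* Loop on the return value: always sleeps for the whole timeout. */
long sleep_full(struct model_task *t, long timeout)
{
	long start = t->jiffies;

	do {
		t->state = TASK_UNINTERRUPTIBLE;	/* must be set again each time */
		timeout = model_schedule_timeout(t, timeout);
	} while (timeout);
	return t->jiffies - start;
}
```

With a wakeup pending at jiffy 200, sleep_once(&t, 5 * MODEL_HZ) comes
back after 200 simulated jiffies, while sleep_full() sleeps the whole
500.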

Also note that in 2.3.x you should use __set_current_state(TASK_*)
instead of assigning current->state directly (the generated assembler is
the same but the code looks cleaner ;).

> a bug? What are the consequences? Should it be set inside the loop
> as above, or should I set it once, before entering the loop?

Exactly because you are always a running task when you return from
schedule_timeout(), you must set the state to TASK_[UN]INTERRUPTIBLE
again before every further call to schedule_timeout(), so set it inside
the loop.

>- Something like 95% of callers of schedule_timeout() ignore the
> return value. Many seem to expect it to not return early: it is
> used for timed delays in many drivers. Are these all bugs?

They are not buggy if they are using TASK_UNINTERRUPTIBLE and they can't
be woken up by _hand_ by a wake_up_process() somewhere.

They are not buggy if they are using TASK_INTERRUPTIBLE, they expect to
return early in case of a signal, and they can't be woken up by some
low-level kernel code.

>- Are signals the only reason for early-exit? If so, then if I use

No. It depends on the code you are running. If you registered yourself in
a waitqueue, then you can get a wakeup from a kernel event as well
(usually this is used to get wakeups from irqs).

> TASK_UNINTERRUPTIBLE, does that guarantee that the return value is
> always zero, so the loop is superfluous? Why do virtually all

No. TASK_UNINTERRUPTIBLE only prevents signals from waking up the task.

For example: when you wait for I/O completion you first register
yourself in the page->wait waitqueue, and then you go to sleep in
UNINTERRUPTIBLE state. So only the wakeup at I/O completion time will be
able to wake you up (you'll get rescheduled only when the I/O has
completed). If you press ^C the task won't be interrupted (in fact
you'll have to wait for I/O completion before being able to interrupt
the task).
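
The rule can be written down as a small truth table. This is a userspace
sketch, not kernel code: enum wake_source and wakes_task() are
hypothetical names, and WAKE_EVENT stands for an explicit wakeup such as
wake_up() on a waitqueue the task is registered in, or
wake_up_process().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model for illustration only; none of this is a kernel API. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };
enum wake_source { WAKE_SIGNAL, WAKE_EVENT };

/* Does a wakeup from `src` end the sleep of a task in `state`? */
bool wakes_task(enum task_state state, enum wake_source src)
{
	assert(state != TASK_RUNNING);	/* only sleeping tasks can be woken */

	if (src == WAKE_SIGNAL)
		return state == TASK_INTERRUPTIBLE;	/* signals are ignored when UNINTERRUPTIBLE */
	return true;	/* an explicit wakeup ends both kinds of sleep */
}
```

So a task sleeping on I/O in UNINTERRUPTIBLE state corresponds to
wakes_task(TASK_UNINTERRUPTIBLE, WAKE_SIGNAL) being false and
wakes_task(TASK_UNINTERRUPTIBLE, WAKE_EVENT) being true.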

> drivers use TASK_INTERRUPTIBLE, when it seems that UNINTERRUPTIBLE
> has the more appropriate semantics?

Could you give an example? If they are using TASK_INTERRUPTIBLE they are
probably checking for a signal in order to give up and exit early.

Also note that kswapd and friends are using INTERRUPTIBLE for a very
subtle reason. Try using UNINTERRUPTIBLE and you'll find your load
average stuck at 3 (kswapd+kupdate+kflushd = 3) ;). For such daemons,
using interruptible or not makes no difference, as it can only cause an
early run that only root can trigger (and interruptible avoids the
loadavg screwup).
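
The loadavg effect can be sketched in userspace, assuming (as Linux
does) that the load average is derived from the number of tasks that are
either runnable or sleeping in TASK_UNINTERRUPTIBLE. count_active() is a
hypothetical helper written for this illustration, not the kernel's own
function.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model for illustration only; none of this is a kernel API. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE };

/* Number of tasks that would be counted as "active" for the load average. */
int count_active(const enum task_state *tasks, size_t n)
{
	int active = 0;
	size_t i;

	assert(tasks != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (tasks[i] == TASK_RUNNING ||
		    tasks[i] == TASK_UNINTERRUPTIBLE)
			active++;	/* interruptible sleepers are not counted */
	return active;
}
```

Three idle daemons sleeping in TASK_UNINTERRUPTIBLE give a count of 3;
the same three sleeping in TASK_INTERRUPTIBLE give 0.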

Andrea
