>On Wed, Oct 20, 1999 at 11:38:25PM +0200, Andrea Arcangeli wrote:
>> On Wed, 20 Oct 1999, David Hinds wrote:
>>
>> > do {
>> > current->state = TASK_INTERRUPTIBLE;
>> > timeout = schedule_timeout(timeout);
>> > } while (timeout);
>> > current->state = TASK_RUNNING;
>>
>> Yes, that will work fine and it's the right thing to do, except the fact
>> you don't care about signals in the above fragment and so you should use
>> TASK_UNINTERRUPTIBLE instead (it will avoid some not necessary
>> schedule()).
>
>Ok... then this raises a few questions:
>
>- I'm not clear on the semantics of current->state. In various places
> in the kernel, it is set to TASK_INTERRUPTIBLE, then may or may not
> be set back to TASK_RUNNING. Is not setting it back to TASK_RUNNING
NOTENOTE: when you return from schedule_timeout you _always_ are in
TASK_RUNNING state.
So your example was perfectly right but I forgot to tell you that the
'current->state = TASK_RUNNING' was not necessary. It won't hurt of course
but it's not necessary.
So the right way to unconditionally wait for 5 sec _if_ you are not
registered in any waitqueue and so if you can't be wakenup from a kernel
event (that is not a signal) is:
__set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(5*HZ);
So if an init_module() looks like this:
init_module()
{
__set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(5*HZ);
...
<code>
}
then you are sure that the init_module will sure wait 5 sec before
executing <code> because the process running in the early code in
init_module is can't be registered in any waitqueue (this because the
module syscall is not registering itself before calling the init_module
callback).
If you are registered for getting wakeups from some kernel event and and
you want to block unconditionally for 5 sec then you have instead to do
this:
unsigned long timeout = 5*HZ;
do {
__set_current_state(TASK_UNINTERRUPTIBLE);
timeout = schedule_timeout(timeout);
} while (timeout);
Also note that in 2.3.x you should use __set_current_state(TASK_*) (the
assembler it's the same but the code looks cleaner ;).
> a bug? What are the consequences? Should it be set inside the loop
> as above, or should I set it once, before entering the loop?
Exactly because when you return from schedule_timeout() you are always a
running task, you must always set the state to TASK_[UN]INTERRUPTIBLE
before running schedule_timeout again for a second time.
>- Something like 95% of callers of schedule_timeout() ignore the
> return value. Many seem to expect it to not return early: it is
> used for timed delays in many drivers. Are these all bugs?
They are not buggy if they are using TASK_UNINTERRUPTIBLE and they can't
be wakenup by _hand_ from a wake_up_process() somewhere.
They are not buggy if they are using TASK_INTERRUPTIBLE and they are
expecting to return early in case of a signal and they can't be wakenup by
some lowlevel kernel code.
>- Are signals the only reason for early-exit? If so, then if I use
No. It depends on the code you are running. If you registered yourself in
a waitqueue, then you can get a wakeup from an kernel event as well
(usually this is used to get wakeups from irqs).
> TASK_UNINTERRUPTIBLE, does that guarantee that the return value is
> always zero, so the loop is superfluous? Why do virtually all
No. TASK_UNINTERRUPTIBLE only forbids signals to wakeup the task.
For example: when you wait for I/O completation you first register
yourself in the page->wait waitqueue, and then you go to sleep in
UNINTERRUPTIBLE state. So only the wakeup at IO completation time will be
able to wakeup you (so only when the I/O is completated you'll get
rescheduled). If you'll press C^c the task won't be interrupted (and
infact you'll have to wait I/O completation before being able to interrupt
the task).
> drivers use TASK_INTERRUPTIBLE, when it seems that UNINTERRUPTIBLE
> has the more appropriate semantics?
Could you make some example? If they are using TASK_INTERRUPTIBLE probably
they are checking for a signal to giveup and exit early.
Also note that kswapd and friends are using INTERRUPTIBLE for a very
subtle reason. Give a try to use UNINTERRUPTIBLE and you'll find your CPU
load fixed to 3 (kswapd+kupdate+kflushd = 3) ;). For such daemons using
interruptible or not make no difference as it will only generate an
early run that only root can trigger (and interruptible avoids the loadavg
screwup).
Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/