Re: wait_for_completion_timeout problem ???

From: Eli Cohen
Date: Thu Mar 01 2007 - 10:05:09 EST


On Thu, 2007-03-01 at 15:28 +0100, Dmitry Adamushko wrote:
> Hi,
>
> >
> > I have a problem with using this function. I am referring to
> > drivers/infiniband/hw/mthca/mthca_cmd.c line 394. For convenience I
> > quote from this code:
> >
> > init_completion(&context->done);
> >
> > err = mthca_cmd_post(dev, in_param,
> > out_param ? *out_param : 0,
> > in_modifier, op_modifier,
> > op, context->token, 1);
> > if (err)
> > goto out;
> >
> > if (!wait_for_completion_timeout(&context->done, timeout)) {
> > err = -EBUSY;
> > goto out;
> > }
> >
> > timeout is 10 * HZ. Sometimes this function returns 0 which signifies
> > timeout. However I can see that the interrupt handler called
> > complete(&context->done)
> > around 200 usec after calling wait_for_completion_timout(). When the
> > function returns I can see that context->done.done equals 1 which
> > confirms that complete was indeed called.
>
> The sequence of events can be as follows:
>
> a caller gets blocked in wait_for_completion_timeout() on
> schedule_timeout() which literally means:
>
> i ) will be unblocked (scheduled back) after "timeout" has expired;
>
> ii) will be unblocked by someone calling wake_up_*(&x->wait);
>
> (wait_for_completion_timeout() inserted our caller into "x->wait" wait queue)
>
> in both cases schedule_timeout() will do
>
> ...
> schedule(); <------------------ here we get CPU back
> del_singleshot_timer_sync(&timer);
> timeout = expire - jiffies;
>
> out:
> return timeout < 0 ? 0 : timeout;
>
> "expire" is when (+latency) we were expected to be woken up by a
> timeer -> timeout.
>
> Now the point is that our waiter could have been "waken up" (become
> "ready" from the point of view of the scheduler) earlier but it was
> just "scheduled" (got CPU back) later than "expire" so that's why the
> return value is 0 (timeout < 0 ==> return 0).
>
> IOW, schedule_timeout() indicates whether a process has been scheduled
> back /earlier than timeout/ (so return value >0) or /later/ (0).
>
> It doesn't indicate why the process has been woked up ( i.e. (i) or
> (ii) above ).
>
> In you case it became /runnable/ because of complete() but it got
> scheuled later than /timeout/.
>
> And wait_for_completion_timeout() takes it as a /timeout condition/.
>
> So either all the users of wait_for_completion_timeout() should
> additionally check for x->done after they got scheduled
There is only one process waiting on the completion object.
>
> or
>
> wait_for_completion_timeout() should return something different that
> encodes the fact /event happened/ and not just /event happened _and_ a
> caller has got scheduled back earlier than timeout.
That possibility crossed my mind but I ruled it out since the the
timeout is 10 sec while completion arrived less then 4 msec after
calling wait_for_completion_timeout(). So it appears to me unlikely that
it will the task will be in the run list but will not get CPU time for
nearly 10 seconds. I will give it a try though. I will increase timeout
to 20 sec.

>
>
>
> >
> > Thanks
> > Eli
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/