Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()

From: Jens Axboe
Date: Thu May 02 2013 - 08:23:38 EST


On Thu, May 02 2013, Daniel Vetter wrote:
> On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
> >> Fix this by returning at least 1 if the condition becomes true. This
> >> semantic is in line with what wait_for_condition_timeout() does; see
> >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
> >> failure under heavy load".
> >
> > But now you can't distinguish the timer expiring first, if the thread doing
> > the waiting gets delayed sufficiently long for the event to happen.
>
> That can already happen, e.g.
>
> 1. wakeup happens and condition is true.
> 2. we compute remaining jiffies > 0
> -> preempt
> 3. now wait_for_event_timeout returns.
>
> Only difference is that the delay/preempt happens in between 1. and
> 2., and then suddenly the wake up didn't happen in time (with the
> current return code semantics).
>
> So imo the current behaviour is simply a bug and will miss timely
> wakeups in some cases.
>
> The other way round, namely wait_for_event_timeout taking longer than
> the timeout is expected (and part of the interface for every timeout
> function). So all current callers already need to be able to cope with
> random preemption/delays pushing the total time before the call to
> wait_for_event and checking the return value over the timeout, even
> when condition was signalled in time.
>
> If there's any case which relies on accurate timeout detection that
> simply won't work with wait_for_event (they need an nmi or a hw
> timestamp counter or something similar).

I seriously doubt that anyone is depending on any sort of accuracy on
the return. 1 jiffy is not going to make or break anything - in fact,
jiffies could be incremented nsecs after the initial call. So a
granularity of at least 1 is going to be expected in any case.

The important bit here is that the API should behave as expected. And
the most logical way to code that is to check the return value. I can
easily see people forgetting to re-check the condition, hence you get a
bug. The fact that you and the original reporter already had accidents
with this is a clear sign that the logical way to use the API is not the
correct one.

IMHO, the change definitely makes sense.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/