Re: [patch 2/7] timer: Remove FIFO guarantee

From: Thomas Gleixner
Date: Mon Jun 08 2015 - 13:48:46 EST


On Sat, 5 Jun 2015, George Spelvin wrote:

> Two thoughts:
>
> 1) It's not clear that timer slack has violated the FIFO guarantee.
>
> Remember, the guarantee only applies when two timers have the same
> (absolute) timeout; i.e. are requested for the same time.
>
> For two entries with the same timeout, applying slack will produce
> the same adjusted timeout. That's the whole point of slack; to
> force the timeouts to collide more.
>
> Because the unadjusted timeouts aren't stored, timeout ordering
> can be violated; i.e. timers for time t and t+1, which round to the
> same time ater slack is applied, will expire in FIFO order rather
> than timeout order. This is non-
>
> But in the case where he FIFO guatantee used to exist, I don't think
> timer slack broke it.

It does. Depending on when you enqueue the timer because the thing is
calculated from the delta (expires - jiffies).

Now thinking more about it. Even before we introduced the slack, the
FIFO guarantee was only there, if two timers were enqueued into the
same array level. Assume the following:

T1 expires = 257 enqueue time = 0
T2 expires = 257 enqueue time = 200

So T1 will go into the secondary array and will be recascaded
into the primary array after 256 jiffies.

T2 will go into the primary array BEFORE T1 is recascaded. So T1 will
be queued into the same bucket at cascading AFTER T2.

Another issue is the whole NOHZ target cpu thing. Even if two timers
are enqueued at the same jiffie with the same expiry value, they can
end up on two different cpus. Due to tick skew, which can be explicit,
T2 can expire before T1.

> I'm not disagreeing with the change, but it's not clear to me that
> it's as safe as you think.

After thinking more about it, I'm even more sure that any code which
relies on the FIFO "guarantee" is broken today.

> 2) If you want to do some more in that area, one thing I've been meaning
> to get around to is eliminating the whole round_jiffies() system.
>
> It does basically the same thing as the slack system, although with
> less flexibility, and it would be wonderful to rip it out of
> the kernel completely.

Once we have natural batching in the wheel itself. I think we can kill
that completely.

> Additional rambling you should ignore. It exists because I haven't
> figured out why it's impractical yet.
>
> An interesting variant on the slack system would be to apply slack in the
> wakeup loop rather than before sleeping. It might permit more bundling
> and thus fewer wakeups.
>
> You have the timers in original order. But computing the next expiration
> is more complex. Each timer has a minimum and maximum wakeup time, and
> they're sorted by minimum time. You scan the list of pending timers,
> accumulating a "batch" which can all be woken up together.

Similar to what we do in hrtimers. Though the main issue of the timer
wheel is, that we have no fast way to find the next expiring
timer. Maybe we can revisit this once I got the rework of the wheel
logic completed.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/