Re: [PATCH] timers/migration: Return early on deactivation

From: Frederic Weisbecker
Date: Thu Apr 04 2024 - 18:19:22 EST


Le Thu, Apr 04, 2024 at 06:50:26PM +0200, Anna-Maria Behnsen a écrit :
> Commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
> deactivation") removed the logic to return early in tmigr_update_events()
> on deactivation. With this the problem with a not properly updated first
> global event in a hierarchy containing only a single group was fixed.
>
> But when having a look at this code path with a hierarchy with more than a
> single level, now unnecessary work is done (example is partially copied
> from the message of the commit mentioned above):
>
> [GRP1:0]
> migrator = GRP0:0
> active = GRP0:0
> nextevt = T0:0i, T0:1
> / \
> [GRP0:0] [GRP0:1]
> migrator = 0 migrator = NONE
> active = 0 active = NONE
> nextevt = T0i, T1 nextevt = T2
> / \ / \
> 0 (T0i) 1 (T1) 2 (T2) 3
> active idle idle idle
>
> 0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
> upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
> CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2
>
> [GRP1:0]
> migrator = GRP0:0
> active = GRP0:0
> nextevt = T0:0i, T0:1
> / \
> [GRP0:0] [GRP0:1]
> migrator = NONE migrator = NONE
> active = NONE active = NONE
> nextevt = T1 nextevt = T2
> / \ / \
> 0 (T0i) 1 (T1) 2 (T2) 3
> idle idle idle idle
>
> 1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
> pushed as its next expiry and its own event kept as "ignore". Without this
> early return the following steps happen in tmigr_update_events() when
> child = null and group = GRP0:0 :
>
> lock(GRP0:0->lock);
> timerqueue_del(GRP0:0, T0i);
> unlock(GRP0:0->lock);
>
>
> [GRP1:0]
> migrator = NONE
> active = NONE
> nextevt = T0:0, T0:1
> / \
> [GRP0:0] [GRP0:1]
> migrator = NONE migrator = NONE
> active = NONE active = NONE
> nextevt = T1 nextevt = T2
> / \ / \
> 0 (T0i) 1 (T1) 2 (T2) 3
> idle idle idle idle
>
> 2) The change now propagates up to the top. Then tmigr_update_events()
> updates the group event of GRP0:0 and executes the following steps
> (child = GRP0:0 and group = GRP0:0):
>
> lock(GRP0:0->lock);
> lock(GRP1:0->lock);
> evt = tmigr_next_groupevt(GRP0:0); -> this removes the ignored events
> in GRP0:0
> ... update GRP1:0 group event and timerqueue ...
> unlock(GRP1:0->lock);
> unlock(GRP0:0->lock);
>
> So the dance in 1) with locking the GRP0:0->lock and removing the T0i from
> the timerqueue is redundand as this is done nevertheless in 2) when
> tmigr_next_groupevt(GRP0:0) is executed.
>
> Revert commit 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on
> deactivation") and add a condition into return path to skip the return
> only, when hierarchy contains a single group.
>
> Fixes: 4b6f4c5a67c0 ("timer/migration: Remove buggy early return on deactivation")
> Signed-off-by: Anna-Maria Behnsen <anna-maria@xxxxxxxxxxxxx>

Reviewed-by: Frederic Weisbecker <frederic@xxxxxxxxxx>

Just some comment nits:

> ---
> kernel/time/timer_migration.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -751,6 +751,31 @@ bool tmigr_update_events(struct tmigr_gr
>
> first_childevt = evt = data->evt;
>
> + /*
> + * Walking the hierarchy is required in any case when a
> + * remote expiry was done before. This ensures to not lose
> + * already queued events in non active groups (see section
> + * "Required event and timerqueue update after a remote
> + * expiry" in the documentation at the top).
> + *
> + * The two call sites which are executed without a remote expiry
> + * before, are not prevented from propagating changes through
> + * the hierarchy by the return:
> + * - When entering this path by tmigr_new_timer(), @evt->ignore
> + * is never set.
> + * - tmigr_inactive_up() takes care of the propagation by
> + * itself and ignores the return value. But an immediate
> + * return is required because nothing has to be done in this
> + * level as the event could be ignored.

It's not exactly required, it's an optimization. How about:

"""
But an immediate return is possible if there is a parent, sparing group
locking at this level, because the upper walking call to the parent will
take care about removing this event from within the group and update
next_expiry accordingly.
"""

> + *
> + * But, if the hierarchy has only a single level so @group is
> + * the top level group, make sure first event information of the
> + * group is updated properly and also handled properly, so skip
> + * this fast return path.

"""
However if there is no parent, ie: the hierarchy has only a single level so
@group is the top level group, make sure the first event information of the
group is updated properly and also handled properly, so skip
this fast return path.
"""

Thanks.

> + */
> + if (evt->ignore && !remote && group->parent)
> + return true;
> +
> raw_spin_lock(&group->lock);
>
> childstate.state = 0;