Re: [PATCH v10 18/20] timers: Implement the hierarchical pull model

From: Frederic Weisbecker
Date: Sun Feb 04 2024 - 17:32:19 EST


Le Mon, Jan 15, 2024 at 03:37:41PM +0100, Anna-Maria Behnsen a écrit :
> +/*
> + * Returns true, if there is nothing to be propagated to the next level
> + *
> + * @data->firstexp is set to expiry of first gobal event of the (top level of
> + * the) hierarchy, but only when hierarchy is completely idle.
> + *
> + * This is the only place where the group event expiry value is set.
> + */
> +static
> +bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
> + struct tmigr_walk *data, union tmigr_state childstate,
> + union tmigr_state groupstate)
> +{
> + struct tmigr_event *evt, *first_childevt;
> + bool walk_done, remote = data->remote;
> + bool leftmost_change = false;
> + u64 nextexp;
> +
> + if (child) {
> + raw_spin_lock(&child->lock);
> + raw_spin_lock_nested(&group->lock, SINGLE_DEPTH_NESTING);
> +
> + if (childstate.active) {
> + walk_done = true;
> + goto unlock;
> + }
> +
> + first_childevt = tmigr_next_groupevt(child);
> + nextexp = child->next_expiry;
> + evt = &child->groupevt;
> + } else {
> + nextexp = data->nextexp;
> +
> + first_childevt = evt = data->evt;
> +
> + /*
> + * Walking the hierarchy is required in any case when a
> + * remote expiry was done before. This ensures to not lose
> + * already queued events in non active groups (see section
> + * "Required event and timerqueue update after a remote
> + * expiry" in the documentation at the top).
> + *
> + * The two call sites which are executed without a remote expiry
> + * before, are not prevented from propagating changes through
> + * the hierarchy by the return:
> + * - When entering this path by tmigr_new_timer(), @evt->ignore
> + * is never set.
> + * - tmigr_inactive_up() takes care of the propagation by
> + * itself and ignores the return value. But an immediate
> + * return is required because nothing has to be done in this
> + * level as the event could be ignored.
> + */
> + if (evt->ignore && !remote)
> + return true;
> +
> + raw_spin_lock(&group->lock);
> + }
> +
> + if (nextexp == KTIME_MAX) {
> + evt->ignore = true;
> +
> + /*
> + * When the next child event could be ignored (nextexp is
> + * KTIME_MAX) and there was no remote timer handling before or
> + * the group is already active, there is no need to walk the
> + * hierarchy even if there is a parent group.
> + *
> + * The other way round: even if the event could be ignored, but
> + * if a remote timer handling was executed before and the group
> + * is not active, walking the hierarchy is required to not miss
> + * an enqueued timer in the non active group. The enqueued timer
> + * of the group needs to be propagated to a higher level to
> + * ensure it is handled.
> + */
> + if (!remote || groupstate.active) {
> + walk_done = true;
> + goto unlock;

So if the current tmc going inactive was the migrator for the whole hierarchy
and it is reaching here the top-level, this assumes that if none of this tmc's
groups have a timer, then it can just return. But what if the top level has
timers from other children? Who is going to handle them then?

Should this be "goto check_toplvl" instead?

Thanks.

> + }
> + } else {
> + /*
> + * An update of @evt->cpu and @evt->ignore flag is required only
> + * when @child is set (the child is equal or higher than lvl0),
> + * but it doesn't matter if it is written once more to the per
> + * CPU event; make the update unconditional.
> + */
> + evt->cpu = first_childevt->cpu;
> + evt->ignore = false;
> + }
> +
> + walk_done = !group->parent;
> +
> + /*
> + * If the child event is already queued in the group, remove it from the
> + * queue when the expiry time changed only.
> + */
> + if (timerqueue_node_queued(&evt->nextevt)) {
> + if (evt->nextevt.expires == nextexp)
> + goto check_toplvl;
> +
> + leftmost_change = timerqueue_getnext(&group->events) == &evt->nextevt;
> + if (!timerqueue_del(&group->events, &evt->nextevt))
> + WRITE_ONCE(group->next_expiry, KTIME_MAX);
> + }
> +
> + evt->nextevt.expires = nextexp;
> +
> + if (timerqueue_add(&group->events, &evt->nextevt)) {
> + leftmost_change = true;
> + WRITE_ONCE(group->next_expiry, nextexp);
> + }
> +
> +check_toplvl:
> + if (walk_done && (groupstate.migrator == TMIGR_NONE)) {
> + /*
> + * Nothing to do when first event didn't changed and update was
> + * done during remote timer handling.
> + */
> + if (remote && !leftmost_change)
> + goto unlock;
> + /*
> + * The top level group is idle and it has to be ensured the
> + * global timers are handled in time. (This could be optimized
> + * by keeping track of the last global scheduled event and only
> + * arming it on the CPU if the new event is earlier. Not sure if
> + * its worth the complexity.)
> + */
> + data->firstexp = tmigr_next_groupevt_expires(group);
> + }
> +
> +unlock:
> + raw_spin_unlock(&group->lock);
> +
> + if (child)
> + raw_spin_unlock(&child->lock);
> +
> + return walk_done;
> +}