Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization

From: luca abeni
Date: Fri Mar 24 2017 - 17:47:41 EST

Next message: Tobias Herzog: "Re: [PATCH v2 1/4] cdc-acm: reassemble fragmented notifications"
Previous message: David Miller: "Re: [PATCH] net_sched: use setup_deferrable_timer"
In reply to: Peter Zijlstra: "Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization"
Next in thread: Steven Rostedt: "Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Peter,

On Fri, 24 Mar 2017 14:20:41 +0100
Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index d67eee8..952cac8 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> > *
> > * @dl_yielded tells if task gave up the CPU before
> > consuming
> > * all its available runtime during the last job.
> > + *
> > + * @dl_non_contending tells if task is inactive while still
> > + * contributing to the active utilization. In other words,
> > it
> > + * indicates if the inactive timer has been armed and its
> > handler
> > + * has not been executed yet. This flag is useful to avoid
> > race
> > + * conditions between the inactive timer handler and the
> > wakeup
> > + * code.
> > */
> > int dl_throttled;
> > int dl_boosted;
> > int dl_yielded;
> > + int dl_non_contending;
>
> We should maybe look into making those bits :1, not something for this
> patch though;

Yes, grouping all the flags in a single field was my intention too... I
planned to submit a patch to do this after merging the reclaiming
patches... But maybe it is better to do this first :)

> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cef9adb..86aed82 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > *dl_rq) dl_rq->running_bw = 0;
> > }
> >
> > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > +{
> > + if (!task_on_rq_queued(p)) {
> > + struct rq *rq = task_rq(p);
> > +
> > + if (p->dl.dl_non_contending) {
> > + sub_running_bw(p->dl.dl_bw, &rq->dl);
> > + p->dl.dl_non_contending = 0;
> > + /*
> > + * If the timer handler is currently
> > running and the
> > + * timer cannot be cancelled,
> > inactive_task_timer()
> > + * will see that dl_not_contending is not
> > set, and
> > + * will not touch the rq's active
> > utilization,
> > + * so we are still safe.
> > + */
> > + if
> > (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > + put_task_struct(p);
> > + }
> > + }
> > +}
>
> If we rearrange that slightly we can avoid the double indent:
[...]
Ok; I'll rework the code in this way on Monday.

[...]
> > + if (hrtimer_try_to_cancel(&dl_se->inactive_timer)
> > == 1)
> > + put_task_struct(dl_task_of(dl_se));
> > + dl_se->dl_non_contending = 0;
>
> This had me worried for a little while as being a use-after-free, but
> this put_task_struct() cannot be the last in this case. Might be
> worth a comment.

Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
hrtimer_try_to_cancel()?

>
> > + } else {
> > + /*
> > + * Since "dl_non_contending" is not set, the
> > + * task's utilization has already been removed from
> > + * active utilization (either when the task
> > blocked,
> > + * when the "inactive timer" fired).
> > + * So, add it back.
> > + */
> > + add_running_bw(dl_se->dl_bw, dl_rq);
> > + }
> > +}
>
> In general I feel it would be nice to have a state diagram included
> somewhere near these two functions. It would be nice to not have to
> dig out the PDF every time.

Ok... Since I am not good at ascii art, would it be ok to add a textual
description? If yes, I'll add a comment like:
"
The utilization of a task is added to the runqueue's active utilization
when the task becomes active (is enqueued in the runqueue), and is
removed when the task becomes inactive. A task does not become
immediately inactive when it blocks, but becomes inactive at the so
called "0 lag time"; so, we setup the "inactive timer" to fire at the
"0 lag time". When the "inactive timer" fires, the task utilization is
removed from the runqueue's active utilization. If the task wakes up
again on the same runqueue before the "0 lag time", the active
utilization must not be changed and the "inactive timer" must be
cancelled. If the task wakes up again on a different runqueue before
the "0 lag time", then the task's utilization must be removed from the
previous runqueue's active utilization and must be added to the new
runqueue's active utilization.
In order to avoid races between a task waking up on a runqueue while the
"inactive timer" is running on a different CPU, the "dl_non_contending"
flag is used to indicate that a task is not on a runqueue but is active
(so, the flag is set when the task blocks and is cleared when the
"inactive timer" fires or when the task wakes up).
"
(if this is ok, where can I add this comment?)

> > +static void migrate_task_rq_dl(struct task_struct *p)
> > +{
> > + if ((p->state == TASK_WAKING) &&
> > (p->dl.dl_non_contending)) {
> > + struct rq *rq = task_rq(p);
> > +
> > + raw_spin_lock(&rq->lock);
> > + sub_running_bw(p->dl.dl_bw, &rq->dl);
> > + p->dl.dl_non_contending = 0;
> > + /*
> > + * If the timer handler is currently running and
> > the
> > + * timer cannot be cancelled, inactive_task_timer()
> > + * will see that dl_not_contending is not set, and
> > + * will not touch the rq's active utilization,
> > + * so we are still safe.
> > + */
> > + if (hrtimer_try_to_cancel(&p->dl.inactive_timer)
> > == 1)
> > + put_task_struct(p);
> > +
> > + raw_spin_unlock(&rq->lock);
> > + }
> > +}
>
> This can similarly be reduced in indent, albeit this is only a single
> indent level, so not as onerous as the other one.

Ok; I'll do this on Monday.

> What had me puzzled for a while here is taking the lock; because some
> callers of set_task_cpu() must in fact hold this lock already. Worth a
> comment I feel.

Ok; I'll add a comment mentioning that since state is TASK_WAKING we do
not have the lock.

> Once I figured out the exact locking; I realized you do this on
> cross-cpu wakeups. We spend a fair amount of effort not to take both
> rq locks. But I suppose you have to here.

The problem is that when a task wakes up before the "0 lag time" on a
different runqueue, we must "migrate" its utilization from the old
runqueue to the new one (see comment above). And I need the lock to
avoid races... The only alternative I can think about is to ad a new
lock protecting the active utilization, and to take it instead of the
runqueue lock (I do not know if this is enough, I need to check...).
I'll have a look on Monday.

Thanks,
Luca

Next message: Tobias Herzog: "Re: [PATCH v2 1/4] cdc-acm: reassemble fragmented notifications"
Previous message: David Miller: "Re: [PATCH] net_sched: use setup_deferrable_timer"
In reply to: Peter Zijlstra: "Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization"
Next in thread: Steven Rostedt: "Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]