Re: [RFC v2 3/7] Improve the tracking of active utilisation

From: Peter Zijlstra
Date: Tue Apr 05 2016 - 14:11:36 EST


On Tue, Apr 05, 2016 at 07:56:57PM +0200, luca abeni wrote:
> On Tue, 5 Apr 2016 17:00:36 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Fri, Apr 01, 2016 at 05:12:29PM +0200, Luca Abeni wrote:
> > > +static void task_go_inactive(struct task_struct *p)
> > > +{
> > > +	struct sched_dl_entity *dl_se = &p->dl;
> > > +	struct hrtimer *timer = &dl_se->inactive_timer;
> > > +	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > > +	struct rq *rq = rq_of_dl_rq(dl_rq);
> > > +	ktime_t now, act;
> > > +	s64 delta;
> > > +	u64 zerolag_time;
> > > +
> > > +	WARN_ON(dl_se->dl_runtime == 0);
> > > +
> > > +	/* If the inactive timer is already armed, return immediately */
> > > +	if (hrtimer_active(&dl_se->inactive_timer))
> > > +		return;
> >
> > So while we start the timer on the local cpu, we don't migrate the timer
> > when we migrate the task, so the callback can happen on a remote cpu,
> > right?
> >
> > Therefore, the timer function might still be running but have just
> > done task_rq_unlock(), which would have allowed our cpu to acquire
> > the rq->lock and get here.
> >
> > Then the above check is true, we'll quit, but effectively the inactive
> > timer will not run 'again'.
> Uhm... So the problem is:
> - Task T wakes up, but cannot cancel its inactive timer, because its
>   handler is already running
> + This should not be a problem: inactive_task_timer() will return without
> doing anything
> - Before inactive_task_timer() actually gets to run, task T migrates to a different CPU
> - Before the timer handler finishes running, the task blocks again... So, task_go_inactive()
> sees the timer as active and returns immediately. But the timer has already
> executed (without doing anything). So no one decreases the rq utilisation.
>
> I did not think about this issue, and I never managed to trigger it in my
> tests... I'll try to see how it can be addressed. Do you have any suggestions?
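
Right; to make the window explicit: hrtimer_active() stays true for the
whole callback, including after it has dropped the rq lock having done
nothing. Sketching my reading of the callback (illustrative, not the
patch verbatim):

static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
{
	struct sched_dl_entity *dl_se = container_of(timer,
						     struct sched_dl_entity,
						     inactive_timer);
	struct task_struct *p = dl_task_of(dl_se);
	unsigned long flags;
	struct rq *rq;

	rq = task_rq_lock(p, &flags);

	if (task_on_rq_queued(p))
		goto unlock;	/* the task woke up: nothing to do */

	/* ... decrease the rq utilisation ... */

unlock:
	task_rq_unlock(rq, p, &flags);
	/*
	 * Window: another cpu can take rq->lock here, see the task
	 * block again, find hrtimer_active() still true and bail out
	 * of task_go_inactive(), even though we did nothing above.
	 */
	return HRTIMER_NORESTART;
}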

So my brain is about to give out, but it might be easiest to simply
track whether the current task's bandwidth is accounted in the rq
with a per-task variable, updated under both the pi and rq locks.
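
Something like the below, perhaps (a rough, untested sketch; the field
name dl_contrib_added is made up and the accounting helpers are
elided):

struct sched_dl_entity {
	/* ... existing fields ... */

	/*
	 * True while this task's bandwidth is counted in the rq
	 * utilisation.  Only written with both p->pi_lock and
	 * rq->lock held, so the wakeup path, the migration path
	 * and the timer callback all agree on its value.
	 */
	int			dl_contrib_added;
};

static void task_go_inactive(struct task_struct *p)
{
	/*
	 * Don't guess from hrtimer_active(): the callback may
	 * already have run and decided to do nothing.  Trust the
	 * flag instead.
	 */
	if (!p->dl.dl_contrib_added)
		return;

	/*
	 * ... arm the inactive timer as before; re-arming against a
	 * still-running callback still needs care ...
	 */
}

static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
{
	/* ... p and rq obtained under task_rq_lock() as before ... */

	if (p->dl.dl_contrib_added && !task_on_rq_queued(p)) {
		/* subtract the utilisation with the series' helper */
		p->dl.dl_contrib_added = 0;
	}

	/* ... */
	return HRTIMER_NORESTART;
}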