Re: [PATCH 1/2] sched/deadline: add per rq tracking of admitted bandwidth

From: Juri Lelli
Date: Fri Feb 12 2016 - 12:19:07 EST


On 12/02/16 18:05, Peter Zijlstra wrote:
> On Thu, Feb 11, 2016 at 05:10:12PM +0000, Juri Lelli wrote:
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 6368f43..1eccecf 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
>
> > +static void swap_task_ac_bw(struct task_struct *p,
> > +			    struct rq *from,
> > +			    struct rq *to)
> > +{
> > +	unsigned long flags;
> > +
> > +	lockdep_assert_held(&p->pi_lock);
> > +	local_irq_save(flags);
> > +	double_rq_lock(from, to);
> > +	__dl_sub_ac(from, p->dl.dl_bw);
> > +	__dl_add_ac(to, p->dl.dl_bw);
> > +	double_rq_unlock(from, to);
> > +	local_irq_restore(flags);
> > +}
>
> > +static void migrate_task_rq_dl(struct task_struct *p)
> > +{
> > +	if (p->fallback_cpu != -1)
> > +		swap_task_ac_bw(p, task_rq(p), cpu_rq(p->fallback_cpu));
> > +}
>
> This patch scares me.
>

Yeah, same here. However, I haven't yet found a different way to fix
this and wanted some help :).

> Now, my brain is having an awfully hard time trying to re-engage after
> flu, but this looks very wrong.
>
> So we call sched_class::migrate_task_rq() from set_task_cpu(), and we
> call set_task_cpu() while potentially holding rq::lock's (try
> push_dl_task() for kicks).
>
> Sure, you play horrible games with fallback_cpu, but those games are
> just that, horrible.
>

Right. I'm counting on fallback_cpu so that we don't have to call
swap_task_ac_bw() (and take the rq locks) during push/pull migrations.
I was actually thinking that we could have an unlocked version of the
swap and call that from migrate_task_rq_dl in the push/pull paths, but
that is most probably even more horrible.
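
Just to make that concrete, something along these lines is what I had
in mind (only a sketch, assuming the push/pull path already holds both
rq locks; __swap_task_ac_bw is an illustrative name):

static void __swap_task_ac_bw(struct task_struct *p,
			      struct rq *from,
			      struct rq *to)
{
	/* Callers are expected to already hold both rq locks. */
	lockdep_assert_held(&from->lock);
	lockdep_assert_held(&to->lock);

	__dl_sub_ac(from, p->dl.dl_bw);
	__dl_add_ac(to, p->dl.dl_bw);
}

swap_task_ac_bw() above would then reduce to a locked wrapper around
it.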

>
> So your initial patch migrates the bandwidth along when a runnable task
> gets moved about, this hack seems to be mostly about waking up. The
> 'normal' accounting is done on enqueue/dequeue, while here you use the
> migration hook.
>

The problem is that I don't do anything in enqueue/dequeue (apart from
when cpuset migrates us while we are still on_rq), and I think we don't
want to do anything there, as a task's dl_bw should remain accounted in
ac_bw when the task goes to sleep, etc. This is the static view of the
admitted bandwidth. We want to save/restore the admitted bandwidth in
the root_domain also when tasks are sleeping/blocked.
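
To spell out the invariant I'm after (again only a sketch; the helper
names below are illustrative and don't exist in the patch):

/*
 * ac_bw is meant to be the sum of the dl_bw of all tasks admitted on
 * this CPU, whether they are runnable or blocked. It changes only at
 * admission/release time and on migration, never on enqueue/dequeue.
 */
static void dl_ac_admit(struct rq *rq, struct task_struct *p)
{
	lockdep_assert_held(&rq->lock);
	__dl_add_ac(rq, p->dl.dl_bw);	/* e.g., on sched_setattr() admission */
}

static void dl_ac_release(struct rq *rq, struct task_struct *p)
{
	lockdep_assert_held(&rq->lock);
	__dl_sub_ac(rq, p->dl.dl_bw);	/* task dies or leaves SCHED_DEADLINE */
}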

> Having two separate means of accounting this also feels more fragile
> than one would want.
>
> Let me think a bit about this.
>

I was looking at sending out a v2 with this as an RFC. I guess it's
better if I wait :).

Thanks!

Best,

- Juri