Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant

From: Juri Lelli
Date: Wed Jul 19 2017 - 07:16:42 EST


On 19/07/17 13:00, Peter Zijlstra wrote:
> On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:
> > On 19/07/17 09:21, Peter Zijlstra wrote:
> > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:
> > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > > > if (unlikely(dl_entity_is_special(dl_se)))
> > > > return;
> > > >
> > > > - if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > > - delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > > - dl_se->runtime -= delta_exec;
> > > > + /*
> > > > + * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > > + * spare reclaimed bandwidth is used to clock down frequency.
> > > > + *
> > > > + * For the others, we still need to scale reservation parameters
> > > > + * according to current frequency and CPU maximum capacity.
> > > > + */
> > > > + if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > > + scaled_delta_exec = grub_reclaim(delta_exec,
> > > > + rq,
> > > > + &curr->dl);
> > > > + } else {
> > > > + unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > > + unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > > +
> > > > + scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > > + scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > > + }
> > > > +
> > > > + dl_se->runtime -= scaled_delta_exec;
> > > >
> > >
> > > This I don't get...
> >
> >
> > Considering that we use GRUB's active utilization to drive clock
> > frequency selection, rationale is that GRUB tasks don't need any special
> > scaling, as their delta_exec is already scaled according to GRUB rules.
> > OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> > scaled considering current frequency (and CPU max capacity), otherwise
> > they are going to receive less runtime than granted at AC, when
> > frequency is reduced.
>
> I don't think that quite works out. Given that the frequency selection
> will never quite end up at exactly the same fraction (if the hardware
> listens to your requests at all).
>

It's an approximation, yes (how big depends on the granularity of the
available frequencies). But for the !GRUB tasks it should be OK, as we
always select a frequency (among the available ones) higher than what
the current active utilization requires.
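
To make the granularity point concrete, here is a minimal userspace
sketch (not the patch code) of picking the lowest available frequency
whose capacity still covers the active utilization. pick_freq(),
freq_capacity() and CAPACITY_SCALE are hypothetical names for this
illustration only; the frequency table is assumed to be sorted in
ascending order, with the last entry being the maximum frequency.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the kernel's 1024-based capacity scale. */
#define CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: return the lowest frequency in freqs[] (ascending,
 * in kHz) whose capacity, freq * CAPACITY_SCALE / max_freq, is at least
 * util. The comparison is cross-multiplied to avoid rounding in the
 * division.
 */
static unsigned long pick_freq(const unsigned long *freqs, size_t n,
			       unsigned long util)
{
	unsigned long max_freq;
	size_t i;

	assert(n > 0 && util <= CAPACITY_SCALE);
	max_freq = freqs[n - 1];

	for (i = 0; i < n; i++) {
		if (freqs[i] * CAPACITY_SCALE >= util * max_freq)
			return freqs[i];
	}
	return max_freq;
}

/* Capacity delivered at freq, on the same 0..CAPACITY_SCALE scale. */
static unsigned long freq_capacity(unsigned long freq, unsigned long max_freq)
{
	return freq * CAPACITY_SCALE / max_freq;
}
```

With a table of 500000, 1000000, 1500000 and 2000000 kHz and an active
utilization of 300, this picks 1000000 kHz, i.e. a capacity of 512. The
gap between 300 and 512 is the approximation error that the frequency
granularity introduces.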

Also, for platforms/archs that don't redefine arch_scale_* this is not
used. If they are defined, the assumption is that either the hw listens
to requests or the scaling factors can be derived in some other way
(avgs?).
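
For reference, a minimal userspace sketch of what the !GRUB branch of
the patch computes. cap_scale() here is a local reimplementation of the
kernel macro of the same name, and scale_delta_exec() is a hypothetical
wrapper for this illustration; both scale factors are assumed to be in
the range 0..1024. When both factors are at full scale (1024), the
scaling is an identity and delta_exec is charged unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same fixed-point shift as SCHED_CAPACITY_SHIFT. */
#define CAPACITY_SHIFT 10
#define CAPACITY_SCALE (1UL << CAPACITY_SHIFT)

/* Scale v by s / 1024, as the kernel's cap_scale() macro does. */
static uint64_t cap_scale(uint64_t v, unsigned long s)
{
	return (v * s) >> CAPACITY_SHIFT;
}

/*
 * Hypothetical wrapper: runtime (in ns) to charge to a !GRUB task that
 * ran for delta_exec ns at the given frequency and CPU scale factors.
 */
static uint64_t scale_delta_exec(uint64_t delta_exec,
				 unsigned long scale_freq,
				 unsigned long scale_cpu)
{
	uint64_t scaled;

	assert(scale_freq <= CAPACITY_SCALE && scale_cpu <= CAPACITY_SCALE);

	scaled = cap_scale(delta_exec, scale_freq);
	return cap_scale(scaled, scale_cpu);
}
```

For example, 1000000 ns of wall-clock execution at half the maximum
frequency (scale_freq = 512) on a full-capacity CPU is charged as
500000 ns of runtime, so the task is not penalized for running slower.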

> Also, by not scaling the GRUB stuff, don't you run the risk of
> attempting to hand out more idle time than there actually is?

The way I understand it, for GRUB tasks we always scale considering
the "correct" factor. The frequency could then end up higher, but the
resulting spare idle time will be reclaimed by other GRUB tasks.