Re: [RFC PATCH v3 03/10] PM: Introduce an Energy Model management framework

From: Quentin Perret
Date: Wed Jun 06 2018 - 10:37:52 EST


Hi Dietmar,

On Wednesday 06 Jun 2018 at 15:12:15 (+0200), Dietmar Eggemann wrote:
> > +static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu)
> > +{
> > + unsigned long cmax = arch_scale_cpu_capacity(NULL, cpu);
> > + int max_cap_state = cs_table->nr_cap_states - 1;
> > + unsigned long fmax = cs_table->state[max_cap_state].frequency;
> > + int i;
> > +
> > + for (i = 0; i < cs_table->nr_cap_states; i++)
> > + cs_table->state[i].capacity = cmax *
> > + cs_table->state[i].frequency / fmax;
> > +}
>
> This has issues on a 32bit system. cs_table->state[i].capacity (unsigned
> long) overflows with the frequency values stored in Hz.

Ah, thank you very much for pointing this out ! I haven't tried on a 32bit
machine yet, my bad. I'll fix that for v4.

>
> Maybe something like this to cure it:
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 6ad53f1cf7e6..c13b3eb8bf35 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -144,9 +144,11 @@ static void fd_update_cs_table(struct em_cs_table *cs_table, int cpu)
> unsigned long fmax = cs_table->state[max_cap_state].frequency;
> int i;
>
> - for (i = 0; i < cs_table->nr_cap_states; i++)
> - cs_table->state[i].capacity = cmax *
> - cs_table->state[i].frequency / fmax;
> + for (i = 0; i < cs_table->nr_cap_states; i++) {
> + u64 val = (u64)cmax * cs_table->state[i].frequency;
> + do_div(val, fmax);
> + cs_table->state[i].capacity = (unsigned long)val;
> + }
> }

Hmmm yes, that should work.

>
> This brings me to another question. Let's say there are multiple users of
> the Energy Model in the system. Shouldn't the units of frequency and power
> not standardized, maybe Mhz and mW?
> The task scheduler doesn't care since it is only interested in power diffs
> but other user might do.

So the good thing about specifying units is that we can probably assume
ranges on the values. If the power is in mW, assuming that we're talking
about a single CPU, it'll probably fit in 16 bits. 65W/core should be
a reasonable upper-bound ?
But there are also vendors who might not be happy with disclosing absolute
values ... These are sometimes considered sensitive and only relative
numbers are discussed publicly. Now, you can also argue that we already
have units specified in IPA for ex, and that it doesn't really matter if
a driver "lies" about the real value, as long as the ratios are correct.
And I guess that anyone can do measurement on the hardware and get those
values anyway. So specifying a unit (mW) for the power is probably a
good idea.

For the frequency, I would rather keep it consistent with what CPUFreq
manages. But that should be specified somewhere, I agree.

What do you think ?

Thanks !
Quentin