Re: [PATCH 1/2] sched/fair: move cpufreq hook to update_cfs_rq_load_avg()

From: Rafael J. Wysocki
Date: Wed Apr 13 2016 - 15:50:34 EST


On Wed, Apr 13, 2016 at 8:06 PM, Steve Muckle <steve.muckle@xxxxxxxxxx> wrote:
> On 04/13/2016 09:07 AM, Rafael J. Wysocki wrote:
>>>>>> If you want to do remote updates, I guess that will require an
>>>>>> irq_work to run the update on the target CPU, but then you'll probably
>>>>>> want to neglect the rate limit on it as well, so it looks like a
>>>>>> "need_update" flag in struct update_util_data will be useful for that.
>
> Have you added rate limiting at the hook level that I missed? I thought
> it was just inside schedutil.

It is in schedutil (and other governors), but if you do a cross-CPU
update, you probably want that rate limit to be ignored in that case.
Now, if the local and target CPUs happen to use different governors
(eg. the local CPU uses ondemand and the target one uses schedutil) or
they just don't belong to the same policy, you need to set the "need
update" flag for the target CPU, so the local one needs access to it.
It is better for that flag to be located in the per-CPU data of the
target CPU for that.

>>>>>
>>>>> Why is it required to run the update on the target CPU?
>>>>
>>>> The fast switching and intel_pstate are the main reason.
>>>>
>>>> They both have to write to registers of the target CPU and the code to
>>>> do that needs to run on that CPU.
>
> Ok thanks, I'll take another look at this.
>
> I was thinking it might be nice to be able to push the decision on
> whether to send the IPI in to the governor/hook client. For example in
> the schedutil case, you don't need to IPI if sugov_should_update_freq()
> = false (outside the slight chance it might be true when it runs on the
> target). Beyond that perhaps for policy reasons it's desired to not send
> the IPI if next_freq <= cur_freq, etc.

Yes, that is an option, but then your governor code gets more
complicated. Since every governor would need that complexity, you'd
end up having it in multiple places. To me, it seems more efficient
to just have it in one place (the code that triggers a cross-CPU
update).

And as I said, the rate limit would need to be overridden in the
cross-CPU update case anyway, because it may just prevent you from
getting what you want otherwise.

>>> And these two seem to be the only interesting cases for you, because
>>> if you need to work for the worker thread to schedule to eventually
>>
>> s/work/wait/ (sorry)
>>
>>> change the CPU frequency for you, that will defeat the whole purpose
>>> here.
>
> I was hoping to submit at some point a patch to change the context for
> slow path frequency changes to RT or DL context, so this would benefit
> that case as well.

But it still would require the worker thread to schedule, although it
might just occur a bit earlier if that's DL/RT.

Thanks,
Rafael