Re: [PATCH 1/2] sched/fair: move cpufreq hook to update_cfs_rq_load_avg()

From: Rafael J. Wysocki
Date: Wed Apr 13 2016 - 10:46:10 EST


On Tue, Apr 12, 2016 at 9:38 PM, Steve Muckle <steve.muckle@xxxxxxxxxx> wrote:
> On Tue, Apr 12, 2016 at 04:29:06PM +0200, Rafael J. Wysocki wrote:
>> On Mon, Apr 11, 2016 at 11:20 PM, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>> > On Mon, Apr 11, 2016 at 9:28 PM, Steve Muckle <steve.muckle@xxxxxxxxxx> wrote:
>> >> Hi Rafael,
>> >>
>> >> On 04/01/2016 02:20 AM, Peter Zijlstra wrote:
>> >>>> > My thinking was in CFS we get rid of the (cpu == smp_processor_id())
>> >>>> > condition for calling the cpufreq hook.
>> >>>> >
>> >>>> > The sched governor can then calculate utilization and frequency required
>> >>>> > for cpu. If (cpu == smp_processor_id()), the update is processed
>> >>>> > normally. If (cpu != smp_processor_id()) and the new frequency is higher
>> >>>> > than cpu's Fcur, the sched gov IPIs cpu to continue running the update
>> >>>> > operation. Otherwise, the update is dropped.
>> >>>> >
>> >>>> > Does that sound plausible?
>> >>>
>> >>> Can be done I suppose..
>> >>
>> >> Currently we drop schedutil updates for a target CPU which do not occur
>> >> on that CPU.
>> >>
>> >> Is this solely due to platforms which must run the cpufreq driver on the
>> >> target CPU?
>> >
>> > The current code assumes that the CPU running the update will always
>> > be the one that gets updated. Anything else would require extra
>> > synchronization.
>>
>> This is rather fundamental.
>>
>> For example, if you look at cpufreq_update_util(), it does this:
>>
>> data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
>>
>> meaning that it will run the current CPU's utilization update
>> callback. Of course, that won't work cross-CPU, because in principle
>> different CPUs may use different governors and therefore different
>> util update callbacks.
>>
>> If you want to do remote updates, I guess that will require an
>> irq_work to run the update on the target CPU, but then you'll probably
>> want to neglect the rate limit on it as well, so it looks like a
>> "need_update" flag in struct update_util_data will be useful for that.
>>
>> I think I can prototype something along these lines, but can you
>> please tell me more about the case you have in mind?
>
> I'm concerned generally with the latency to react to changes in
> required capacity due to remote wakeups, which are quite common on SMP
> platforms with shared cache. Unless the hook is called it could take
> up to a tick to react AFAICS if the target CPU is running some other
> task that does not get preempted by the wakeup.

So the scenario seems to be that CPU A is running task X and CPU B
wakes up task Y on it remotely, but that task has to wait for CPU A to
get to it, so you want to increase the frequency of CPU A at the
wakeup time so as to reduce the time the woken up task has to wait.

In that case task X would not be giving the CPU away (i.e. no
invocations of schedule()) for the whole tick, so it would be
CPU/memory bound. I would then expect CPU A to be running at full
capacity already, unless this is the first tick period in which task X
behaves this way, which looks like a corner case to me.

Moreover, sending an IPI to CPU A in that case looks like the right
thing to do to me anyway.
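
To make the policy discussed earlier in the thread concrete (process
the update locally, IPI the target if the new frequency is higher than
its current one, otherwise drop it), here is a rough userspace sketch.
It is illustration only: classify_update() and enum update_action are
made-up names, not existing kernel code, and frequencies are plain
integers in arbitrary units.

```c
/* Hypothetical outcome of a utilization update request. */
enum update_action {
	UPDATE_LOCAL,		/* target is the current CPU: process normally */
	UPDATE_SEND_IPI,	/* remote target needs a higher frequency */
	UPDATE_DROP,		/* remote target, no frequency increase needed */
};

/*
 * Decide what to do with an update for target_cpu when the code is
 * running on this_cpu.  cur_freq is the target's current frequency and
 * next_freq is the frequency computed from the new utilization.
 */
static enum update_action classify_update(int this_cpu, int target_cpu,
					   unsigned int cur_freq,
					   unsigned int next_freq)
{
	if (target_cpu == this_cpu)
		return UPDATE_LOCAL;

	/* Remote update: only worth an IPI if it raises the frequency. */
	if (next_freq > cur_freq)
		return UPDATE_SEND_IPI;

	return UPDATE_DROP;
}
```

With CPU B as this_cpu and CPU A as target_cpu, this returns
UPDATE_SEND_IPI only when CPU A is running below the requested
frequency. If CPU A is already at full capacity, the remote update is
dropped.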

Thanks,
Rafael