Re: [RFC] cpuidle : Add debugfs support for cpuidle core

From: Abhishek
Date: Fri Dec 20 2019 - 07:51:43 EST


Hi Rafael,


On 12/19/2019 02:34 PM, Rafael J. Wysocki wrote:
> On Wed, Dec 18, 2019 at 3:26 PM Oliver O'Halloran <oohall@xxxxxxxxx> wrote:
>> On Wed, Dec 18, 2019 at 3:51 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>>> On Tue, Dec 17, 2019 at 3:42 PM Abhishek Goel
>>> <huntbag@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Up until now, we did not have a way to tune cpuidle attributes like
>>>> residency in the kernel. This patch adds debugfs support to the cpuidle
>>>> core, thereby providing a way to tune cpuidle attributes like residency
>>>> in the kernel at runtime.
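
[Side note for anyone following along: the per-state target residency that the patch wants to make writable is already *readable* via sysfs. A small helper to dump it might look like the sketch below — this is illustrative only, assuming the standard cpuidle sysfs layout; the function name is mine, not from the patch.]

```python
from pathlib import Path

def read_idle_states(base="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state name, target residency in microseconds) pairs
    read from the standard cpuidle sysfs layout: one stateN/
    directory per idle state, each with 'name' and 'residency'
    files. Illustrative sketch, not part of the proposed patch."""
    states = []
    for d in sorted(Path(base).glob("state[0-9]*")):
        name = (d / "name").read_text().strip()
        residency = int((d / "residency").read_text())
        states.append((name, residency))
    return states
```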
>>> This is not a good idea in my view, for a couple of reasons.
>>>
>>> First off, if the target residency of an idle state is changed, it
>>> effectively becomes a different one and all of the statistics
>>> regarding it become outdated at that point. Synchronizing that would
>>> be a pain.
>>>
>>> Next, governors may get confused if idle state parameters are changed
>>> on the fly. In particular, the statistics collected by the teo
>>> governor depend on the target residencies of idle states, so if one of
>>> them changes, the governor needs to be reloaded.
>>>
>>> Next, idle states are expected to be ordered by the target residency
>>> (and by the exit latency), so their parameters cannot be allowed to
>>> change freely anyway.
>>>
>>> Finally, the idle state parameters are expected to reflect the
>>> properties of the hardware, which wouldn't hold any more if they were
>>> allowed to change at any time.
>> Certainly does sound like a headache.
>>
>>>> For example: tuning residency at runtime can be used to quantify the
>>>> governor's decision making, as the governor uses residency as one of
>>>> the parameters when deciding which state to enter while idling.
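
[To make the role of residency concrete: conceptually, the governor compares its predicted idle duration against each state's target residency, subject to the PM QoS latency bound. The toy model below captures that shape only — it is my sketch, not the actual menu/teo code, and the field names are made up.]

```python
def pick_state(states, predicted_idle_us, latency_req_us):
    """Toy model of residency-based state selection: choose the
    deepest state whose target residency fits the predicted idle
    duration and whose exit latency fits the PM QoS bound.
    'states' is ordered shallow -> deep. Not real kernel code."""
    best = 0
    for i, s in enumerate(states):
        if (s["residency"] <= predicted_idle_us
                and s["latency"] <= latency_req_us):
            best = i
    return best
```

Under this model, inflating a state's residency (as the firmware workaround below does) simply makes the first condition fail, so the deep state is never chosen.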
>>> IMO it would be better to introduce a testing cpuidle driver with an
>>> artificial set of idle states (or even such that the set of idle
>>> states to be used by it can be defined by the user e.g. via module
>>> parameters) for this purpose.
>> The motivation for this patch isn't really a desire to test / tune the
>> governor. It's intended to allow working around a performance problem
>> caused by using high-latency idle states on some interrupt-heavy GPU
>> workloads. The interrupts occur around ~30ms apart, which is long enough
>> for the governor to put the CPU into the deeper states, and over the
>> course of a long job the additional wakeup latency adds up. The initial
>> fix someone came up with was cooking the residency values so the
>> high-latency states had a residency of +50ms to prevent the governor
>> from using them. However, that fix is supposed to go into a bit of
>> firmware I maintain, and I'm not terribly happy with the idea. I'm
>> fairly sure that ~30ms value is workload-dependent, and personally I
>> don't think firmware should be making up numbers to trick specific
>> kernel versions into doing specific things.
>>
>> My impression is the right solution is to have the GPU driver set a PM
>> QoS constraint on the CPUs receiving interrupts while a job is
>> on-going.
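
[For anyone who wants to try the PM QoS approach before a driver-side fix lands: the kernel exposes a system-wide (not per-CPU) CPU latency constraint to userspace via /dev/cpu_dma_latency — write a 32-bit bound in microseconds and it holds for as long as the file stays open. A hedged sketch; the in-kernel, per-device API a driver would use is separate:]

```python
import struct

def latency_request_bytes(max_latency_us):
    """Encode a CPU wakeup-latency bound as the native-endian 32-bit
    signed integer that /dev/cpu_dma_latency expects for binary
    writes (microseconds)."""
    return struct.pack("=i", max_latency_us)

def hold_cpu_latency(max_latency_us, device="/dev/cpu_dma_latency"):
    """Open the userspace PM QoS interface and write the bound.
    The constraint is in effect only while the returned file
    object remains open; closing it drops the request."""
    f = open(device, "wb", buffering=0)
    f.write(latency_request_bytes(max_latency_us))
    return f
```

Holding, say, a 20us bound while the GPU job runs would keep the governor out of the deep states without cooking the firmware residency values — at the cost of applying to all CPUs, which is why a per-CPU driver-side constraint is the better long-term shape.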
> Yes, that would address the GPU problem.
>
>> However, interrupt latency sensitivity isn't something that's unique to
>> GPUs, so I'm wondering if it makes sense to have the governor factor in
>> interrupt traffic when deciding what state to use. Is that something
>> that's been tried before?
> Yes, that is in the works.
>
> The existing governors should take interrupts into account too, in the
> form of the expected idle duration corrections, but that may not be
> particularly precise. If the governor currently in use (I guess menu)
> doesn't do that, you may try an alternative one (e.g. teo).

For this particular case, I tried TEO but it did not solve the issue.
> That said, work is in progress on taking the actual interrupt
> frequency into account in idle duration prediction.
>
> Thanks!

Could you please point me to the patch that takes device interrupt
frequency into account for cpuidle?

Thanks.