Re: [RFC] cpuidle : Add debugfs support for cpuidle core

From: Oliver O'Halloran
Date: Wed Dec 18 2019 - 09:26:26 EST


On Wed, Dec 18, 2019 at 3:51 AM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Tue, Dec 17, 2019 at 3:42 PM Abhishek Goel
> <huntbag@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Up until now, we did not have a way to tune cpuidle attribute like
> > residency in kernel. This patch adds support for debugfs in cpuidle core.
> > Thereby providing support for tuning cpuidle attributes like residency in
> > kernel at runtime.
>
> This is not a good idea in my view, for a couple of reasons.
>
> First off, if the target residency of an idle state is changed, it
> effectively becomes a different one and all of the statistics
> regarding it become outdated at that point. Synchronizing that would
> be a pain.
>
> Next, governors may get confused if idle state parameters are changed
> on the fly. In particular, the statistics collected by the teo
> governor depend on the target residencies of idle states, so if one of
> them changes, the governor needs to be reloaded.
>
> Next, idle states are expected to be ordered by the target residency
> (and by the exit latency), so their parameters cannot be allowed to
> change freely anyway.
>
> Finally, the idle state parameters are expected to reflect the
> properties of the hardware, which wouldn't hold any more if they were
> allowed to change at any time.

Certainly does sound like a headache.

> > For example: Tuning residency at runtime can be used to quantify governors
> > decision making as governor uses residency as one of the parameter to
> > take decision about the state that needs to be entered while idling.
>
> IMO it would be better to introduce a testing cpuidle driver with an
> artificial set of idle states (or even such that the set of idle
> states to be used by it can be defined by the user e.g. via module
> parameters) for this purpose.

The motivation for this patch isn't really a desire to test / tune the
governor. It's intended to allow working around a performance problem
caused by using high-latency idle states on some interrupt heavy GPU
workload. The interrupts occur around ~30ms apart which is long enough
for the governor to put the CPU into the deeper states and over the
course of long job the additional wakeup latency adds up. The initial
fix someone came up with was cooking the residency values so the
high-latency states had a residency of +50ms to prevent the govenor
from using them. However, that fix is supposed to go into a bit of
firmware I maintain and I'm not terribly happy with the idea. I'm
fairly sure that ~30ms value is workload dependent and personally I
don't think firmware should be making up numbers to trick specific
kernel versions into doing specific things.

My impression is the right solution is to have the GPU driver set a PM
QoS constraint on the CPUs receiving interrupts while a job is
on-going. However, interrupt latency sensitivity isn't something
that's unique to GPUs so I'm wondering it it makes sense to have the
governor factor in interrupt traffic when deciding what state to use.
Is that something that's been tried before?

Oliver