Re: [PATCH] cpuidle: psd: add power sleep demotion prevention for fast I/O devices

From: Rafael J. Wysocki
Date: Wed Mar 26 2025 - 13:46:28 EST


On Wed, Mar 26, 2025 at 5:26 PM Christian Loehle
<christian.loehle@xxxxxxx> wrote:
>
> On 3/26/25 15:04, King, Colin wrote:
> > Hi,
> >
> >> -----Original Message-----
> >> From: Bart Van Assche <bvanassche@xxxxxxx>
> >> Sent: 23 March 2025 12:36
> >> To: King, Colin <colin.king@xxxxxxxxx>; Christian Loehle
> >> <christian.loehle@xxxxxxx>; Jens Axboe <axboe@xxxxxxxxx>; Rafael J.
> >> Wysocki <rafael@xxxxxxxxxx>; Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>;
> >> linux-block@xxxxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx
> >> Subject: Re: [PATCH] cpuidle: psd: add power sleep demotion prevention for
> >> fast I/O devices
> >>
> >> On 3/17/25 3:03 AM, King, Colin wrote:
> >>> This code is optional, one can enable it or disable it via the config
> >>> option. Also, even when it is built-in one can disable it by writing 0 to
> >>> the sysfs file /sys/devices/system/cpu/cpuidle/psd_cpu_lat_timeout_ms
> >>
> >> I'm not sure we need even more configuration knobs in sysfs.
> >
> > It's useful for enabling / disabling the functionality, as well as some form of tuning for slower I/O devices, so I think it is justifiable.
> >
> >> How are users
> >> expected to find this configuration option? How should they decide whether
> >> to enable or to disable it?
> >
> > I can send a V2 with some documentation if that's required.
> >
> >>
> >> Please take a look at this proposal and let me know whether this would solve
> >> the issue that you are looking into: "[LSF/MM/BPF Topic] Energy-Efficient I/O"
> >> (https://lore.kernel.org/linux-block/ad1018b6-7c0b-4d70-b845-c869287d3cf3@xxxxxxx/).
> >> The only disadvantage of this approach
> >> compared to the cpuidle patch is that it requires RPM (runtime power
> >> management) to be enabled. Maybe I should look into modifying the
> >> approach such that it does not rely on RPM.
> >
> > I've had a look, the scope of my patch is a bit wider. If my patch gets accepted I'm
> > going to also look at putting the psd call into other devices (such as network devices) to
> > also stop deep states while these devices are busy. Since the code is very lightweight I
> > was hoping this was going to be relatively easy and simple to use in various devices in the future.
>
> IMO this needs to be a lot more fine-grained then, both in terms of which devices or even
> IO is affected (Surely some IO is fine with at least *some* latency) but also how aggressive
> we are in blocking.
> Just looking at some common latency/residency of idle states out there I don't think
> it's reasonable to force polling for a 3-10ms (rounding up with the jiffy) period.
> Playing devil's advocate if the system is under some thermal/power pressure we might
> actually reduce throughput by burning so much power on this.
> This seems like the stuff that is easily convincing because it improves throughput and
> then taking care of power afterwards is really hard. :/

I agree and recall the iowait thing that you've recently eliminated
from the menu governor. Its purpose was ostensibly very similar and
it had similar issues.

Besides, any piece of kernel code today can add a CPU latency QoS
request, either per CPU or globally, and manage it as desired. The
hard part is to know when to use it and what limit to set through it.
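
For illustration only (this is not part of the patch under discussion):
in the kernel, the global variant is cpu_latency_qos_add_request() and
friends from <linux/pm_qos.h>. The same kind of global request can be
made from user space by holding /dev/cpu_dma_latency open, which is what
the sketch below does, because it builds standalone. The function names
are made up for the example, and the limit value passed in is exactly the
part that is hard to choose.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: register a CPU latency QoS request from user
 * space by writing a 32-bit limit (in microseconds) to the given device
 * node, normally /dev/cpu_dma_latency.  The request stays in effect
 * only while the returned descriptor is open.
 * Returns the descriptor, or -1 on failure.
 */
int cpu_latency_limit_hold(const char *node, int32_t limit_us)
{
	int fd = open(node, O_WRONLY);

	if (fd < 0)
		return -1;

	if (write(fd, &limit_us, sizeof(limit_us)) != (ssize_t)sizeof(limit_us)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: drop the request by closing the node. */
void cpu_latency_limit_release(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

A caller would do something like
fd = cpu_latency_limit_hold("/dev/cpu_dma_latency", 20) before a
latency-sensitive phase and cpu_latency_limit_release(fd) afterwards.
The value 20 here is an arbitrary placeholder, not a recommendation.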