Re: [PATCH v2 0/5] PM QoS: Add CPU affinity latency QoS support and resctrl integration

From: Zhongqiu Han

Date: Fri Oct 24 2025 - 04:26:24 EST


On 10/23/2025 9:09 PM, Lukasz Luba wrote:
Hi Zhongqiu,

My apologies for being a bit late with my comments...

On 7/21/25 13:40, Zhongqiu Han wrote:
Hi all,

This patch series introduces support for CPU affinity-based latency
constraints in the PM QoS framework. The motivation is to allow
finer-grained power management by enabling latency QoS requests to target
specific CPUs, rather than applying system-wide constraints.

The current PM QoS framework supports global and per-device CPU latency
constraints. However, in many real-world scenarios, such as IRQ affinity
or CPU-bound kernel threads, only a subset of CPUs are
performance-critical. Applying global constraints in such cases
unnecessarily prevents other CPUs from entering deeper C-states, leading
to increased power consumption.

This series addresses that limitation by introducing a new interface that
allows latency constraints to be applied to a CPU mask. This is
particularly useful on heterogeneous platforms (e.g., big.LITTLE) and
embedded systems where power efficiency is critical for example:

                         driver A       rt kthread B      module C
   CPU IDs (mask):         0-3              2-5              6-7
   target latency(us):     20               30               100
                           |                |                |
                           v                v                v
                           +---------------------------------+
                           |        PM  QoS  Framework       |
                           +---------------------------------+
                           |                |                |
                           v                v                v
   CPU IDs (mask):        0-3            2-3,4-5            6-7
   runtime latency(us):   20             20, 30             100

The current implementation includes only cpu_affinity_latency_qos_add()
and cpu_affinity_latency_qos_remove() interfaces. An update interface is
planned for future submission, along with PM QoS optimizations in the UFS
subsystem.

Patch1 introduces the core support for CPU affinity latency QoS in the PM
QoS framework.

Patch2 removes redundant KERN_ERR prefixes in WARN() calls in the global
CPU PM QoS interface. This change addresses issues in existing code and is
not related to the new interface introduced in this patch series.

Patch3 adds documentation for the new interface.

Patch4 fixes a minor documentation issue related to the return type of
cpu_latency_qos_request_active(). This change addresses issues in existing
doc and is not related to the new interface introduced in this patch
series.

Patch5 updates the resctrl pseudo-locking logic to use the new CPU
affinity latency QoS helpers, improving clarity and consistency. The only
functional and beneficial change is that the new interface actively wakes
up CPUs whose latency QoS values have changed, ensuring the latency limit
takes effect immediately.

Could you describe a bit more the big picture of this proposed design,
please?

Ideally with some graph of connected frameworks & drivers and how they
are going to work together.

Hi Lukasz,
Thank you very much for your review and discussion~

I can describe the big picture if needed; first, please allow me to
illustrate a simple scenario using pseudocode:

Suppose there is a USB driver. This driver uses the kernel's existing
cpu_latency_qos_* interfaces to boost its IRQ handling efficiency. Its
IRQ affinity is set to CPU0 and CPU1 according to the DTS config, and
the affinity of its threaded IRQ (bottom half) is also set to CPU0 and
CPU1.


=================================================================
Using the kernel existing cpu_latency_qos_* interfaces:
=================================================================
static int dwc3_sample_probe(struct platform_device *pdev)
{
	cpu_latency_qos_add_request(&foo->pm_qos_req, DEFAULT_VALUE);
	xxxx;
	ret = devm_request_threaded_irq(xxx, xxx, foo_dwc3_pwr_irq, ....);
	xxxx;
}

static irqreturn_t foo_dwc3_pwr_irq(int irq, void *dev)
{
	xxxx;
	/* block all CPUs from entering idle while the IRQ is handled */
	cpu_latency_qos_update_request(&foo->pm_qos_req, 0);

	/* .... process interrupt .... */

	/* relax the constraint again once the work is done */
	cpu_latency_qos_update_request(&foo->pm_qos_req, DEFAULT_VALUE);

	return IRQ_HANDLED;
}


The number of IRQ executions on each CPU:
==================================================================
IRQ  HWIRQ  affinity     CPU0    CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7
320  0xb0   0x3       9782468  415472     0     0     0     0     0     0
==================================================================

==================================================================
Process: irq/320-dwc3, [affinity: 0x3] cpu:1 pid:5250 ppid:2
==================================================================


From the code, we can see that the USB module, using the kernel's
existing cpu_latency_qos_* interfaces, sets the CPU latency to 0, which
prevents all CPUs from entering idle states, even C1. During operation,
the USB IRQ was triggered 9,782,468 times on CPU0, and each time it
ran, all CPUs were blocked from entering deeper C-states. However, only
CPU0 and CPU1 are actually involved in handling the IRQ and its
threaded bottom half, so this causes unnecessary power consumption on
the other CPUs.
(Please note, due to the simplicity of the pseudocode, I did not show
how the IRQ bottom-half thread is synchronized to restrict CPU idle
states via PM QoS. In reality, a CPU latency limit can clearly also be
applied to the bottom-half thread.)


If we instead use this patch series' cpu_affinity_latency_qos_* API,
for example:

=================================================================
Using current patch series cpu_affinity_latency_qos_* interfaces:
=================================================================
static int dwc3_sample_probe(struct platform_device *pdev)
{
	/* constrain only the CPUs in 'mask' (here CPU0 and CPU1) */
	cpu_affinity_latency_qos_add(&foo->pm_qos_req, DEFAULT_VALUE, mask);
	xxxx;
	ret = devm_request_threaded_irq(xxx, xxx, foo_dwc3_pwr_irq, ....);
	xxxx;
}

then the CPU latency PM QoS constraint applies only to CPU0 and CPU1,
saving power on the other CPUs.


E.g.:
1. what are the other components in the kernel which would use this
feature?

1. Drivers such as audio, USB, and UFS, which currently rely on the
kernel's global CPU latency PM QoS interface but only require latency
constraints on a subset of CPUs, can adopt this new interface to
achieve improved power efficiency.

2. I'm also considering supporting this feature in userspace.
Once implemented, userspace threads that want to constrain CPU latency
and are already bound to specific cores (for example, mobile gaming
threads pinned to big cores) will be able to use the API to help save
power.

2. is there also a user-space interface planned for it so a HAL in
the middle-ware would configure these "short-wake-up-CPU"?

Yes, I am planning to add userspace support in patch v3.

3. Is it possible to view/debug from the user-space which component
requested this setting for some subsets of cpus?

I'm uncertain whether we should provide the ability to inspect which
components are applying constraints on CPU latency. However, what I do
want to ensure is that, similar to the existing /dev/cpu_dma_latency
interface in the current kernel, I can offer per-CPU latency value
setting and querying.




Regards,
Lukasz




--
Thx and BRs,
Zhongqiu Han