Re: [v2 PATCH 1/1] PM: QoS: Introduce boot parameter pm_qos_resume_latency_us

From: Aaron Tomlin

Date: Thu Mar 05 2026 - 21:35:52 EST


On Wed, Mar 04, 2026 at 05:05:13PM +0800, Zhongqiu Han wrote:
> Hello Aaron,
>
> Therefore, once a PM QoS constraint is set via boot parameters, it
> cannot be relaxed afterward by any means, except by applying a stricter
> constraint, right?

Hi Zhongqiu,

Thank you for your feedback.

This is a perfectly logical assumption, but it is actually incorrect.
The boot parameter serves only to establish the initial latency limit when
dev_pm_qos_expose_latency_limit() is invoked during CPU registration.

Once the system has booted and userspace is fully initialised, an
administrator retains full control via the standard sysfs interface
(i.e., /sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us).
Writing a numerical value, or the special string "n/a", to this sysfs node
will successfully relax or completely remove the constraint. The boot
parameter acts as a crucial "birth constraint" to protect early boot
phases, rather than a permanent, immutable ceiling.

> > - The parsing logic enforces a "First Match Wins" policy: if a
> > CPU falls into multiple specified ranges, the latency value
> > from the first matching entry is used.
>
> May I know would it be more reasonable to apply the minimum constraint?

Enforcing a "First Match Wins" parsing policy keeps the early-boot logic
deliberately simple, predictable, and deterministic. It avoids the
computational overhead of iteratively resolving and comparing overlapping
cpumask conflicts during a sensitive phase of the kernel boot process.
Given that this parameter is intended for deliberate, static configuration,
I favoured parsing simplicity. That said, if there is a strong consensus
that applying the minimum constraint would provide a more intuitive user
experience, I am certainly open to revising this logic in v3.

> > +#include <linux/list.h>
> > +
> > +#include <asm/setup.h>
>
> Including <asm/setup.h> in generic PM/QoS code seems not appropriate,
> and pulling in an arch specific header creates unnecessary coupling and
> potential portability issues.

I completely agree with your assessment. Introducing an
architecture-specific header into generic power management code is poor
practice. In the next iteration I will refactor the parameter handling to
remove the reliance on COMMAND_LINE_SIZE entirely, most likely by retaining
a pointer to the original command-line string or by dynamically duplicating
it, which cleanly eliminates the need for <asm/setup.h>.

> > +struct pm_qos_boot_entry {
> > + struct list_head node;
>
> How about to use array instead of list to optimize lookup and cache
> hitting? but array may use more memory.

While an array would theoretically improve spatial locality and cache hit
rates, we anticipate the number of boot-time entries provided by users to
be exceptionally small (typically just one or two discrete ranges).

For such a small dataset, the overhead of traversing a linked list during
CPU registration is negligible. A list also lets the code accommodate any
number of user-defined ranges without imposing an arbitrary hard limit or
engineering dynamic array reallocation. Consequently, I believe the linked
list remains the most pragmatic and robust choice for this scenario.


> > +__setup("pm_qos_resume_latency_us=", pm_qos_resume_latency_us_setup);
> > +
> > +/* init_pm_qos_resume_latency_us_setup - Parse the pm_qos_resume_latency_us boot parameter.
>
> Nit: style issue, /**

Acknowledged. I will switch the comment to the kernel-doc /** style in v3.

> > + if (entry->latency < 0) {
> > + pr_warn("pm_qos: Latency requirement cannot be negative: %d\n",
> > + entry->latency);
>
> Nit: It would be cleaner to use pr_fmt() for the log prefix rather than
> embedding "pm_qos:" in every message such as:
>
> #define pr_fmt(fmt) "pm_qos_boot: " fmt

Thank you for pointing this out. That is an elegant and standard approach.
I shall incorporate the pr_fmt() macro definition to ensure the logging
remains tidy and consistent.

> > + list_add_tail(&entry->node, &pm_qos_boot_list);
>
> I saw there is no protect for pm_qos_boot_list, my understanding is that
> during early boot, pm_qos_boot_list is fully initialized before any
> other register_cpu() calls. For CPU hotplug, there only reads the list
> and it’s never modified afterward, so there shouldn’t be a race.

Your understanding is perfectly correct. The pm_qos_boot_list is populated
entirely within an early_initcall(). At this specific stage of the boot
sequence, execution is strictly single-threaded on the boot CPU, occurring
well before SMP initialisation or any secondary CPU bring-up (and
consequently, before register_cpu() is invoked for non-boot CPUs).

Once this early initialisation phase concludes, the list becomes strictly
read-only for the remainder of the system's uptime. Therefore, lockless
traversal during CPU registration and subsequent CPU hotplug events is
entirely safe. Introducing a lock or RCU protection here would merely add
unnecessary overhead to a static data structure.


> > + free_cpumask_var(covered);
> > + return 0;
>
> The function comment says "return a negative error code on failure",
> but in the cleanup context the function unconditionally returns 0.
> should it be:
>
> set "ret = -ENOMEM;" before goto cleanup and then just return ret?

Thank you for catching this oversight; you are absolutely right. The
cleanup path erroneously returns 0 despite the function's documentation
explicitly stating that a negative error code should be returned upon
failure.

Your suggested approach is exactly how this should be handled. This logic
will be rectified in the v3 patch.



Kind regards,
--
Aaron Tomlin

Attachment: signature.asc
Description: PGP signature