Re: [PATCH RFC v6 03/18] riscv: add support for srmcfg CSR from Ssqosid extension

From: Drew Fustini

Date: Mon Jun 01 2026 - 20:59:39 EST


On Mon, Jun 01, 2026 at 08:49:30PM +0000, sashiko-bot@xxxxxxxxxx wrote:
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> > +/*
> > + * Invalidate the cache on offline too. The sentinel persists across the
> > + * offline period, so a CPU brought back online finds the cache already
> > + * invalidated before it is schedulable. This closes the window where a
> > + * task scheduled before riscv_srmcfg_online() runs could match a stale
> > + * cache and skip the CSR write while the hardware CSR was reset across
> > + * hart stop/start.
> > + */
> > +static int riscv_srmcfg_offline(unsigned int cpu)
> > +{
> > + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
>
> [Severity: High]
> Will this U32_MAX sentinel be overwritten before the CPU actually halts?
>
> Because riscv_srmcfg_offline() executes at CPUHP_AP_ONLINE_DYN, the dying CPU
> continues tearing down and eventually context switches to the idle thread to
> physically offline.
>
> This context switch invokes __switch_to_srmcfg(), which writes the hardware
> CSR and overwrites the cpu_srmcfg cache with the idle task's value,
> permanently losing the U32_MAX sentinel.
>
> When the CPU comes back online, its hardware CSR is in an implementation
> defined reset state, but the software cache holds a valid value. Any early
> tasks sharing this stale value might skip the CSR write and execute with
> incorrect QoS tags until riscv_srmcfg_online() is reached.

This is not a problem. riscv_srmcfg_online() writes U32_MAX again on
bring-up, and it runs at CPUHP_AP_ONLINE_DYN before the CPU becomes
schedulable.

> > + return 0;
> > +}
> > +
> > +/*
> > + * CPU PM notifier: invalidate the cached srmcfg on resume from a deep
> > + * idle / suspend. Ssqosid v1.0 leaves CSR_SRMCFG state across low-power
> > + * transitions implementation-defined, and the boot CPU never goes
> > + * through the cpuhp online callback during system suspend, so without
> > + * this hook __switch_to_srmcfg() would skip the CSR write when the
> > + * outgoing task happens to share its srmcfg with the pre-suspend cache.
> > + */
> > +static int riscv_srmcfg_pm_notify(struct notifier_block *nb,
> > + unsigned long action, void *unused)
> > +{
> > + switch (action) {
> > + case CPU_PM_EXIT:
> > + case CPU_PM_ENTER_FAILED:
> > + __this_cpu_write(cpu_srmcfg, U32_MAX);
>
> [Severity: High]
> Does this notifier need to eagerly restore the hardware CSR for the currently
> executing thread?
>
> It sets cpu_srmcfg to U32_MAX to force a CSR write on the next context
> switch. However, upon waking from low-power states like suspend-to-RAM, the
> CPU does not immediately context switch.
>
> It unwinds the suspend stack, resumes executing the interrupted thread, and
> services pending interrupts.
>
> Because the hardware CSR is not explicitly restored here, all code executing
> between wakeup and the first context switch might run with the hardware's
> implementation defined reset value for CSR_SRMCFG instead of the task's
> assigned QoS tags.

This is not a problem. Before userspace runs on this CPU again,
__switch_to_srmcfg() writes the CSR from the next task's thread.srmcfg.

> > + /*
> > + * cpuhp_setup_state() invokes the startup callback locally on every
> > + * already-online CPU, so no separate seed loop is needed here.
> > + */
> > + err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> > + riscv_srmcfg_online, riscv_srmcfg_offline);
> > + if (err < 0) {
> > + pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
> > + err);
> > + return err;
>
> [Severity: Medium]
> Is it intentional to skip PM notifier registration if CPU hotplug setup fails?
>
> This early return skips cpu_pm_register_notifier() entirely.
>
> While cpuhp_setup_state() failures are rare, PM notifiers do not strictly
> depend on CPU hotplug registration. If this error path is taken, tasks waking
> from suspend could run with incorrect QoS tags because the PM notifier is never
> registered.

This is very unlikely. I will not be making any changes.

Drew