Re: [PATCH v4 4/7] x86/sev: Add support to perform RMP optimizations asynchronously

From: Ackerley Tng

Date: Fri May 01 2026 - 14:58:09 EST


Ashish Kalra <Ashish.Kalra@xxxxxxx> writes:

>
> [...snip...]
>
> +/*
> + * 'val' is a system physical address.
> + */
> +static void rmpopt_smp(void *val)
> +{
> + u64 rax = ALIGN_DOWN((u64)val, SZ_1G);
> + u64 rcx = RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS;
> +
> + __rmpopt(rax, rcx);
> +}
> +
> +static void rmpopt(u64 pa)
> +{
> + u64 rax = ALIGN_DOWN(pa, SZ_1G);
> + u64 rcx = RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS;
> +
> + __rmpopt(rax, rcx);
> +}
> +

Could rmpopt_smp() call rmpopt() to remove duplicate code?

> +/*
> + * RMPOPT optimizations skip RMP checks at 1GB granularity if this
> + * range of memory does not contain any SNP guest memory.
> + */
> +static void rmpopt_work_handler(struct work_struct *work)
> +{
> + bool current_cpu_cleared = false;
> + phys_addr_t pa;
> +
> + pr_info("Attempt RMP optimizations on physical address range @1GB alignment [0x%016llx - 0x%016llx]\n",
> + rmpopt_pa_start, rmpopt_pa_end);
> +
> + /*
> + * RMPOPT scans the RMP table, stores the result of the scan in the
> + * reserved processor memory. The RMP scan is the most expensive
> + * part. If a second RMPOPT occurs, it can skip the expensive scan
> + * if they can see a cached result in the reserved processor memory.
> + *
> + * Do RMPOPT on one CPU alone. Then, follow that up with RMPOPT
> + * on every other primary thread. This potentially allows the
> + * followers to use the "cached" scan results to avoid repeating
> + * full scans.

Out of curiosity, how does this caching work? Is it possible to do it
once and then synchronize the cache to the other CPUs?

> + */
> +
> + if (cpumask_test_cpu(smp_processor_id(), &rmpopt_cpumask)) {
> + cpumask_clear_cpu(smp_processor_id(), &rmpopt_cpumask);
> + current_cpu_cleared = true;
> + }
> +
> + /* current CPU */
> + for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G)
> + rmpopt(pa);
> +
> + for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
> + on_each_cpu_mask(&rmpopt_cpumask, rmpopt_smp,
> + (void *)pa, true);
> +
> + /* Give a chance for other threads to run */
> + cond_resched();
> +
> + }
> +
> + if (current_cpu_cleared)
> + cpumask_set_cpu(smp_processor_id(), &rmpopt_cpumask);

Sashiko [1] pointed this out: after cond_resched(), this code might be
on a different cpu so smp_processor_id() would return a different cpu,
that would mess up the global cpumask.

Perhaps it's better to store the id on a stack? Or actually, what if we
give on_each_cpu_mask a copy of rmpopt_cpumask with the current cpu
unset?

[1] https://sashiko.dev/#/patchset/cover.1775874970.git.ashish.kalra%40amd.com

> +}
> +
> void snp_setup_rmpopt(void)
> {
> u64 rmpopt_base;
> @@ -568,9 +656,20 @@ void snp_setup_rmpopt(void)
> if (!cpu_feature_enabled(X86_FEATURE_RMPOPT))
> return;
>
> + /*
> + * Create an RMPOPT-specific workqueue to avoid scheduling
> + * RMPOPT workitem on the global system workqueue.
> + */
> + rmpopt_wq = alloc_workqueue("rmpopt_wq", WQ_UNBOUND, 1);
> + if (!rmpopt_wq) {
> + setup_clear_cpu_cap(X86_FEATURE_RMPOPT);
> + return;
> + }
> +
> /*
> * RMPOPT_BASE MSR is per-core, so only one thread per core needs to
> - * setup RMPOPT_BASE MSR.
> + * setup RMPOPT_BASE MSR. Additionally only one thread per core needs
> + * to issue the RMPOPT instruction.
> */
>
> for_each_online_cpu(cpu) {
> @@ -590,6 +689,20 @@ void snp_setup_rmpopt(void)
> * up to 2 TB of system RAM on all CPUs.
> */
> wrmsrq_on_cpus(&rmpopt_cpumask, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
> +
> + INIT_DELAYED_WORK(&rmpopt_delayed_work, rmpopt_work_handler);
> +
> + rmpopt_pa_end = ALIGN(PFN_PHYS(max_pfn), SZ_1G);
> +
> + /* Limit memory scanning to the first 2 TB of RAM */

I think this is better phrased as "limit memory scanning to 2TB",

> + if ((rmpopt_pa_end - rmpopt_pa_start) > SZ_2T)
> + rmpopt_pa_end = rmpopt_pa_start + SZ_2T;

and then this could be

rmpopt_pa_end = min(rmpopt_pa_end, rmpopt_pa_start + SZ_2T);

> +
> + /*
> + * Once all per-CPU RMPOPT tables have been configured, enable RMPOPT
> + * optimizations on all physical memory.
> + */
> + queue_delayed_work(rmpopt_wq, &rmpopt_delayed_work, 0);
> }
> EXPORT_SYMBOL_FOR_MODULES(snp_setup_rmpopt, "ccp");
>

Reviewed-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>