Re: [PATCH v2 3/7] x86/sev: add support for RMPOPT instruction

From: Dave Hansen

Date: Wed Mar 04 2026 - 10:30:36 EST

On 3/4/26 07:01, Sean Christopherson wrote:
> I don't see any performance data in either posted version. Bluntly, this series
> isn't going anywhere without data to guide us. E.g. comments like this from v1
>
> : And there is a cost associated with re-enabling the optimizations for all
> : system RAM (even though it runs as a background kernel thread executing RMPOPT
> : on different 1GB regions in parallel and with inline cond_resched()'s),
> : we don't want to run this periodically.
>
> suggest there is meaningful cost associated with the scan.

Well, the RMP is 0.4% of the size of system memory, and I assume that you
need to scan the whole table. There are surely shortcuts for 2M pages,
but with 4k, that's ~8.5GB of RMP table for 2TB of memory. That's an
awful lot of memory traffic for each CPU.
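
For reference, the back-of-the-envelope math behind that number can be
sketched as below. It assumes 16 bytes of RMP entry per 4k page, which
is what yields roughly 0.4%. The constant and the helper name are
illustrative assumptions, not taken from the series:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 16-byte RMP entry per 4 KiB page (16/4096 ~= 0.39%). */
#define RMP_ENTRY_BYTES	16ULL
#define PAGE_BYTES	4096ULL

/*
 * Hypothetical helper: bytes of RMP entries needed to cover
 * mem_bytes of RAM when every page is tracked at 4k granularity.
 */
static uint64_t rmp_table_bytes(uint64_t mem_bytes)
{
	assert(mem_bytes % PAGE_BYTES == 0);
	return (mem_bytes / PAGE_BYTES) * RMP_ENTRY_BYTES;
}
```

With 2TiB of memory this gives 2^29 pages times 16 bytes, i.e. 8GiB,
or about 8.6GB in decimal units.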

It'll be annoying to keep a refcount per 1GB of paddr space.

One other way to do it would be to loosely mirror the RMPOPT bitmap and
keep our own bitmap of 1GB regions that _need_ RMPOPT run on them. Any
private=>shared conversion sets a bit in the bitmap and schedules some
work out in the future.
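
A rough userspace sketch of that idea follows, just to show the shape
of it. Every name here is hypothetical, the 2TB sizing is arbitrary,
and the "schedule work" step is reduced to a flag. A real kernel
version would presumably use the atomic bitmap helpers and delayed
work instead:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GB_SHIFT	30
#define NR_REGIONS	2048	/* hypothetical: 2TB of paddr space */
#define WORD_BITS	64

/* One bit per 1GB region that needs RMPOPT run on it again. */
static uint64_t rmpopt_dirty[NR_REGIONS / WORD_BITS];
static bool rmpopt_work_pending;

/* Hypothetical hook: called on any private=>shared conversion. */
static void rmpopt_mark_dirty(uint64_t paddr)
{
	uint64_t region = paddr >> GB_SHIFT;

	assert(region < NR_REGIONS);
	rmpopt_dirty[region / WORD_BITS] |= 1ULL << (region % WORD_BITS);
	rmpopt_work_pending = true;	/* stand-in for scheduling deferred work */
}

/*
 * Hypothetical deferred worker: clear each dirty bit and hand the
 * region's base paddr to run_rmpopt (may be NULL). Returns the number
 * of regions visited.
 */
static unsigned int rmpopt_flush(void (*run_rmpopt)(uint64_t base_paddr))
{
	unsigned int visited = 0;

	for (uint64_t region = 0; region < NR_REGIONS; region++) {
		uint64_t mask = 1ULL << (region % WORD_BITS);

		if (!(rmpopt_dirty[region / WORD_BITS] & mask))
			continue;
		rmpopt_dirty[region / WORD_BITS] &= ~mask;
		if (run_rmpopt)
			run_rmpopt(region << GB_SHIFT);
		visited++;
	}
	rmpopt_work_pending = false;
	return visited;
}
```

The point is that many conversions inside the same 1GB region collapse
into a single bit, so the deferred pass only touches regions that
actually changed instead of rescanning all of system RAM.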

It could also be less granular than that. Instead of any private=>shared
conversion, the RMPOPT scan could be triggered on VM destruction, which
is much more likely to result in RMPOPT doing something useful.

BTW, I assume that the RMPOPT disable machinery is driven from the
INVLPGB-like TLB invalidations that are a part of the SNP
shared=>private conversions. It's a darn shame that RMPOPT wasn't
broadcast in the same way. It would save the poor OS a lot of work. The
RMPOPT table is per-cpu of course, but I'm not sure what keeps *a* CPU
from broadcasting its success finding an SNP-free physical region to
other CPUs.

tl;dr: I agree with you. The cost of these scans is going to be
annoying, and it's going to need OS help to optimize it.