Re: [PATCH v2 3/7] x86/sev: add support for RMPOPT instruction
From: Kalra, Ashish
Date: Thu Mar 05 2026 - 14:23:31 EST
An update on performance data:
>
> RMPOPT after SNP guest shutdown:
> ...
> [ 298.746893] SEV-SNP: RMPOPT max. CPU cycles 248083620
> [ 298.746898] SEV-SNP: RMPOPT min. CPU cycles 60
> [ 298.746900] SEV-SNP: RMPOPT average cycles 127859
>
>
A single RMPOPT instruction should not take 248M cycles, so I looked at
my performance measurement code.
I was not disabling interrupts around the measurement, so the timed region was
probably being interrupted/preempted, which would explain the inflated maximum.
I am now measuring with interrupts disabled around this code:
static void rmpopt(void *val)
{
	bool optimized;
	u64 start, end;

	local_irq_disable();
	start = rdtsc_ordered();
	asm volatile(".byte 0xf2, 0x0f, 0x01, 0xfc"
		     : "=@ccc" (optimized)
		     : "a" ((u64)val & PUD_MASK), "c" ((u64)val & 0x1)
		     : "memory", "cc");
	end = rdtsc_ordered();
	local_irq_enable();

	total_cycles += (end - start);
	++iteration;
	if ((end - start) > largest_cycle_rmpopt) {
		pr_info("RMPOPT max cycle on cpu %d, addr 0x%llx, cycles %llu, prev largest %llu\n",
			smp_processor_id(), ((u64)val & PUD_MASK), end - start, largest_cycle_rmpopt);
		largest_cycle_rmpopt = end - start;
	}
...
...
However, the following is interesting: if I invoke rmpopt() using smp_call_on_cpu(), which issues
RMPOPT on each CPU serially, instead of using on_each_cpu_mask() as above, which executes the rmpopt()
function and the RMPOPT instruction in parallel on multiple CPUs (by sending IPIs in parallel),
I observe a significant improvement in "individual" RMPOPT instruction performance:
rmpopt() executing serially using smp_call_on_cpu():
[ 244.518677] SEV-SNP: RMPOPT instruction cycles 3300
[ 244.518716] SEV-SNP: RMPOPT instruction cycles 2840
[ 244.518758] SEV-SNP: RMPOPT instruction cycles 3260
[ 244.518800] SEV-SNP: RMPOPT instruction cycles 3640
[ 244.518838] SEV-SNP: RMPOPT instruction cycles 1980
[ 244.518878] SEV-SNP: RMPOPT instruction cycles 3420
[ 244.518919] SEV-SNP: RMPOPT instruction cycles 3620
[ 244.518958] SEV-SNP: RMPOPT instruction cycles 3120
[ 244.518997] SEV-SNP: RMPOPT instruction cycles 2160
[ 244.519038] SEV-SNP: RMPOPT instruction cycles 3040
[ 244.519078] SEV-SNP: RMPOPT instruction cycles 3700
[ 244.519119] SEV-SNP: RMPOPT instruction cycles 3960
[ 244.519158] SEV-SNP: RMPOPT instruction cycles 3420
[ 244.519211] SEV-SNP: RMPOPT instruction cycles 5080
[ 244.519254] SEV-SNP: RMPOPT instruction cycles 3000
[ 244.519295] SEV-SNP: RMPOPT instruction cycles 3420
[ 244.527150] SEV-SNP: RMPOPT max cycle on cpu 256, addr 0x40000000, cycles 34680, prev largest 22100
[ 244.529622] SEV-SNP: RMPOPT max cycle on cpu 320, addr 0x40000000, cycles 36800, prev largest 34680
[ 244.559314] SEV-SNP: RMPOPT max cycle on cpu 256, addr 0x80000000, cycles 39740, prev largest 36800
[ 244.561718] SEV-SNP: RMPOPT max cycle on cpu 320, addr 0x80000000, cycles 41840, prev largest 39740
[ 244.562837] SEV-SNP: RMPOPT max cycle on cpu 352, addr 0x80000000, cycles 42160, prev largest 41840
[ 244.886705] SEV-SNP: RMPOPT max cycle on cpu 384, addr 0x300000000, cycles 42300, prev largest 42160
[ 247.701377] SEV-SNP: RMPOPT max cycle on cpu 384, addr 0x1980000000, cycles 42400, prev largest 42300
[ 250.322355] SEV-SNP: RMPOPT max cycle on cpu 384, addr 0x2ec0000000, cycles 42420, prev largest 42400
[ 250.755457] SEV-SNP: RMPOPT max cycle on cpu 384, addr 0x3240000000, cycles 42540, prev largest 42420
[ 264.271293] SEV-SNP: RMPOPT max cycle on cpu 32, addr 0xa040000000, cycles 50400, prev largest 42540
[ 264.333739] SEV-SNP: RMPOPT max cycle on cpu 32, addr 0xa0c0000000, cycles 50940, prev largest 50400
[ 264.395521] SEV-SNP: RMPOPT max cycle on cpu 32, addr 0xa140000000, cycles 51240, prev largest 50940
[ 264.733133] SEV-SNP: RMPOPT max cycle on cpu 32, addr 0xa400000000, cycles 51480, prev largest 51240
[ 269.500891] SEV-SNP: RMPOPT max cycle on cpu 0, addr 0xcac0000000, cycles 66080, prev largest 51480
[ 273.507009] SEV-SNP: RMPOPT max cycle on cpu 320, addr 0xeb40000000, cycles 83680, prev largest 66080
[ 276.435091] SEV-SNP: RMPOPT largest cycles 83680
[ 276.435096] SEV-SNP: RMPOPT smallest cycles 60
[ 276.435097] SEV-SNP: RMPOPT average cycles 5658
[ 276.435098] SEV-SNP: RMPOPT cycles taken for physical address range 0x0000000000000000 - 0x0000010380000000 on all cpus 63815935380 cycles
Compare this to executing rmpopt() in parallel:
[ 1238.809183] SEV-SNP: RMPOPT average cycles 114372
So it looks like executing RMPOPT in parallel degrades per-instruction performance
(average of 114372 vs. 5658 cycles, roughly 20x), which we will investigate.
But these are the performance numbers you should be considering:
RMPOPT during boot:
[ 49.913402] SEV-SNP: RMPOPT largest cycles 1143020
[ 49.913407] SEV-SNP: RMPOPT smallest cycles 60
[ 49.913408] SEV-SNP: RMPOPT average cycles 5226
RMPOPT after SNP guest shutdown:
[ 276.435091] SEV-SNP: RMPOPT largest cycles 83680
[ 276.435096] SEV-SNP: RMPOPT smallest cycles 60
[ 276.435097] SEV-SNP: RMPOPT average cycles 5658
Thanks,
Ashish