Re: propagating vmgenid outward and upward
From: Michael S. Tsirkin
Date: Thu Mar 03 2022 - 08:07:37 EST
On Wed, Mar 02, 2022 at 05:32:07PM +0100, Jason A. Donenfeld wrote:
> Hi Michael,
>
> On Wed, Mar 02, 2022 at 11:22:46AM -0500, Michael S. Tsirkin wrote:
> > > Because that 16 byte read of vmgenid is not atomic. Let's say you read
> > > the first 8 bytes, and then the VM is forked.
> >
> > But at this point when VM was forked plaintext key and nonce are all in
> > buffer, and you previously indicated a fork at this point is harmless.
> > You wrote "If it changes _after_ that point of check ... it doesn't
> > matter:"
>
> Ahhh, fair point. I think you're right.
>
> Alright, so all we're talking about here is an ordinary 16-byte read,
> and 16 bytes of storage per keypair, and a 16-byte comparison.
>
> Still seems much worse than just having a single word...
>
> Jason
Oh I forgot about __int128.

#include <stdio.h>
#include <assert.h>
#include <limits.h>
#include <string.h>

/* 16-byte value, compared as a single __int128 */
struct lng {
	__int128 l;
};

/* single-word value, for comparison */
struct shrt {
	unsigned long s;
};

/* cached copies that the volatile values are checked against */
struct lng l = { 1 };
struct shrt s = { 3 };

static void test1(volatile struct shrt *sp)
{
	if (sp->s != s.s) {
		printf("short mismatch!\n");
		s.s = sp->s;
	}
}

static void test2(volatile struct lng *lp)
{
	if (lp->l != l.l) {
		printf("long mismatch!\n");
		l.l = lp->l;
	}
}

int main(int argc, char **argv)
{
	volatile struct shrt sv = { 4 };
	volatile struct lng lv = { 5 };

	/* any argument selects the single-word test */
	if (argc > 1) {
		printf("test 1\n");
		for (int i = 0; i < 100000000; ++i)
			test1(&sv);
	} else {
		printf("test 2\n");
		for (int i = 0; i < 100000000; ++i)
			test2(&lv);
	}
	return 0;
}

With that, the compiler has an easier time producing optimal
code, so the difference is smaller.
Note: compiled with

	gcc -O2 -mno-sse -mno-sse2 -ggdb bench3.c

since with SSE there's no difference at all.

[mst@tuck ~]$ perf stat -r 100 ./a.out 1 > /dev/null

 Performance counter stats for './a.out 1' (100 runs):

             94.55 msec task-clock:u         #    0.996 CPUs utilized     ( +-  0.09% )
                 0      context-switches:u   #    0.000 /sec
                 0      cpu-migrations:u     #    0.000 /sec
                52      page-faults:u        #  548.914 /sec              ( +-  0.21% )
       400,459,851      cycles:u             #    4.227 GHz               ( +-  0.03% )
       500,147,935      instructions:u       #    1.25  insn per cycle    ( +-  0.00% )
       200,032,462      branches:u           #    2.112 G/sec             ( +-  0.00% )
             1,810      branch-misses:u      #    0.00% of all branches   ( +-  0.73% )

         0.0949732 +- 0.0000875 seconds time elapsed  ( +-  0.09% )

[mst@tuck ~]$
[mst@tuck ~]$ perf stat -r 100 ./a.out > /dev/null

 Performance counter stats for './a.out' (100 runs):

            110.19 msec task-clock:u         #    1.136 CPUs utilized     ( +-  0.18% )
                 0      context-switches:u   #    0.000 /sec
                 0      cpu-migrations:u     #    0.000 /sec
                52      page-faults:u        #  537.743 /sec              ( +-  0.22% )
       428,518,442      cycles:u             #    4.431 GHz               ( +-  0.07% )
       900,147,986      instructions:u       #    2.24  insn per cycle    ( +-  0.00% )
       200,032,505      branches:u           #    2.069 G/sec             ( +-  0.00% )
             2,139      branch-misses:u      #    0.00% of all branches   ( +-  0.77% )

          0.096956 +- 0.000203 seconds time elapsed  ( +-  0.21% )

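
For illustration only, here is a rough sketch of what "an ordinary 16-byte
read, 16 bytes of storage and a 16-byte comparison" could look like when the
ID arrives as a raw byte array rather than as an __int128 field. The names
genid_t, genid_load and genid_changed are made up for this example and are
not from any existing driver. It assumes a compiler that provides
unsigned __int128 (GCC or Clang on a 64-bit target), and the load is not
atomic, which is the case discussed earlier in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for a 16-byte generation ID. */
typedef struct {
	uint8_t bytes[16];
} genid_t;

/*
 * Load the 16 bytes into one 128-bit value. memcpy() sidesteps alignment
 * and aliasing problems; the load is NOT atomic.
 */
static unsigned __int128 genid_load(const genid_t *id)
{
	unsigned __int128 v;

	memcpy(&v, id->bytes, sizeof(v));
	return v;
}

/*
 * Compare the current ID against a cached copy. Returns 1 and refreshes
 * the cache if they differ, 0 if they match.
 */
static int genid_changed(unsigned __int128 *cached, const genid_t *current)
{
	unsigned __int128 now = genid_load(current);

	if (now == *cached)
		return 0;
	*cached = now;
	return 1;
}
```

A caller would seed the cache once with genid_load() and then call
genid_changed() on each check: it returns 0 while the ID is unchanged, 1 on
the first call after any byte of the ID changes, and 0 again afterwards
because the cached copy has been refreshed.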
--
MST