Re: [PATCH V3 10/10] x86/pks: Add PKS test code

From: Ira Weiny
Date: Thu Dec 17 2020 - 23:06:12 EST


On Thu, Dec 17, 2020 at 12:55:39PM -0800, Dave Hansen wrote:
> On 11/6/20 3:29 PM, ira.weiny@xxxxxxxxx wrote:
> > + /* Arm for context switch test */
> > + write(fd, "1", 1);
> > +
> > + /* Context switch out... */
> > + sleep(4);
> > +
> > + /* Check msr restored */
> > + write(fd, "2", 1);
>
> These are always tricky. What you ideally want here is:
>
> 1. Switch away from this task to a non-PKS task, or
> 2. Switch from this task to a PKS-using task, but one which has a
> different PKS value

Or both...

>
> then, switch back to this task and make sure PKS maintained its value.
>
> *But*, there's no absolute guarantee that another task will run. It
> would not be totally unreasonable to have the kernel just sit in a loop
> without context switching here if no other tasks can run.
>
> The only way you *know* there is a context switch is by having two tasks
> bound to the same logical CPU and make sure they run one after another.

Ah... We do that.

...
+ CPU_ZERO(&cpuset);
+ CPU_SET(0, &cpuset);
+ /* Two processes run on CPU 0 so that they go through context switch. */
+ sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
...

I believe this ensures that both the parent and the child run on CPU 0.  At
least according to the man page it should:

<man>
A child created via fork(2) inherits its parent's CPU affinity mask.
</man>
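
For reference, here is a minimal standalone sketch (not the patch code, just
my reading of the man page) of what I expect to happen: the affinity is set
once in the parent and fork(2) propagates it to the child.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
        cpu_set_t cpuset;
        pid_t pid;

        CPU_ZERO(&cpuset);
        CPU_SET(0, &cpuset);
        /* Set once in the parent; inherited across fork(2). */
        if (sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset))
                perror("sched_setaffinity");

        pid = fork();
        if (pid == 0) {
                /* Child makes no affinity call of its own. */
                sched_getaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
                printf("child bound to CPU 0: %d\n",
                       CPU_ISSET(0, &cpuset) ? 1 : 0);
                exit(0);
        }
        waitpid(pid, NULL, 0);
        return 0;
}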

Perhaps a better method would be to synchronize the two processes more
tightly, so that they really do run at the 'same time' and force the context
switch, rather than relying on the sleeps lining up.
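
Roughly what I have in mind, as an untested sketch (the PKS arm/check steps
are placeholders for the real test hooks): pin both processes to CPU 0 and
ping-pong over a pair of pipes, so every iteration blocks the running process
and forces the scheduler to run the other one.

#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
        int to_child[2], to_parent[2];
        cpu_set_t cpuset;
        char c = 0;

        CPU_ZERO(&cpuset);
        CPU_SET(0, &cpuset);
        /* Both processes restricted to CPU 0, as in the patch. */
        sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);

        pipe(to_child);
        pipe(to_parent);

        if (fork() == 0) {
                for (int i = 0; i < 100; i++) {
                        /* Blocks until the parent runs and writes. */
                        read(to_child[0], &c, 1);
                        /* ... child would touch its own PKS state here ... */
                        write(to_parent[1], &c, 1);
                }
                exit(0);
        }

        for (int i = 0; i < 100; i++) {
                /* ... parent would arm/check the PKS MSR here ... */
                write(to_child[1], &c, 1);
                /* Blocks until the child runs and answers. */
                read(to_parent[0], &c, 1);
        }
        wait(NULL);
        return 0;
}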

> This just gets itself into a state where it *CAN* context switch and
> prays that one will happen.

I'm not sure what you mean by 'This'.  Do you mean that running on the same
CPU will sometimes not force a context switch?  Or do you mean that the sleeps
could be badly timed and the two processes could run one after the other on
the same CPU?  The latter is, AFAICT, the most likely case.

>
> You can also run a bunch of these in parallel bound to a single CPU.
> That would also give you higher levels of assurance that *some* context
> switch happens at sleep().

I think more cycles is a good idea for sure.  But I'm more comfortable with
forcing the test to be more synchronized, so that it actually runs in the
order we expect it to.
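
That said, running several copies at once shouldn't hurt either.  A rough
sketch of that idea (the "./pks_test" binary name is just illustrative): fork
a handful of children that all inherit the CPU 0 affinity and exec the
existing test, so any sleep() in one copy overlaps with runnable siblings.

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
        cpu_set_t cpuset;

        CPU_ZERO(&cpuset);
        CPU_SET(0, &cpuset);
        sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);

        for (int i = 0; i < 8; i++) {
                if (fork() == 0) {
                        /* All copies inherit the CPU 0 affinity. */
                        execl("./pks_test", "pks_test", (char *)NULL);
                        _exit(1);
                }
        }
        while (wait(NULL) > 0)
                ;
        return 0;
}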

>
> One critical thing with these tests is to sabotage the kernel and then
> run them and make *sure* they fail. Basically, if you screw up, do they
> actually work to catch it?

I'll try and come up with a more stressful test.

Ira