Re: [PATCH v3] x86/cpu: Use SERIALIZE in sync_core() when available

From: Ricardo Neri
Date: Thu Aug 06 2020 - 19:05:14 EST


On Thu, Aug 06, 2020 at 12:57:26PM -0700, Dave Hansen wrote:
> On 8/6/20 12:25 PM, Ricardo Neri wrote:
> > static inline void sync_core(void)
> > {
> > /*
> > - * There are quite a few ways to do this. IRET-to-self is nice
> > + * Hardware can do this for us if SERIALIZE is available. Otherwise,
> > + * there are quite a few ways to do this. IRET-to-self is nice
>
> This seems to imply that things other than SERIALIZE aren't the hardware
> doing this. All of these methods are equally architecturally
> serializing *and* provided by the hardware.

Indeed, I can see how the wording may imply that.

>
> I also don't quite get the point of separating the comments from the
> code. Shouldn't this be:
>
> /*
> * The SERIALIZE instruction is the most straightforward way to
> * do this but it not universally available.
> */

I regard the comment as describing all the considered options to for
architectural serialization. What about if I add SERIALIZE as another
option? I propose to discuss it towards the end of the comment:

/*
* There are quite a few ways to do this. IRET-to-self is nice
* because it works on every CPU, at any CPL (so it's compatible
* with paravirtualization), and it never exits to a hypervisor.
* The only down sides are that it's a bit slow (it seems to be
* a bit more than 2x slower than the fastest options) and that
* it unmasks NMIs. The "push %cs" is needed because, in
* paravirtual environments, __KERNEL_CS may not be a valid CS
* value when we do IRET directly.
*
* In case NMI unmasking or performance ever becomes a problem,
* the next best option appears to be MOV-to-CR2 and an
* unconditional jump. That sequence also works on all CPUs,
* but it will fault at CPL3 (i.e. Xen PV).
*
* CPUID is the conventional way, but it's nasty: it doesn't
* exist on some 486-like CPUs, and it usually exits to a
* hypervisor.
*
* The SERIALIZE instruction is the most straightforward way to
* do this as it does not clobber registers or exit to a
* hypervisor. However, it is not universally available.
*
* Like all of Linux's memory ordering operations, this is a
* compiler barrier as well.
*/

What do you think?

> if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> asm volatile(__ASM_SERIALIZE ::: "memory");
> return;
> }
>
> /*
> * For all other processors, IRET-to-self is nice ...
> */
> iret_to_self();