Re: [PATCH linux-next v2] x86/xen/time: prefer tsc as clocksource when it is invariant

From: Krister Johansen
Date: Tue Dec 13 2022 - 14:07:39 EST


On Tue, Dec 13, 2022 at 08:23:29AM +0100, Jan Beulich wrote:
> On 12.12.2022 23:05, Krister Johansen wrote:
> > On Mon, Dec 12, 2022 at 05:46:29PM +0100, Jan Beulich wrote:
> >> On 12.12.2022 17:05, Krister Johansen wrote:
> >>> Both the Intel SDM[4] and the Xen tsc documentation explain that marking
> >>> a tsc as invariant means that it should be considered stable by the OS
> >>> and is elibile to be used as a wall clock source. The Xen documentation
> >>> further clarifies that this is only reliable on HVM and PVH because PV
> >>> cannot intercept a cpuid instruction.
> >>
> >> Without meaning to express a view on the argumentation as a whole, this
> >> PV aspect is suspicious. Unless you open-code a use of the CPUID insn
> >> in the kernel, all uses of CPUID are going to be processed by Xen by
> >> virtue of the respective pvops hook. Documentation says what it says
> >> for environments where this might not be the case.
> >
> > Thanks, appreciate the clarification here. Just restating this for my
> > own understanding: your advice would be to drop this check below?
>
> No, I'm unconvinced of if/where this transformation is really appropriate.

Ah, I see. You're saying that you're not convinced that this code
should ever lower the priority of xen clocksource in favor of the tsc?
If so, are you willing to say a bit more about what you find to be
unconvincing?

In as much detail as I can muster: the impetus for the patch was that I
had a variety of different systems running as both kvm and xen guests.
Some of these guests had clocksource tunings in place as a result of
consulting the documentation linked in the patch. But others didn't.
On kvm they had somehow done the "right" (?) thing. Some systems had
tuning in place for xen, despite no longer being a xen guests. Other
systems were running on xen and didn't have the recommended tuning
applied. That's all sorted now, but it seemed like it might be nice to
eliminate the need for others to do this in future. With kvm doing
something similar, I thought there might be enough precedent to consider
this for xen guests.

> My comment was merely to indicate that the justification for ...
>
> >>> + if (!(xen_hvm_domain() || xen_pvh_domain()))
> >>> + return 0;
>
> ... this isn't really correct.

The rationale for this bit of code was the justification that turns
out to be incorrect. That sounds to me like I have unnessary code,
unless I was right by mistake?

> > And then update the commit message to dispense with the distinction
> > between HVM, PV, and PVH?
> >
> >>> + cpuid(xen_cpuid_base() + 3, &eax, &ebx, &ecx, &edx);
> >>
> >> Xen leaf 3 has sub-leaves, so I think you need to set ecx to zero before
> >> this call.
> >
> > The cpuid() inline in arch/x86/include/asm/processor.h assigns zero to
> > ecx prior to calling __cpuid. In arch/x86/boot/cpuflags.c the macros
> > are a little different, but it looks like there too, the macro passes 0
> > as an input argument to cpuid_count which ends up being %ecx. Happy to
> > fix this up if I'm looking at the wrong cpuid functions, though.
>
> Oh, I didn't expect cpuid() to be more than a trivial wrapper around the
> the pvops hook, and I merely looked at native_cpuid() and xen_cpuid().
> I'm sorry for the noise then. Yet still, with there being sub-leaves, I'd
> recommend switching to cpuid_count() just for clarity.

No apology necessary. I'm happy to modify this to use cpuid_count() for
clarity.

-K