On Mon, 25 Sep 2017, Mike Travis wrote:
On 9/25/2017 8:30 AM, Thomas Gleixner wrote:
Aside from that I really do not like this kind of special case hackery. The
real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
at all. In principle we don't anymore now that we handle that TSC deadline
timer wreckage cleanly.
I am hesitant to make such a global change as it appears the author
intentionally added this. It not only caused our internal tsc sync tests to
become totally out of whack, it also generated an avalanche of error messages
to the system console (>3000 messages for a 32 socket Skylake system). And I
don't have the means to test how major changes to the TSC adjust functions
will affect standard whitebox PCs.
The reason why I put it there in the first place was to make the TSC
deadline timer work on a whole range of systems. It turned out that our
'fix' was not enough so we changed that later to disable the deadline timer
completely on affected systems when the firmware does not contain a fix for
it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0
on the boot CPU. So rather than special casing this for UV we should just
remove that requirement and leave the boot value as is.
Our BIOS team did make a change to conform to the "TSC_ADJUST should be the
same on all cpu threads on a single socket" requirement, so we were able to
pass that part of the TSC validation functions. (Prior to this, the TSCs
were synced by writing directly to the TSC MSR and natural delays in the
processor firmware caused the slight differences in the TSC_ADJUST values.)
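
For illustration, the "same TSC_ADJUST on all cpu threads on a single socket" requirement can be sketched as a small check over per-thread samples. This is only a sketch: the struct and function names are hypothetical, it is not the kernel's validation code, and how the samples are collected is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread sample; not a kernel structure. */
struct thread_sample {
    int socket;          /* socket (package) id of the CPU thread */
    int64_t tsc_adjust;  /* value read from that thread's TSC_ADJUST MSR */
};

/*
 * Return 1 if all threads that share a socket report the same
 * TSC_ADJUST value, 0 otherwise. Values may differ between sockets.
 */
static int tsc_adjust_consistent_per_socket(const struct thread_sample *s,
                                            size_t count)
{
    assert(s != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        for (size_t j = i + 1; j < count; j++) {
            if (s[i].socket == s[j].socket &&
                s[i].tsc_adjust != s[j].tsc_adjust)
                return 0;
        }
    }
    return 1;
}
```

With samples {0, 100}, {0, 100}, {1, -40} the check passes; changing one thread on socket 0 to 101 makes it fail, which is the kind of slight difference described above.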
Right. TSC_ADJUST is there for a reason.
But the UV 'boot chassis at different times' brings me to a related
question:
Essentially what happens is the system reset signals are distributed in
various ways which cause the different chassis to start up asynchronously with
each other. The UV system is not "hard" bound to each other but adapts to the
system configuration as it starts up.
I figured that much.
How is this setup dealing with ART (Always Running Timer, which is
distributed over PCIe for hardware timestamping and hardware assisted event
correlation)?
I assume that ART on UV is also per chassis, but that means that the
documented relationship of:
TSC = ART * n/d + offset
where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
not applicable.
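
To make the relation concrete, here is a minimal sketch of the conversion in C. The function name and parameters are hypothetical; on real hardware n and d are the ratio reported by CPUID leaf 0x15, and offset is the TSC_ADJUST value mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: TSC = ART * n / d + offset.
 * n/d is the TSC-to-ART ratio, offset is the TSC_ADJUST value that
 * is assumed to be system wide.
 */
static uint64_t art_to_tsc(uint64_t art, uint32_t n, uint32_t d,
                           int64_t offset)
{
    assert(d != 0);

    /* Split the division so that art * n cannot overflow 64 bits. */
    uint64_t quot = art / d;
    uint64_t rem = art % d;
    uint64_t tsc = quot * n + (rem * n) / d;

    /* Unsigned wraparound makes a negative offset subtract correctly. */
    return tsc + (uint64_t)offset;
}
```

If two chassis carry different offsets, the same ART value maps to two different TSC values, e.g. art_to_tsc(1000, 3, 2, 0) versus art_to_tsc(1000, 3, 2, -100), which is why a single system wide offset matters for the relation.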
Is there some other magic in play which makes ART work across chassis?
Thanks,
tglx
Sorry, I'm not sure how the UV hardware mimics the concept of 'ART'. It does
have an external clock generator that is distributed as part of the NumaLink
protocol and signal set. Since separate chassis can be configured to be
either within the same SSI or in separate SSIs, it has the ability to
configure which chassis are in sync with each other and which are on a
different clock sync. This is all within the purview of the BIOS folks.
Cute. How is that supposed to work, when the chassis are out of sync?
We do have independent methods to verify if TSCs are in sync with each other
by measuring the skew rate. Typical deviations on UV are within a two-digit
clock tick spread, which at an Uncore frequency of 2.5 GHz is in the small
single digit or less nanosecond range.
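
As a rough check of that arithmetic, a tick spread can be converted to nanoseconds as below. This is a sketch with a hypothetical helper name, and it assumes the spread is counted in cycles of the 2.5 GHz clock quoted above.

```c
#include <assert.h>

/* Convert a clock tick count to nanoseconds at a given frequency in Hz. */
static double ticks_to_ns(double ticks, double freq_hz)
{
    assert(freq_hz > 0.0);
    return ticks * 1e9 / freq_hz;
}
```

At 2.5 GHz one tick is 0.4 ns, so ticks_to_ns(10, 2.5e9) gives 4 ns; larger spreads scale linearly.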
That should be good enough to pass the kernel side tests.
But back to my question about ART. You might talk to your BIOS/HW folks
about that and possibly disable the ART related functionality in the
kernel on UV.