Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment

From: Thomas Gleixner
Date: Mon May 02 2022 - 03:56:12 EST


On Fri, Apr 29 2022 at 13:41, Waiman Long wrote:
> On 4/27/22 18:38, Thomas Gleixner wrote:
>> I grant you that it does not matter for the loop under the assumption
>> that the loop runs at constant frequency, but is that a guarantee that
>> it does not matter later on?
> Yes, that is my point that frequency doesn't matter if frequency remain
> the same. Of course, all bets are off if frequency really change.

Frequency changes after boot.

>> If you overcompensate by a factor of 3 because the upcoming CPU ran at
>> the lowest frequency, then it might become visible later when everything
>> runs at full speed.
> I don't think the overhead will be directly proportional to the cpu
> frequency. A 3X increase in frequency will certainly cause the overhead
> to be lowered, but it won't be 1/3. Maybe 1/2 at most.

It's very close to proportional according to my experiments.

>>> However about we half the measure sync_overhead as compensation to avoid
>>> over-estimation, but probably increase the chance that we need a second
>>> adjustment of TSC wrap.
>> Half of what?
>
> What I mean is
>
> @@ -533,7 +551,7 @@ void check_tsc_sync_target(void)
>          * really depends on CPU, node distance and frequency. Add the
>          * estimated sync overhead to the adjustment value.
>          */
> -       cur->adjusted += cur_max_warp + sync_overhead;
> +       cur->adjusted += cur_max_warp + sync_overhead/2;

The point is? Make it magically half while it still jumps around:

>>> With this patch applied, the measured overhead on the same CooperLake
>>> system on different reboot runs varies from 104 to 326.
>> Half of something which jumps around? Not convinced. :)

...

> I have applied your patch with some modification and below was the
> relevant part of the boot up log on a Cooperlake system with this TSC
> sync problem.
>
> [    0.008858] smpboot: CPU 36 Converting physical 0 to logical die 2
> [    0.008858] Sync overhead: 230
> [    0.008858] Sync overhead: 547 A: 149597 M: 149596 F: 2500016
> [    0.008858] TSC ADJUST compensate: CPU36 observed 76342 warp
> (overhead 230). Adjust: 76457
> [    0.008858] smpboot: CPU 54 Converting physical 0 to logical die 3
> [    0.008858] Sync overhead: 178
> [    0.008858] Sync overhead: 658 A: 177970 M: 177968 F: 2500028
> [    0.008858] TSC ADJUST compensate: CPU54 observed 76568 warp
> (overhead 178). Adjust: 76657
>
> BTW, CPUs 36 and 54 are the first CPU of socket 2 and 3 respectively. It
> is always these CPUs that need TSC adjustment.
>
> I have no idea why the sync overhead actually increase in the subsequent
> measurement as the frequency was set at 2.5GHz, I think.

See, that's the thing I'm worried about. An allegedly remote lock takes
178 cycles, but a loop of cache local lock operations takes on average
658 cycles.

That does not make sense and I haven't seen that behaviour on any of my
machines I ran that on. The average was always less than the initial
measurement.

> I tried to offline all the CPUs in a socket and then online the first
> CPU as suggested. However, I was not able to cause the tsc_sync loop
> to run.

It runs for sure when the first CPU on a socket goes online except when
TSC is already disabled. It does not matter whether it was online before
or not.

> BTW, without my patch the same system will boot up with the following
> log:

We know that by now.

> How about we add the full sync_overhead at bootup stage, but then half
> sync_overhead after boot in case the tsc_sync loop is run again for new
> online CPUs? Or any other scheme that you think is appropriate.

Can you please stop grasping for straws?

I'm as much interested as you to get this solved, but I'm not interested
in 'works for me' and 'we don't know why it works' solutions simply
because they end up on my desk sooner than later again.

Thanks

tglx