On Mon, Apr 25 2022 at 09:20, Waiman Long wrote:
> On 4/22/22 06:41, Thomas Gleixner wrote:
>> I did some experiments and noticed that the boot time overhead is
>> different from the overhead when doing the sync check after boot
>> (offline a socket and on/offline the first CPU of it several times).
>>
>> During boot the overhead is lower on this machine (SKL-X), during
>> runtime it's way higher and more noisy.
>>
>> The noise can be pretty much eliminated by running the sync_overhead
>> measurement multiple times and building the average.
>>
>> The reason why it is higher is that after offlining the socket the CPU
>> comes back up with a frequency of 700MHz while during boot it runs with
>> 2100MHz.
>>
>> Sync overhead: 118
>> Sync overhead: 51 A: 22466 M: 22448 F: 2101683
>
> One explanation of the sync overhead difference (118 vs 51) here is
> whether the lock cacheline is local or remote. My analysis of the
> interaction between check_tsc_sync_source() and check_tsc_sync_target()
> is that the real overhead is about locking with a remote cacheline (local
> to source, remote to target). When you do a 256 loop of locking, it is all
> local cacheline. That is why the overhead is lower. It also depends on
> whether the remote cacheline is in the same socket or a different socket.

Yes. It's clear that the initial sync overhead is due to the cache line
being remote, but I'd rather underestimate the compensation. Aside from
that, it's not guaranteed that the cache line is actually remote on the
first access. It's by chance, but not by design.
>> Sync overhead: 178
>> Sync overhead: 152 A: 22477 M: 67380 F: 700529
>> Sync overhead: 212
>> Sync overhead: 152 A: 22475 M: 67380 F: 700467
>> Sync overhead: 153
>> Sync overhead: 152 A: 22497 M: 67452 F: 700404
>>
>> Can you try the patch below and check whether the overhead stabilizes
>> across several attempts on that copperlake machine and whether the
>> frequency is always the same or varies?
>
> Yes, I will try that experiment and report back the results.
>
>> Independent of the outcome on that, I think we have to take the actual
>> CPU frequency into account for calculating the overhead.
>
> Assuming that the clock frequency remains the same during the
> check_tsc_warp() loop and the sync overhead computation time, I don't
> think the actual clock frequency matters much. However, it will be a
> different matter if the frequency does change. In this case, it is more
> likely the frequency will go up than down. Right? IOW, we may
> underestimate the sync overhead in this case. I think it is better than
> overestimating it.

The question is not whether the clock frequency changes during the loop.
The point is:

    start = rdtsc();
    do_stuff();
    end = rdtsc();
    compensation = end - start;

do_stuff() executes a constant number of instructions which are executed
in a constant number of CPU clock cycles, let's say 100 for simplicity.
TSC runs with 2000MHz.

With a CPU frequency of 1000MHz the real computation time is:

    100/1000MHz = 100 nsec = 200 TSC cycles

while with a CPU frequency of 2000MHz it is obviously:

    100/2000MHz = 50 nsec = 100 TSC cycles

IOW, TSC runs with a constant frequency independent of the actual CPU
frequency, ergo the CPU frequency dependent execution time has an
influence on the resulting compensation value, no?
On the machine I tested on, it's a factor of 3 between the minimal and
the maximal CPU frequency, which makes quite a difference, right?