> The CPUs on an x86 SMP box run a MESI cache. That means each cache line
> has 4 states - those states don't include a 'shared-modified' state. So
> when you have two people reading the same line and one writes it then the
> other reads it you get
> [M=modified E=exclusive S=shared I=invalid]
> S S
> E I
> M I (data now written)
> S S (data now readable)
>
> The move from modified to shared especially is quite slow as it requires
> that the cache line be written to main memory and then read by the other
> CPU. That will disturb any measurements because in CPU clock source counts
> that's a long time. You probably need to measure this as well as measuring
> the TSC difference somehow to get a better view.
Um, yes, indeed. The move from shared to invalid requires a bus transaction
telling the other caches to invalidate their copies, and the move from
modified to shared involves:
- Other reader tries to read from main memory.
- This cache notices that it has a modified copy of the data.
- This cache interrupts saying "the data in memory is not accurate;
please turn your back for a minute while I fix it."
- This cache writes the data back into memory. "Okay, you can look now."
- The reader retries and this time gets the up-to-date data from memory.
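
To make that sequence concrete, here is a toy model of one cache line shared by two CPUs. It is only a sketch: the `Line` class, its method names, and the bus-operation strings are made up for illustration, and it ignores timing and everything a real chipset does beyond the state changes described above.

```python
# Toy two-CPU MESI model for a single cache line (illustration only).
M, E, S, I = "M", "E", "S", "I"

class Line:
    def __init__(self, ncpus=2):
        self.state = [I] * ncpus   # per-CPU state of this one line
        self.bus_ops = []          # log of bus transactions, in order

    def read(self, cpu):
        if self.state[cpu] != I:
            return                 # cache hit, no bus traffic
        others = [c for c in range(len(self.state)) if c != cpu]
        for c in others:
            if self.state[c] == M:
                # Snoop hit on a dirty line: the first read attempt is
                # aborted, the owner writes back, then the reader retries.
                self.bus_ops.append(f"cpu{cpu} read (aborted)")
                self.bus_ops.append(f"cpu{c} writeback")
                self.state[c] = S
            elif self.state[c] == E:
                self.state[c] = S
        self.bus_ops.append(f"cpu{cpu} read")
        shared = any(self.state[c] != I for c in others)
        self.state[cpu] = S if shared else E

    def write(self, cpu):
        if self.state[cpu] in (S, I):
            # A dirty copy elsewhere has to reach memory first.
            for c, st in enumerate(self.state):
                if c != cpu and st == M:
                    self.bus_ops.append(f"cpu{c} writeback")
            # Gaining ownership invalidates every other copy over the bus.
            self.bus_ops.append(f"cpu{cpu} invalidate")
            for c in range(len(self.state)):
                if c != cpu:
                    self.state[c] = I
        self.state[cpu] = M        # from E this upgrade needs no bus traffic

line = Line()
line.read(0)    # E I
line.read(1)    # S S
line.write(0)   # M I  (data now written)
line.read(1)    # S S  (data now readable)
print(line.state, line.bus_ops)
```

The last three logged operations are the aborted read, the write-back, and the retried read, which is the slow modified-to-shared path.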
However, slow as it is, it's still the fastest inter-processor communication
system I know of. Which is why it's used for spinlocks.
(Do you know anything faster?)
Well, one thing a bit faster is doing the whole thing to write-through
memory. (Does Linux have any write-through space it uses for such things?)
And yes, I do measure the time it takes to do this.
The measured time from A->B is bus_time + skew.
The measured time from B->A is bus_time - skew.
Thus, bus_time = (abtime + batime)/2, and skew = (abtime - batime)/2.
That's how I measure.
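
As a worked example of that arithmetic (the function name and the tick counts below are invented for illustration, not real measurements):

```python
def split_round_trip(abtime, batime):
    """Recover (bus_time, skew) from the two one-way measurements.

    abtime = bus_time + skew   (measured A -> B)
    batime = bus_time - skew   (measured B -> A)
    """
    bus_time = (abtime + batime) / 2
    skew = (abtime - batime) / 2
    return bus_time, skew

# Hypothetical numbers in TSC ticks: true bus_time 300, true skew 40.
bus_time, skew = split_round_trip(300 + 40, 300 - 40)
print(bus_time, skew)   # 300.0 40.0

# If the two directions do not take the same bus time, half of the
# difference leaks into the skew estimate: 310 one way and 300 the
# other shifts the computed skew by (310 - 300) / 2 = 5 ticks.
_, skewed = split_round_trip(310 + 40, 300 - 40)
print(skewed)           # 45.0
```

The second call shows what the method depends on: any asymmetry between the two directions appears in the result as half that much extra skew.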
As long as bus_time remains consistent, it should be fine.
Is there a problem here?
--
-Colin-