Re: spin_unlock optimization(i386)

Andrea Arcangeli (andrea@suse.de)
Sun, 28 Nov 1999 14:57:49 +0100 (CET)


On Sat, 27 Nov 1999, Gerard Roudier wrote:

>By the way, I am not that impressed about the result:
>
>1) Either we use a lock prefix and latency in case of contention is
> deterministic but the cost is about 20 cyles (the cache will be aware
> of the STORE and will raise HITM if another CPU wants the data).
>
>2) Or we have to cross fingers for the CPU to drain the STORE out of
> the write buffer fast enough, but the cost is just a couple of cycles.

(2) is not an option as it's not a performance issue.

The wait_event userspace simulation (from Manfred) shows that something
gets definitely reordered by my double-PII. The CPU that waits for the
event sees the writes in reversed order. It doesn't matter at all the
delay of the writes. The two writes on the event CPU can also happen at
the same time. Or both CPU (the `event' and the `wait' CPU) can try to
write to the same location at the same time and if there's contention of
the location it doesn't matter who wins as if there's contention it means
the event-CPU just wrote the other variable (where there can't be
contention as nobody else will write to it) and so our further read will
tell us to skip the loop in first place because will see the ancient write
of the event-CPU. The _only_ thing that matters is that the waiter must
see the changes in order as expected by reading the trivial assembler
supposing that the CPUs are not allowed to reoder/speculate anything. That
doesn't happen and we need a lock on the bus to prevent the wait-CPU to
speculate reads. So the current locking is necessary to prevent deadlocks
and we are not wasting time (thus we are not broken ;).

BTW, the lock on the bus after the two writes on the event-CPU is not
necessary to exploit the SMP race (the race happens even if nobody in
userspace runs a lock on the bus). I also noticed that putting the lock on
the bus _between_ the two writes on the event-CPU prevents the deadlock
too, but I believe that's only a side effect, I guess such a lock only
incidentally prevents the wait-CPU to speculate reads as well.

>Still joking :), if it is so it may work a lot better under NT than under

If NT really assumes that an IA32-SMP system won't reorder instructions it
can SMP race IMHO.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/