as far as i understand it (reading only this thread, no Intel docs),
under Processor Order, CPU B sees stores made by CPU A in order, but
possibly delayed. I was wondering if this may cause unnecessary spins
on a CPU which is spinlocking, "waiting" to acquire the lock: it is
possible that CPU A releases the lock with a store, and CPU B doesn't
see it for a while? Note that this is not an ordering issue:
from one of Mr. Boleyn messages i understand that if CPU B acquires
the lock seeing the store in the unlock() made by CPU A, it must also
see all stores previously made by A in the critical section: in other
words, they are committed from B's point of view. The problem is not
that A critical section becomes longer (which may happen, and it's
harmless, a Linus clearly stated) but not letting B section become
shorter (with some of its reads executed *before* entering it). And that
won't happen, given Processor Order.
So, my question: can reads made by CPU B on the lock variable see an old
value (=busy) even after the unlock() made by A, for a while? And if yes,
how big is this interval (after all, B is really wasting cycles...)?
If not, does it mean that the serializing lock read made by CPU B
while spinlocking force a global (=on all CPUs) commit of any pending
store (thus making all the system work as WT for all the time B spinlocks)?
So, either 1) we're wasting cycles on CPU B spinning on a *free* lock (for
a while), or 2) the whole system performs as WT (or even worst) while B
is spinning. Am i missing something?
Sorry for the long message, and the waste of bandwith...
TiA,
.TM.
P.S.
BTW,
1) could be solved forcing a commit on CPU A right after (as part of) s
the unlock() (but, is it possible? were we doing precisely that, before
the whole thread starts?);
2) could be solved having B spinning inside two loops, the inner without
any lock, for a suitble number N of cycles, the outer which does the lock();
N has to be computed carefully. This really matters when the number of
CPUs is high, say 16, i think. Assuming A acquired the lock, and B is waiting
for it, the other 14 CPUs will have their activity serialized by B
continuously issuing lock() (BTW, is it true?).
-- ____/ ____/ / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _____/ _____/ _/ Colombo@ESI.it
- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/