Intel has done some recent work to optimize LOCK# by allowing
processors to lock in the cache only. However, they must still snoop the
MBC for bus-mastering devices that may be DMA'ing into memory, etc., and
unfortunately, they have to support all the old stuff and handle cases
for self-modifying code (why do you need to blow out the pipelines every
time LOCK# is asserted? Intel does this on their processors because
people write self-modifying code). You will also notice that every time
a GDT, LDT, PDE, or IDT entry is accessed with the "accessed" bit cleared,
Intel processors will invisibly assert LOCK# when they fetch any of
these processor descriptors from memory and rewrite the accessed bit (an
interrupt on Intel processors will actually trigger a read of the GDT,
PDE (if paging is on), IDT, and LDT (if present) tables, all of which
assert LOCK# to force atomic access of these tables -- this is one
reason latency with interrupts on Intel processors is higher than their
competition). All of the above happens transparently to the
software, and you need an analyzer to view and time these behaviors.
Intel has some nasty baggage they have to drag along behind them for
backwards compatibility .....
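
For anyone who wants to look at this from the software side, here is a
minimal sketch in C11 of the kind of operations that end up as locked
cycles. The names (demo_spinlock, demo_lock, demo_trylock, demo_unlock,
locked_increment) are made up for illustration and are not taken from any
kernel source. On x86, compilers typically emit a LOCK-prefixed
instruction for the fetch-and-add and an XCHG (which locks implicitly) for
the test-and-set, but the exact code generated depends on the compiler
and target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock, for illustration only. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

/* Spin until the flag was previously clear. Each attempt is an atomic
 * read-modify-write, which is where the locked cycle comes from. */
static inline void demo_lock(demo_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait */
}

/* Single attempt: returns true if the lock was acquired. */
static inline bool demo_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Release the lock with a release store to the flag. */
static inline void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Atomic increment; returns the new value. */
static inline int locked_increment(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1) + 1;
}
```

A lock is initialized with `demo_spinlock l = { ATOMIC_FLAG_INIT };`.
Disassembling the compiled output is the easy way to see which
instructions carry the lock; measuring what they cost on the bus still
needs an analyzer.
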
Jeff
David Wragg wrote:
>
> "Jeff V. Merkey" <jmerkey@timpanogas.com> writes:
> > On Intel machines, LOCK#'s are heavier than they
> > need to be because of all the issues Intel has with people writing
> > self-modifying code (about 60% of their errata deals with this problem).
>
> A couple of weeks ago, I went to a presentation about mutex
> optimisations. It included measurements of the cost of a mutex lock
> followed by unlock on Alpha, both with the usual
> load-locked/store-conditional with memory barrier sequence, and with
> the mutex code modified to use a load/store and no memory barriers
> sequence. I don't remember the exact figures, but:
>
> - The cost for the ll/sc version was >200 cycles on all of the EV4,
> EV5, and EV6. The cost for the l/s version was <50 cycles on those
> processors.
>
> - The l/s version took fewer cycles on each successive generation of
> processor, as you might expect. But the ll/sc version took more cycles
> on EV6 than on EV5.
>
> So compared to the competition, it doesn't seem that Intel is doing
> too badly with its LOCK performance.
>
> David Wragg