Re: LOCK prefix on uni processor has its use (was Re: [BUG FIX] Make x86_32 uni-processor Atomic ops, Atomic)
From: Michael S. Zick
Date: Wed May 27 2009 - 13:10:45 EST
On Wed May 27 2009, Harald Welte wrote:
> Hi hpa and others,
> On Sat, May 23, 2009 at 04:44:08PM -0700, H. Peter Anvin wrote:
> > It looks like there might be a problem with the C7-M ... Michael reports
> > that if he sets LOCK_PREFIX to "lock;" it works, but that shouldn't be
> > necessary for a uniprocessor.
> It seems they are necessary.
> Here are some statements from the CPU logic guys at VIA/Centaur:
> * A read-modify-write sequence cannot be interrupted.
> * All X86 instructions except rep-strings are atomic wrt interrupts.
> * The lock prefix has uses on a UP processor: It keeps DMA devices from
> interfering with a read-modify-write sequence
> Furthermore, they have done some experimentation in the past, making the
> CPU simply ignore the LOCK prefix on uni-processor (running a certain popular
> proprietary operating system): It doesn't work, presumably because of the
> above-mentioned DMA-related conflict.
> Also, the engineers believe that it is only a matter of time until a different
> CPU/chipset combination exposes the same bug. Since the in-order
> single-retire C7-M is more vulnerable than out-of-order, multiple-retire CPUs,
> they are not surprised that the issue shows first on the C7-M.
> The recommendation from the CPU engineers, unsurprisingly, thus is to put the
> LOCK prefixes back where they were.
> Hope this helps you.
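For anyone following along, here is a minimal sketch of what "putting the
LOCK prefixes back" means at the instruction level. It is not the kernel's
actual header: LOCK_PREFIX is redefined locally as a stand-in, and
sketch_atomic_inc() is a hypothetical name used only for illustration. The
non-x86 branch is there only so the sketch builds elsewhere.

```c
/* Local stand-in for the macro under discussion. On a UP build it would
 * expand to nothing; here it is forced to "lock; " as in the report. */
#define LOCK_PREFIX "lock; "

/* Hypothetical helper: increment *v with a locked read-modify-write. */
static inline void sketch_atomic_inc(volatile int *v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* One instruction, so it cannot be split by an interrupt; the prefix
     * additionally locks the access against other bus agents. */
    __asm__ __volatile__(LOCK_PREFIX "incl %0"
                         : "+m" (*v)
                         :
                         : "memory", "cc");
#else
    /* Assumption: GCC/Clang builtin as a portable fallback. */
    __atomic_fetch_add(v, 1, __ATOMIC_SEQ_CST);
#endif
}
```

With LOCK_PREFIX defined as an empty string the same function still
assembles to a single incl, which is the UP case being questioned here.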
> Now if I understand the issues correctly, it would mean that there is some
> driver code that modifies a certain chunk of memory, while DMA of some
> peripheral is also accessing that memory. I suppose it would not have to be
> the same actual address, but probably being within the same cache line is
> already sufficient.
I am also testing with the pci cache line size hard-coded to be the same size
as the processor cache line size (a WAFG for now) - -
It is too soon (only 1 1/2 hours) to be a significant finding - -
but if this was set to twice the physical line length, it would be only
flushing every other line - which I think would show up *real* fast. ;)
I am noticing some "dropped buffers and/or dropped packets" in my streaming
music - - but that is not conclusive of anything other than hd-audio may
be using the wrong cache stride also. ;)
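To make the "twice the physical line length" arithmetic concrete: the PCI
Cache Line Size register lives at offset 0x0c of configuration space and
counts 32-bit dwords, not bytes. A rough sketch of the conversion follows;
the helper names are hypothetical, and the 64-byte line used in the note
below is an assumed example value, not something measured on this box.

```c
#include <stdint.h>

/* Offset of the Cache Line Size register in PCI configuration space. */
#define SKETCH_PCI_CACHE_LINE_SIZE 0x0c

/* The register holds the line size in 32-bit dwords: bytes / 4. */
static inline uint8_t sketch_pci_cls_from_bytes(unsigned int cpu_line_bytes)
{
    return (uint8_t)(cpu_line_bytes / 4);
}

/* Inverse conversion: register value back to bytes. */
static inline unsigned int sketch_pci_cls_to_bytes(uint8_t reg)
{
    return (unsigned int)reg * 4;
}
```

So an assumed 64-byte processor line corresponds to a register value of 16,
and a register value of 32 would describe a 128-byte line, i.e. the
"every other line" case mentioned above.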
> Now the question is: Is this a valid operation of a driver? Should the driver
> do such things, or is such a driver broken? When would that occur? I'm trying
> to come up with a case, but typically you e.g. allocate some DMA buffer and
> then don't touch it until the hardware has processed it.
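The "allocate, hand over, don't touch" pattern described above can be
sketched as an ownership flag on the buffer. This is a user-space toy with
hypothetical names throughout (struct sketch_dma_buf and the sketch_*
helpers); it uses no real DMA API and only shows the rule that a
well-behaved driver would be following.

```c
#include <stddef.h>
#include <string.h>

/* Who is currently allowed to touch the buffer. */
enum sketch_owner { SKETCH_OWNER_CPU, SKETCH_OWNER_DEVICE };

struct sketch_dma_buf {
    enum sketch_owner owner;
    unsigned char data[64];
};

/* CPU side: returns 0 on success, -1 if the device owns the buffer
 * or the payload does not fit. */
static int sketch_cpu_fill(struct sketch_dma_buf *b, const void *src, size_t len)
{
    if (b->owner != SKETCH_OWNER_CPU || len > sizeof(b->data))
        return -1;
    memcpy(b->data, src, len);
    return 0;
}

/* Hand the buffer to the (simulated) device. */
static void sketch_submit(struct sketch_dma_buf *b)
{
    b->owner = SKETCH_OWNER_DEVICE;
}

/* The device signals completion; the CPU may touch the buffer again. */
static void sketch_complete(struct sketch_dma_buf *b)
{
    b->owner = SKETCH_OWNER_CPU;
}
```

A driver that sticks to this rule never writes the buffer between
sketch_submit() and sketch_complete(). The open question in this thread is
about CPU writes that land near memory the device is still using.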