Jeff V. Merkey wrote:
> To cite a Linux-specific example, let's take the issue with the memory
> write for a spin_unlock(). Linus seemed to have trouble grasping why
> a simple 'mov <addr>, 0' would work as opposed to a 'lock dec
> <addr>'.
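For anyone following along, the two unlock flavours at issue look
roughly like this. This is a simplified sketch with a made-up lock type
(following the quoted convention of 0 = free, 1 = held), not the
kernel's actual code:

    /* Sketch only: sketch_spinlock_t is a toy type, not the kernel's. */
    typedef struct { volatile char lock; } sketch_spinlock_t;

    /* Plain store of 0: relies on x86 keeping stores in program order. */
    static inline void unlock_mov(sketch_spinlock_t *s)
    {
            __asm__ __volatile__("movb $0,%0"
                                 : "=m" (s->lock) : : "memory");
    }

    /* Locked decrement (1 -> 0): forces an explicitly locked cycle. */
    static inline void unlock_lockdec(sketch_spinlock_t *s)
    {
            __asm__ __volatile__("lock; decb %0"
                                 : "+m" (s->lock) : : "memory");
    }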
No logic analyser will tell you the subtleties of _why_ it works.
You'll see MESI working, you'll see processor ordering in your test
cases, but that doesn't tell you whether processor ordering is
guaranteed in general.
> Anyone who has ever spent late nights with an American Arium Analyzer
> profiling memory bus transactions on a PPro knows that MESI [...] will
> correctly propagate a write to a locked location via the processor
> caches, with correct load and store order and without any problems of
> locking concurrency.
Wrong. They observed no locking problems with their particular test
cases. The logic analyser doesn't tell you that no code sequence will
ever exhibit a locking problem, and it doesn't tell you that no future
processor will exhibit one.
Instead of using a bus analyzer to show the absence of a _symptom_, the
kernel developers looked at Intel's specifications. An engineer from
Intel helped with that, and eventually it was confirmed that Intel does
actually guarantee that 'movb' works for spin-unlock.
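Concretely, the guarantee that matters is store ordering. A sketch of
the pattern, reusing the toy unlock_mov() from above (the names here
are mine, not the kernel's):

    extern int shared_data;
    extern sketch_spinlock_t guard;     /* hypothetical lock instance */

    void leave_critical_section(void)
    {
            shared_data = 42;   /* store inside the critical section */
            unlock_mov(&guard); /* plain 'movb' store; x86 keeps it
                                 * ordered after the data store, so no
                                 * CPU can see the lock free and then
                                 * read stale data */
    }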
At the same time, a few folks ran tests on a number of processors to
see whether the ordering specifications were really followed. That
produced a fair amount of misunderstanding and confusion.
Some tests failed, but they turned out to be the wrong tests for
spin-unlock, which is fine with 'movb'. They were the right tests for
some other subtle ordering problems, though.
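The classic "wrong test" is a Dekker-style store->load litmus test.
Here's a sketch of that flavour of test (my code, not the tests that
were actually run back then): x86 does allow a load to pass an earlier
store to a different address, so this test can "fail" while saying
nothing about spin-unlock, which only needs store->store ordering.

    #include <pthread.h>
    #include <stdio.h>

    volatile int x, y, r0, r1;

    static void *cpu0(void *arg) { x = 1; r0 = y; return NULL; }
    static void *cpu1(void *arg) { y = 1; r1 = x; return NULL; }

    int main(void)
    {
            for (int i = 0; i < 100000; i++) {
                    pthread_t a, b;
                    x = y = r0 = r1 = 0;
                    pthread_create(&a, NULL, cpu0, NULL);
                    pthread_create(&b, NULL, cpu1, NULL);
                    pthread_join(a, NULL);
                    pthread_join(b, NULL);
                    /* r0 == r1 == 0 is legal on x86: each CPU's store
                     * was still in its store buffer when the other
                     * CPU's load executed. */
                    if (r0 == 0 && r1 == 0)
                            printf("store->load reordering, run %d\n", i);
            }
            return 0;
    }

(Thread creation overhead makes the reordering window rare here; the
real tests hammered the sequence in tight loops.)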
In the process, many of us learned a lot about the x86 read/write
ordering rules. Along the way, other bugs were found; see
__set_task_state for an example.
If someone had just used the logic analyser, we'd never have
constructed the wrong threading tests, and we probably wouldn't have
spotted the task_state bug.
> Linus apparently did not understand this, or he would have
> immediately realized that double locking was always generating a
> second non-cacheable memory reference for every lock being taken and
> released.
Erm, I think we _all_ knew about the second memory reference...
But non-cacheable? On a PPro, locked operations are cacheable: if the
line is already in the cache, the lock is handled within the cache
rather than by asserting a bus lock.
> The person writing and updating page table entries in NetWare 4.1 was
> clearing the accessed bit in the PTE and did not know that the
> processor would perform a hidden R/M/W operation, asserting a bus
> lock, to set this bit every time someone cleared it -- it made
> performance drop 4% from NetWare 3.x and no one knew why. This
> performance problem would never have been found without this tool.
> 2 years of code reviews did not find it -- an American Arium Analyzer
> with a kernel debugger, stopping and starting the code and
> instrumenting it with custom stubs all over the place, did.
The kernel developers have known about those R/M/W "hidden cycles"
forever. See any standard Pentium textbook (or even 386/486 for that
matter).
Heck, even _I_ know this stuff and I've never programmed any page table
code, just read those parts of the kernel.
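To make the hidden cost concrete: clearing the Accessed bit looks free
in software, but on the next access to the page the MMU sets it again
with an implicit locked read-modify-write cycle. A sketch, where 'pte'
is a hypothetical pointer to a mapped 32-bit page-table entry:

    #include <stdint.h>

    #define PTE_ACCESSED (1u << 5)   /* Accessed bit in an x86 PTE */

    /* The software clear below is one cheap store; the hardware re-set
     * on the next page access is a locked R/M/W bus cycle -- that's
     * the 4% that two years of code review never saw. */
    static inline void clear_accessed(volatile uint32_t *pte)
    {
            *pte &= ~PTE_ACCESSED;
    }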
> Folks who just rely on code reviews never see this level of
> interaction, and conversely, do not have the understanding of hardware
> behavior underneath an OS to optimize it well.
Apparently your engineers didn't read the textbooks.
-- Jamie