Nice story so far, but now you're claiming incorrect things.
There is a family of algorithms called "march tests". The
simplest is MATS, commonly written as {up_W0, up_R0W1, updn_R1}.
This test detects the simplest types of errors, and not much more.
In particular, it does not find all of the coupling errors that you describe.
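A minimal sketch of what the MATS notation amounts to in C. The function
name mats() and the choice of whole 32-bit words with all-zeros and all-ones
patterns are assumptions made for illustration. On real hardware you would
also need to make sure the accesses reach the RAM and are not satisfied by a
cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: runs MATS over n words, returns the mismatch count. */
size_t mats(volatile uint32_t *mem, size_t n)
{
    size_t errors = 0;
    size_t i;

    assert(mem != NULL);

    /* up_W0: write the all-zeros pattern to every word, ascending. */
    for (i = 0; i < n; i++)
        mem[i] = 0x00000000u;

    /* up_R0W1: read back zeros, then write all-ones, ascending. */
    for (i = 0; i < n; i++) {
        if (mem[i] != 0x00000000u)
            errors++;
        mem[i] = 0xffffffffu;
    }

    /* updn_R1: read back all-ones; either address order is allowed. */
    for (i = 0; i < n; i++) {
        if (mem[i] != 0xffffffffu)
            errors++;
    }

    return errors;
}
```

Every cell is written and read in both states, which is why this catches a
cell that is stuck, but each cell is only checked right after its own write,
so most interactions between cells go unnoticed.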
A more sophisticated test like MarchG finds everything up to and
including all the single coupling errors that may exist.
MarchG is (cut-and-paste from a memory test program that I'm writing):
MARCH(UPDN, W0,           TT, order, stride); \
MARCH(UP,   R0W1R1W0R0W1, TT, order, stride); \
MARCH(UP,   R1W0W1,       TT, order, stride); \
MARCH(DOWN, R1W0W1W0,     TT, order, stride); \
MARCH(DOWN, R0W1W0,       TT, order, stride); \
DELAY; \
MARCH(UPDN, R0W1R1,       TT, order, stride); \
DELAY; \
MARCH(UPDN, R1W0R0,       TT, order, stride);
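This is also supposed to find errors caused by the refresh not working
for some memory cells. That is what the "DELAY" steps are for: the test
waits long enough that a cell which is not being refreshed loses its
contents before the next element reads it back.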
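The MARCH macro itself is not shown above, so the following is only a guess
at the idea, not the actual implementation. It assumes 32-bit words, encodes
each march element as a string such as "R0W1R1", and leaves out the TT, order
and stride arguments entirely. The names march(), march_g() and the delay
callback are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum dir { UP, DOWN };

/*
 * Apply one march element to every word, in the given address order.
 * ops is a sequence of two-character operations: 'R' or 'W' followed
 * by '0' (all-zeros pattern) or '1' (all-ones pattern).
 * Returns the number of reads that did not match the expected pattern.
 */
size_t march(volatile uint32_t *mem, size_t n, enum dir d, const char *ops)
{
    size_t errors = 0;
    size_t k;

    for (k = 0; k < n; k++) {
        size_t i = (d == UP) ? k : n - 1 - k;
        const char *p;

        /* All operations of the element run on one word before moving on. */
        for (p = ops; p[0] != '\0' && p[1] != '\0'; p += 2) {
            uint32_t pat = (p[1] == '1') ? 0xffffffffu : 0x00000000u;

            if (p[0] == 'W')
                mem[i] = pat;
            else if (mem[i] != pat)
                errors++;
        }
    }
    return errors;
}

/* The MarchG sequence from above; UPDN elements are run ascending here. */
size_t march_g(volatile uint32_t *mem, size_t n, void (*delay)(void))
{
    size_t e = 0;

    e += march(mem, n, UP,   "W0");
    e += march(mem, n, UP,   "R0W1R1W0R0W1");
    e += march(mem, n, UP,   "R1W0W1");
    e += march(mem, n, DOWN, "R1W0W1W0");
    e += march(mem, n, DOWN, "R0W1W0");
    if (delay)
        delay();            /* give unrefreshed cells time to decay */
    e += march(mem, n, UP,   "R0W1R1");
    if (delay)
        delay();
    e += march(mem, n, UP,   "R1W0R0");
    return e;
}
```

Note that each element leaves the memory in exactly the state the next
element expects to read, so on fault-free memory the count stays at zero.
How long the delay has to be depends on the refresh interval of the parts
under test, which is why it is left to the caller here.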
> This would take weeks to test a few megabytes of RAM!
>
> There is also something called pattern sensitivity. Let's say that you
> read RAM in 0x1000 byte blocks (a page on the ix86). Let's say the first page
> was filled with 0xffffffff and the next page was filled with 0x00000000.
> Suppose that you read these two pages over and over in a continuous loop
> and the loop takes 1 ms to execute. It takes a different amount of
> current from the power supply to access a bunch of 0xffffffffs than
> it does to access a bunch of 0x00000000s. This would put a 1 kHz load
> change on the power supply. What happens if the power supply has a
> 1 kHz overshoot?? The voltage will bounce at a 1 kHz rate. If it gets
> out of range during this modulation, RAM (any RAM anywhere) could lose
> its data.
Hey, this is indeed a plausible explanation of why memory tests miss
some errors that kernel compiles do trigger.
> A 1000+ word paper could be written (probably has been) on testing
> RAM and the causes and cures of RAM errors.
There is a 300+ page book called "Testing Semiconductor Memories" by
A.J. van de Goor that explains this reasonably well.
Roger.