Re: DRAM unreliable under specific access pattern

From: Willy Tarreau
Date: Sun Dec 28 2014 - 04:18:27 EST


Hi Pavel,

On Wed, Dec 24, 2014 at 05:38:23PM +0100, Pavel Machek wrote:
> Hi!
>
> It seems that it is easy to induce DRAM bit errors by doing repeated
> reads from adjacent memory cells on common hw. Details are at
>
> https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf

Extremely interesting stuff. I've always wondered if such modules
were *that* reliable given how picky they are about all timings.

> . Older memory modules seem to work better, and ECC should detect
> this. Paper has inner loop that should trigger this.
>
> Workarounds seem to be at hardware level, and tricky, too.
>
> Does anyone have implementation of detector? Any ideas how to work
> around it in software?
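For reference, the inner loop Pavel mentions does the following in a tight loop: it reads two addresses X and Y, flushes both from the cache with clflush so the next reads go to DRAM again, and issues mfence. A minimal C sketch follows. It assumes x86 with SSE2 and uses the `_mm_clflush` and `_mm_mfence` intrinsics; on other targets it falls back to plain reads. The function name `hammer` is hypothetical. The sketch also does not pick X and Y in different rows of the same bank, because that requires knowing the physical address mapping.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence */
#endif

/* Hypothetical helper: read x and y, evict both lines from the cache,
 * fence, and repeat, so that every iteration re-activates DRAM rows. */
static void hammer(volatile uint32_t *x, volatile uint32_t *y, size_t iterations)
{
    for (size_t i = 0; i < iterations; i++) {
        (void)*x;                       /* volatile read of X */
        (void)*y;                       /* volatile read of Y */
#if defined(__SSE2__)
        _mm_clflush((const void *)x);   /* next read of X misses the cache */
        _mm_clflush((const void *)y);   /* next read of Y misses the cache */
        _mm_mfence();                   /* order the flushes before the next reads */
#endif
    }
}
```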

Maybe reserve some "canary" memory that is periodically scanned for
changes. A clean scan will not prove that nothing happened, but a
changed canary will tell you for sure that bits were flipped.
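A rough user-space sketch of the canary idea follows. It assumes a buffer filled with a known pattern and a hypothetical `canary_scan()` called from some periodic timer. The pattern 0x55... contains both 0 and 1 bits, so flips in either direction can be observed. User space cannot choose which physical rows the buffer lands in, and a scan may be served from cached lines rather than DRAM. A real scanner would therefore more likely live in the kernel and flush lines before reading them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Alternating bits, so both 0->1 and 1->0 flips can be observed. */
#define CANARY_PATTERN 0x5555555555555555ULL

struct canary {
    volatile uint64_t *words;   /* volatile: always re-read from memory */
    size_t nwords;
};

/* Allocate the canary region and fill it with the known pattern. */
static int canary_init(struct canary *c, size_t nwords)
{
    c->words = malloc(nwords * sizeof(uint64_t));
    if (c->words == NULL)
        return -1;
    c->nwords = nwords;
    for (size_t i = 0; i < nwords; i++)
        c->words[i] = CANARY_PATTERN;
    return 0;
}

/* Count bits that differ from the pattern and rewrite any damaged
 * word, so that the next scan only reports new flips. */
static size_t canary_scan(struct canary *c)
{
    size_t flipped = 0;
    for (size_t i = 0; i < c->nwords; i++) {
        uint64_t diff = c->words[i] ^ CANARY_PATTERN;
        if (diff != 0) {
            while (diff != 0) {         /* portable popcount */
                diff &= diff - 1;
                flipped++;
            }
            c->words[i] = CANARY_PATTERN;
        }
    }
    return flipped;
}

static void canary_free(struct canary *c)
{
    free((void *)c->words);
    c->words = NULL;
    c->nwords = 0;
}
```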

Also I'm wondering whether perf counters on certain CPUs could be used
to detect an abnormal number of clflush instructions, or even the
memory access pattern itself (though this will not work in multi-socket
environments if a user has one dedicated CPU).
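There is no generic (hardware-independent) perf event that counts clflush instructions. A sketch of the perf idea might therefore use cache misses as a proxy, which is an assumption. It opens a counter through perf_event_open(2) with PERF_COUNT_HW_CACHE_MISSES. The helper names and the threshold passed to `looks_like_hammering()` are hypothetical and would need tuning per machine. Opening the counter may also fail, depending on kernel.perf_event_paranoid.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a user-space cache-miss counter for process `pid` (0 = self) on
 * any CPU. Returns a file descriptor, or -1 if perf is unavailable. */
static int open_cache_miss_counter(pid_t pid)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;          /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0);
}

/* Count misses over `seconds`; returns -1 on failure. */
static int64_t sample_cache_misses(int fd, unsigned seconds)
{
    uint64_t count = 0;
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;
    sleep(seconds);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}

/* Hypothetical heuristic: flag a miss rate above a caller-chosen limit. */
static int looks_like_hammering(uint64_t misses, double seconds,
                                double max_misses_per_sec)
{
    if (seconds <= 0.0)
        return 0;
    return (double)misses / seconds > max_misses_per_sec;
}
```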

Thanks for sharing the link!
Willy
