Re: [kernel-hardening] rowhammer protection [was Re: Getting interrupt every million cache misses]
From: Mark Rutland
Date: Mon Oct 31 2016 - 18:09:20 EST
On Mon, Oct 31, 2016 at 10:13:03PM +0100, Pavel Machek wrote:
> On Mon 2016-10-31 14:47:39, Mark Rutland wrote:
> > On Mon, Oct 31, 2016 at 09:27:05AM +0100, Pavel Machek wrote:
> > > > On Fri, Oct 28, 2016 at 01:21:36PM +0200, Pavel Machek wrote:
> > > > > > Has this been tested on a system vulnerable to rowhammer, and if so, was
> > > > > > it reliable in mitigating the issue?
> >
> > > > > I do not have vulnerable machine near me, so no "real" tests, but
> > > > > I'm pretty sure it will make the error no longer reproducible with the
> > > > > newer version. [Help welcome ;-)]
> > > >
> > > > Even if we hope this works, I think we have to be very careful with that
> > > > kind of assertion. Until we have data as to its efficacy, I don't think
> > > > we should claim that this is an effective mitigation.
> ...
> >
> > To be quite frank, this is anecdotal. It only shows one particular attack is
> > made slower (or perhaps defeated), and doesn't show that the mitigation is
> > reliable or generally applicable (to other machines or other variants of the
> > attack).
>
> So... I said that I'm pretty sure it will fix problem in my testing,
> then you say that I should be careful with my words, I confirm it was
> true, and now you complain that it is anecdotal?

Clearly I have chosen my words poorly here. I believe that this may help
against some attacks on some machines and workloads, and I believe your results
for your machine.

My main concern was that this appears to be described as a general solution, as
in the Kconfig text:

	Enable rowhammer attack prevention. Will degrade system
	performance under attack so much that attack should not
	be feasible.

... yet there are a number of reasons why this may not be the case given varied
attack mechanisms (e.g. using non-cacheable mappings, movnt, etc), given some
hardware configurations (e.g. "large" SMP machines or where timing is
marginal), given some workloads may incidentally trip often enough to be
severely penalised, and given that performance counter support is sufficiently
varied (across architectures, CPU implementations, and even boards using the
same CPU if one considers things like interrupt routing).

Given that, I think the Kconfig text makes an overly strong, and perhaps
misleading, claim (i.e. people could turn the option on and believe that they
are protected when they are not, leaving them worse off). It isn't really
possible to fail gracefully here, and even if this is suitable for some
hardware, very few people are in a position to determine whether their
hardware falls in that category.

Unfortunately, I do not believe that there is a simple and/or general software
mitigation.

> Would it be less confusing if we redefined task description from
> "prevent rowhammer" to "prevent more than X memory accesses in 64
> msec"?

Definitely. Quantifying exactly what you're trying to defend against (and
therefore what you are not) would help to address at least one of my concerns.
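
As a rough sketch of what a quantified goal of that form could look like (this
is not taken from the patch under discussion), the C fragment below derives a
per-interrupt delay from an access budget. The names min_delay_ns() and
max_misses_per_window(), and the figures in the note that follows, are
hypothetical. It models a task that takes one counter-overflow interrupt per
fixed number of cache misses, and it assumes each batch of misses is
instantaneous, which is the worst case for the defender. It says nothing about
accesses the counter does not see (e.g. non-cacheable mappings or movnt).

```c
#include <assert.h>
#include <stdint.h>

/* Refresh window named in the thread: 64 msec, in nanoseconds. */
#define REFRESH_WINDOW_NS 64000000ULL

/*
 * Hypothetical helper: upper bound on counted misses inside any one
 * refresh window when every batch of `misses_per_irq` misses is followed
 * by a sleep of `delay_ns`. Batches are assumed instantaneous, so a window
 * can hold floor(window / delay) + 1 of them.
 */
static uint64_t max_misses_per_window(uint64_t misses_per_irq,
				      uint64_t delay_ns)
{
	assert(delay_ns > 0);
	return (REFRESH_WINDOW_NS / delay_ns + 1) * misses_per_irq;
}

/*
 * Hypothetical helper: smallest delay (in ns) for which the bound above
 * does not exceed `budget` counted misses per window (the "X" in
 * "no more than X memory accesses in 64 msec").
 */
static uint64_t min_delay_ns(uint64_t misses_per_irq, uint64_t budget)
{
	assert(misses_per_irq > 0 && budget >= misses_per_irq);

	/* Number of whole batches the budget allows per window. */
	uint64_t batches = budget / misses_per_irq;

	/* The delay must be strictly greater than window / batches. */
	return REFRESH_WINDOW_NS / batches + 1;
}
```

For example, with one interrupt per 1,000,000 misses and a hypothetical budget
of 4,000,000 misses per window, min_delay_ns() returns 16000001; a delay of
exactly 16000000 ns would still admit five batches in one window under this
model.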

Thanks,
Mark.