Re: rowhammer protection [was Re: Getting interrupt every million cache misses]
From: Ingo Molnar
Date: Fri Oct 28 2016 - 04:59:48 EST
* Pavel Machek <pavel@xxxxxx> wrote:
> On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
> >
> > * Pavel Machek <pavel@xxxxxx> wrote:
> >
> > > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
> > > +{
> > > + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
> > > + u64 now = ktime_get_mono_fast_ns();
> > > + s64 delta = now - *ts;
> > > +
> > > + *ts = now;
> > > +
> > > + /* FIXME msec per usec, reverse logic? */
> > > + if (delta < 64 * NSEC_PER_MSEC)
> > > + mdelay(56);
> > > +}
> >
> > I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
> > very magic, and do we know it 100% that 56 msecs is what is needed
> > everywhere?
>
> I agree this needs to be tunable (and with the other suggestions). But
> this is actually not the most important tunable: the detection
> threshold (rh_attr.sample_period) should be way more important.
>
> And yes, this will all need to be tunable, somehow. But lets verify
> that this works, first :-).

Yeah.

Btw., a 56 msecs NMI delay is pretty brutal in terms of latencies - it might
result in a smoother system to detect 100,000 cache misses and do a
~5.6 msecs delay instead?

(Assuming the shorter threshold does not trigger too often, of course.)

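To make the arithmetic concrete, here is a rough user-space sketch of that scaling. It is not from the patch: struct rh_params and rh_scale_down() are hypothetical names, the 1,000,000 starting period is only assumed from the subject line, and shrinking the 64 msecs detection window along with the other two values is my assumption as well, so that the tolerated miss rate stays the same.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical tunables; the names are illustrative, not from the patch. */
struct rh_params {
	uint64_t sample_period;	/* cache misses per overflow NMI */
	uint64_t window_ns;	/* overflow arriving sooner than this => delay */
	uint64_t delay_ns;	/* length of the busy-wait when throttling */
};

/*
 * Divide all three values by the same factor, so the miss rate that
 * triggers the delay is unchanged while each individual stall is shorter.
 * Assumes factor is non-zero.
 */
static struct rh_params rh_scale_down(struct rh_params p, uint64_t factor)
{
	p.sample_period /= factor;
	p.window_ns /= factor;
	p.delay_ns /= factor;
	return p;
}
```

Starting from { 1000000, 64 * NSEC_PER_MSEC, 56 * NSEC_PER_MSEC } and a factor of 10 this gives 100,000 misses, a 6.4 msecs window and a 5.6 msecs delay. Note that mdelay() takes whole milliseconds, so the kernel side would need a finer-grained delay call for the 5.6 msecs case.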
With all the tunables and statistics it would be possible to enumerate how
frequently the protection mechanism kicks in during regular workloads.
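Something along these lines could model the statistics side. This is only a user-space sketch of the check in rh_overflow() with two counters bolted on; struct rh_stats and rh_account() are hypothetical names, and a real version would presumably want per-CPU counters that are safe to bump from NMI context.

```c
#include <stdint.h>

/* Hypothetical counters; not part of the patch. */
struct rh_stats {
	uint64_t overflows;	/* overflow events seen in total */
	uint64_t throttled;	/* events that would have triggered the delay */
};

/*
 * Same comparison as in rh_overflow(): if the previous overflow was less
 * than window_ns ago, the delay would kick in. Returns 1 in that case,
 * 0 otherwise, and updates the timestamp and counters either way.
 */
static int rh_account(struct rh_stats *st, uint64_t *last_ns,
		      uint64_t now_ns, uint64_t window_ns)
{
	int64_t delta = (int64_t)(now_ns - *last_ns);

	*last_ns = now_ns;
	st->overflows++;

	if (delta < (int64_t)window_ns) {
		st->throttled++;
		return 1;
	}
	return 0;
}
```

The ratio throttled / overflows over a regular workload would then say how often the protection actually fires for a given sample period and window.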
Thanks,
Ingo