Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain support to use NMI-safe methods

From: Linus Torvalds
Date: Mon Jun 15 2009 - 13:39:23 EST

On Mon, 15 Jun 2009, Ingo Molnar wrote:
>
> A simple cr2 corruption would explain all those cc1 SIGSEGVs and
> other user-space crashes i saw, with sufficiently intense sampling -
> easily.

Note that we could work around the %cr2 issue, since any corruption is
always nicely "nested" (ie there are never any SMP issues with async
writes to the register).

So what we _could_ do is to have a magic value for %cr2, along with an "NMI
sequence count", and if we see that value, we just return (without doing
anything) from the page fault handler.

Then, the NMI handler would be changed to always write that value to %cr2
after it has done the operation that could fault, and do an atomic
increment of the NMI sequence count. Then, we can do something like this
in the page fault handler:

	if (cr2 == MAGIC_CR2) {
		static unsigned long my_seqno = -1;
		if (my_seqno != nmi_seqno) {
			my_seqno = nmi_seqno;
			return;
		}
	}

where the whole (and only) point of that "seqno" is to protect against
user space doing something like

	int i = *(int *)MAGIC_CR2;

and causing infinite faults.

If a real NMI happens, then nmi_seqno will always be different, and we'll
just retry the fault (the NMI handler would do something like

	write_cr2(MAGIC_CR2);
	atomic_inc(&nmi_seqno);

to set it all up).
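
As a rough illustration only, the scheme above can be modelled in user space.
This is a sketch under stated assumptions, not kernel code: sim_cr2 stands in
for the %cr2 register, the MAGIC_CR2 value is an arbitrary placeholder (the
mail does not pick one), and sim_nmi(), sim_fault_should_retry() and
sim_take_fault() are hypothetical names invented for the model. It assumes a
single CPU and at most one NMI per fault.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel value; the original mail does not specify one. */
#define MAGIC_CR2 ((uintptr_t)0xdead0000u)

static uintptr_t sim_cr2;                 /* models the %cr2 register */
static atomic_ulong nmi_seqno;            /* bumped once per simulated NMI */
static unsigned long my_seqno = (unsigned long)-1;

/* Models the tail of the NMI handler: mark %cr2 as clobbered. */
static void sim_nmi(void)
{
	sim_cr2 = MAGIC_CR2;              /* stands in for write_cr2(MAGIC_CR2) */
	atomic_fetch_add(&nmi_seqno, 1);  /* stands in for atomic_inc(&nmi_seqno) */
}

/* Models the check at the top of the page fault handler.
 * Returns true if the handler should return without doing anything,
 * so that the faulting instruction re-executes and reloads %cr2. */
static bool sim_fault_should_retry(void)
{
	if (sim_cr2 == MAGIC_CR2) {
		unsigned long seq = atomic_load(&nmi_seqno);
		if (my_seqno != seq) {
			my_seqno = seq;
			return true;
		}
	}
	return false;
}

/* Models one faulting access to addr. If nmi_hits is true, an NMI lands
 * after the hardware latches %cr2 but before the handler reads it.
 * Returns how many times the fault was retried before being handled. */
static int sim_take_fault(uintptr_t addr, bool nmi_hits)
{
	int retries = 0;

	for (;;) {
		sim_cr2 = addr;           /* hardware latches the fault address */
		if (nmi_hits) {
			sim_nmi();        /* NMI clobbers %cr2 */
			nmi_hits = false;
		}
		if (!sim_fault_should_retry())
			return retries;   /* handler proceeds using sim_cr2 */
		retries++;
		assert(retries < 3);      /* the model must never loop forever */
	}
}
```

In this model, a fault that an NMI interrupts is retried exactly once and then
sees its real address again, while a user-space access to MAGIC_CR2 with no new
NMI in between is not retried at all, because my_seqno already matches
nmi_seqno.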

Anyway, I do think that the _correct_ solution is to not do page faults
from within NMIs, but the above is an outline of how we could _try_ to
handle it if we really really wanted to. IOW, the fact that cr2 gets
corrupted is not insurmountable, exactly because we _could_ always just
retrigger the page fault, and thus "re-create" the corrupted %cr2 value.

Hacky, hacky. And I'm not sure how happy CPUs even are to have %cr2
written to, so we could hit CPU issues.

Linus