Re: NMI vs #PF clash

From: Mathieu Desnoyers
Date: Tue May 22 2012 - 11:22:57 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> On Tue, 2012-05-22 at 17:37 +0300, Avi Kivity wrote:
> > >
> > >
> > > Is reading it fast? Then we could do two reads and only write when
> > > needed.
> >
> > The upside is 70 cycles on one machine, see d3edefc0035669.
>
> Thanks
>
> >
> >
> > >
> > > Something like this pseudo assembly
> > >
> > > mov cr2, rax
> > > push rax
> > >
> > > call do_nmi
> > >
> > > pop rax
> > > mov cr2, rbx
> > > cmp rax, rbx
> > > je skip
> > > mov rax, cr2
> > > skip:
> > >
> >
> >
> > Yes, provided no exceptions can happen at those points.
>
> Yes, exceptions can only happen in the do_nmi area. There should not be
> any breakpoints or page faults in the assembly code of the NMI handler.
>
> Now another NMI may come in at any point here, but it will detect that
> it is nested and return without doing anything (but telling this NMI to
> repeat itself).
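
For illustration, the save/compare/restore sequence quoted above can be
modelled in plain C. This is only a userspace sketch under stated
assumptions: sim_cr2, sim_read_cr2(), sim_write_cr2() and
nmi_entry_model() are made-up names standing in for the register and the
entry code, and the two handler bodies are placeholders for do_nmi().

```c
#include <stdint.h>

/* Userspace stand-in for CR2 (assumption: the real register is only
 * reachable through privileged mov instructions, not a variable). */
static uint64_t sim_cr2;
static unsigned int sim_cr2_writes;	/* counts the explicit restores */

static uint64_t sim_read_cr2(void)
{
	return sim_cr2;
}

static void sim_write_cr2(uint64_t val)
{
	sim_cr2 = val;
	sim_cr2_writes++;
}

/* Hypothetical handler type; stands in for do_nmi(). */
typedef void (*nmi_body_fn)(void);

static void nmi_entry_model(nmi_body_fn body)
{
	uint64_t saved = sim_read_cr2();	/* mov cr2, rax ; push rax */

	body();					/* call do_nmi */

	if (sim_read_cr2() != saved)		/* pop rax ; mov cr2, rbx ; cmp */
		sim_write_cr2(saved);		/* mov rax, cr2, only if changed */
}

/* Handler body that takes no fault: CR2 is left alone. */
static void nmi_body_no_fault(void)
{
}

/* Handler body that models a page fault: the hardware overwrites CR2
 * directly, so this bypasses sim_write_cr2() and its counter. */
static void nmi_body_with_fault(void)
{
	sim_cr2 = 0xffffc90000001000ULL;
}
```

Calling nmi_entry_model() with nmi_body_no_fault costs two reads and no
write; with nmi_body_with_fault it costs two reads and one write, and the
interrupted context sees its original value again.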

That should take care of cr2. This is from distant memory, but I think we
should be careful about pgd_offset() too. If we look at the x86-64
vmalloc_fault(), we notice that it uses the current task struct:

        WARN_ON_ONCE(in_nmi());   <--- we should take that as a hint ;)

        /*
         * Copy kernel mappings over when needed. This can also
         * happen within a race in page table update. In the later
         * case just flush:
         */
        pgd = pgd_offset(current->active_mm, address);

x86-32 does not have this problem, since it reads the cr3 register to
get the pgd_addr.

x86-64 using the current task can be an issue if the NMI nests over the
scheduler execution.
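
A rough userspace model of that window, just to illustrate the concern.
None of these names are kernel APIs: struct cpu_model, switch_mm_step()
and switch_to_step() are hypothetical, and the assumption that the page
table root is switched before "current" is updated is part of the model,
not something established in this thread.

```c
#include <stdint.h>

struct mm_model {
	uint64_t pgd_phys;		/* physical address of the pgd */
};

struct task_model {
	struct mm_model *active_mm;
};

struct cpu_model {
	struct task_model *current;	/* what "current" would return */
	uint64_t cr3;			/* page table root actually loaded */
};

/* x86-64 style lookup: trusts the current task. */
static uint64_t pgd_via_current(const struct cpu_model *cpu)
{
	return cpu->current->active_mm->pgd_phys;
}

/* x86-32 style lookup: trusts the register. */
static uint64_t pgd_via_cr3(const struct cpu_model *cpu)
{
	return cpu->cr3;
}

/* First half of the modelled context switch: load the new page tables. */
static void switch_mm_step(struct cpu_model *cpu, struct task_model *next)
{
	cpu->cr3 = next->active_mm->pgd_phys;
}

/* Second half: only now does "current" point at the new task. */
static void switch_to_step(struct cpu_model *cpu, struct task_model *next)
{
	cpu->current = next;
}
```

In this model, an NMI sampled between switch_mm_step() and
switch_to_step() gets the old task's pgd from pgd_via_current(), while
pgd_via_cr3() returns the one the CPU is really using.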

A few years ago, I posted this patch
http://www.gossamer-threads.com/lists/linux/kernel/1249694?do=post_view_threaded
that tried to fix this by reading cr3 on x86_64. However, after reports
that it caused some x86_64 machines to fail to boot, I removed this
patch from the LTTng patchset. So there was certainly something I missed
back then.

Just food for thought,

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com