Re: [tip:x86/urgent] x86, trace: Fix CR2 corruption when tracing page faults

From: Peter Zijlstra
Date: Wed Mar 05 2014 - 08:08:16 EST


On Wed, Mar 05, 2014 at 02:02:24PM +0100, Peter Zijlstra wrote:
> > It also puts trace_page_fault_entries() and trace_do_page_fault() under
> > CONFIG_TRACING. I could only find the entry_32.S user; I suppose the
> > 64bit one is hidden by CPP goo somewhere?
>
> > +trace_errorentry page_fault normal_do_page_fault
>
> ah found it; its in there. That also means the normal_do_page_fault()
> thing won't actually compile proper.
>
> Lemme do one with that.

---
arch/x86/mm/fault.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e7fa28bf3262..a10c8c792161 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1020,8 +1020,12 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
* This routine handles page faults. It determines the address,
* and the problem, and then passes it off to one of the appropriate
* routines.
+ *
+ * This function must have noinline because both callers
+ * {,trace_}do_page_fault() have notrace on. Having this an actual function
+ * guarantees there's a function trace entry.
*/
-static void __kprobes
+static void __kprobes noinline
__do_page_fault(struct pt_regs *regs, unsigned long error_code,
unsigned long address)
{
@@ -1245,31 +1249,38 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
up_read(&mm->mmap_sem);
}

-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
+ unsigned long address = read_cr2(); /* Get the faulting address */
enum ctx_state prev_state;
- /* Get the faulting address: */
- unsigned long address = read_cr2();
+
+ /*
+ * We must have this function tagged with __kprobes, notrace and call
+ * read_cr2() before calling anything else. To avoid calling any kind
+ * of tracing machinery before we've observed the CR2 value.
+ *
+ * exception_{enter,exit}() contain all sorts of tracepoints.
+ */

prev_state = exception_enter();
__do_page_fault(regs, error_code, address);
exception_exit(prev_state);
}

-static void trace_page_fault_entries(struct pt_regs *regs,
+#ifdef CONFIG_TRACING
+static void trace_page_fault_entries(unsigned long address, struct pt_regs *regs,
unsigned long error_code)
{
if (user_mode(regs))
- trace_page_fault_user(read_cr2(), regs, error_code);
+ trace_page_fault_user(address, regs, error_code);
else
- trace_page_fault_kernel(read_cr2(), regs, error_code);
+ trace_page_fault_kernel(address, regs, error_code);
}

-dotraplinkage void __kprobes
+dotraplinkage void __kprobes notrace
trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
- enum ctx_state prev_state;
/*
* The exception_enter and tracepoint processing could
* trigger another page faults (user space callchain
@@ -1277,9 +1288,11 @@ trace_do_page_fault(struct pt_regs *regs, unsigned long error_code)
* the faulting address now.
*/
unsigned long address = read_cr2();
+ enum ctx_state prev_state;

prev_state = exception_enter();
- trace_page_fault_entries(regs, error_code);
+ trace_page_fault_entries(address, regs, error_code);
__do_page_fault(regs, error_code, address);
exception_exit(prev_state);
}
+#endif /* CONFIG_TRACING */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/