Re: [PATCHv2 2/2] tracing,mm - add kernel pagefault tracepoint forx86 & x86_64
From: Frederic Weisbecker
Date: Mon Nov 15 2010 - 08:43:43 EST
On Thu, Nov 11, 2010 at 10:09:09AM +0100, Jiri Olsa wrote:
> This provides a tracepoint to trace kernel pagefault event.
>
> When analyzing a vmcore resulting from a kernel failure, we
> often hypothesize that "there should have a pagefault event
> just before this instruction" or similar. Sometimes it means
> that there should have a small delay between instructions that
> extends a critical session and exposed a missing lock. Since
> there have been no evidence of kernel pagefault, it is quite
> difficult to adopt the hypothesis.
>
> If we can trace the kernel pagefault event, it will help narrow
> the possible cause of failure and will accelerate the
> investigation a lot.
>
>
> Signed-off-by: Larry Woodman <lwoodman@xxxxxxxxxx>
> Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> ---
> arch/x86/mm/fault.c | 33 ++++++++++++++++++++++-----------
> include/trace/events/kmem.h | 23 +++++++++++++++++++++++
> 2 files changed, 45 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 7d90ceb..171dcc9 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -12,6 +12,7 @@
> #include <linux/mmiotrace.h> /* kmmio_handler, ... */
> #include <linux/perf_event.h> /* perf_sw_event */
> #include <linux/hugetlb.h> /* hstate_index_to_shift */
> +#include <trace/events/kmem.h>
>
> #include <asm/traps.h> /* dotraplinkage, ... */
> #include <asm/pgalloc.h> /* pgd_*(), ... */
> @@ -944,17 +945,11 @@ static int fault_in_kernel_space(unsigned long address)
> return address >= TASK_SIZE_MAX;
> }
>
> -/*
> - * This routine handles page faults. It determines the address,
> - * and the problem, and then passes it off to one of the appropriate
> - * routines.
> - */
> -dotraplinkage void __kprobes
> -do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +static inline void __do_page_fault(struct pt_regs *regs, unsigned long address,
> + unsigned long error_code)
> {
> struct vm_area_struct *vma;
> struct task_struct *tsk;
> - unsigned long address;
> struct mm_struct *mm;
> int fault;
> int write = error_code & PF_WRITE;
> @@ -964,9 +959,6 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> tsk = current;
> mm = tsk->mm;
>
> - /* Get the faulting address: */
> - address = read_cr2();
> -
> /*
> * Detect and handle instructions that would cause a page fault for
> * both a tracked kernel page and a userspace page.
> @@ -1158,3 +1150,22 @@ good_area:
>
> up_read(&mm->mmap_sem);
> }
> +
> +/*
> + * This routine handles page faults. It determines the address,
> + * and the problem, and then passes it off to one of the appropriate
> + * routines.
> + */
> +dotraplinkage void __kprobes
> +do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +{
> + unsigned long address;
> +
> + /* Get the faulting address: */
> + address = read_cr2();
> +
> + __do_page_fault(regs, address, error_code);
> +
> + if (!user_mode(regs))
> + trace_mm_kernel_pagefault(current, address, error_code);
> +}
I (and others) have been testing your patch to measure the latencies of page
faults.
So I have several comments about it.
First, we don't want a pointer to current, we can already retrieve the pid
from a trace.
Second, it would be definetly interesting to also have the instruction address
that faulted (regs->ip).
Three, I wonder why this tracepoint only traces kernel faults. And in fact
kernel faults is a confusing name. Users will be confused whether this is
about tracing only faults happening in kernel or also faults happening in
kernel data.
Actually I don't see any reason right now to trace only kernel faults. Do you?
If that's needed, one can still check on post-processing that the address
was in the kernel.
And four, measuring page fault handling duration can be desired, it would be
nice to have a page_fault_start, page_fault_end.
So in the end we can get:
dotraplinkage void __kprobes
do_page_fault(struct pt_regs *regs, unsigned long error_code)
{
unsigned long address;
/* Get the faulting address: */
address = read_cr2();
trace_mm_pagefault_start(address, error_code);
__do_page_fault(regs, address, error_code);
trace_mm_pagefault_end(address);
}
Would you be ok with that?
Last thing I worry about is that error_code that is very arch dependent.
If someone writes a script that depends on the x86 code, it won't work
elsewhere while it's fairly possible to have a generic tracepoint there.
So perhaps we rather need a generic enum field instead of the error_code,
to express the most interesting specific fault attributes. Than can
probably be added later though, once someone really needs it.
Hm?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/