Re: [BUG ARM64/perf] Perf record on hardware breakpoint causes application to hang

From: Will Deacon
Date: Thu Mar 03 2016 - 10:01:18 EST


On Thu, Mar 03, 2016 at 09:09:05PM +0800, Hekuang wrote:
> This problem can be reproduced as follows:
>
> We know that 'cat /proc/version' reads the memory at symbol
> linux_proc_banner, so we set a hardware memory access
> breakpoint on that address.
>
> on terminal 1:
>
> $ perf record -e mem:0x$(cat /proc/kallsyms|grep linux_proc_banner|cut -d " " -f 1):rw --no-buffer -a
>
> on terminal 2:
>
> $ cat /proc/version
>
> Then the 'cat' process on terminal 2 hangs until we press
> '^C' to stop perf from recording events.
>
> The number of samples recorded by perf is also abnormally high:
>
> [ perf record: Captured and wrote 0.879 MB perf.data (22691 samples) ]
>
> The correct result can be produced by removing the 'no-buffer'
> argument from the perf command line; the output should then look
> like this:
>
> $ perf record -e mem:0x$(cat /proc/kallsyms|grep linux_proc_banner|cut -d " " -f 1):rw -a
> ^C
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]
>
> We are reporting this bug to you and hope you can help.

This sounds like a kernel-space equivalent to the issue reported here:

http://lkml.kernel.org/r/569CCEDA.6040103@xxxxxxxxxx

The problem is that we configure a single-step to step the watchpoint
and then re-arm it on completion, but because you have buffering disabled,
we *always* step into an interrupt thanks to the irq work that is queued
by perf to unblock the event fd being polled. We then re-arm the watchpoint
and take it immediately on return from the irq handler. Rinse, repeat.
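
The loop described above can be illustrated with a toy model. This is only
a sketch in Python, not kernel code: the function run(), its parameters and
the trap cap are hypothetical names made up for illustration. It assumes
that with buffering disabled perf queues irq work for every sample, so the
single-step always lands in the interrupt handler instead of completing
the watched access.

```python
def run(accesses, wakeup_every_sample, max_traps=1000):
    """Toy model of watchpoint + single-step handling.

    accesses:            number of watched memory accesses the task performs
    wakeup_every_sample: True models unbuffered mode (irq work queued per
                         sample, an assumption of this sketch)
    max_traps:           cap so that the livelock case terminates here

    Returns (traps_taken, task_completed).
    """
    traps = 0
    done = 0  # watched accesses that have actually retired
    while done < accesses:
        if traps >= max_traps:
            return traps, False  # no forward progress: livelock
        # The watchpoint fires before the access retires. The handler
        # records a sample, disables the watchpoint and enables single-step.
        traps += 1
        if wakeup_every_sample:
            # An interrupt is pending, so the step completes on entry to
            # the irq handler. The watchpoint is re-armed there, and the
            # return from the handler re-executes the same access.
            continue
        # No pending interrupt: the step retires the access itself and
        # the watchpoint is re-armed afterwards.
        done += 1
    return traps, True


if __name__ == "__main__":
    print("buffered:  ", run(10, wakeup_every_sample=False))
    print("unbuffered:", run(10, wakeup_every_sample=True))
```

With per-sample wakeups the model never retires the first access, which
matches the hang and the runaway sample count; without them it takes one
trap per watched access and the task completes.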

We could consider re-enabling interrupts briefly on the debug exception
return path, but then we open ourselves up to black spots in the kernel
that cannot be debugged.

Will