Re: Query: ARM64: Behavior of el1_dbg exception while executing el0_dbg

From: Will Deacon
Date: Mon Jan 19 2015 - 05:12:01 EST


On Mon, Jan 19, 2015 at 06:10:08AM +0000, Pratyush Anand wrote:
> On Friday 16 January 2015 09:52 PM, Will Deacon wrote:
> > Perhaps you're not removing the BRK instruction properly, and so you try to
> > single-step a trapping instruction and end up stepping into the exception?
>
> No, that is probably not the scenario. One thing I do agree on: even
> though the AArch64 spec says that a SW BRK exception cannot be masked,
> the current kernel code is not ready to handle re-entrant software debug
> exceptions. So I will keep those parts of the uprobe code non-kprobable,
> and then it is not so important to dig into this from a code development
> perspective.
>
> However, it would be good to understand what went wrong and caused us
> to receive an el1_inv. I still fail to pinpoint the reason for the current
> issue, and it is not single-stepping a trapping instruction (BRK). Sorry,
> but please have another look at the sequence of events:

I think my general point still stands (the issue is likely in step 5),
but ok.

> 1. 1st instruction of uprobe_breakpoint_handler is:
> ffffffc00059a628: a9bf7bfd stp x29, x30, [sp,#-16]!
> which is replaced by BRK64_OPCODE_KPROBES = 0xD4200080, when Kprobe is
> instrumented.
>
> 2. User instruction at address 0x4005d0 is replaced by
> BRK64_OPCODE_UPROBES = 0xD4200100, when uprobe is instrumented.
>
> 3. When the application executes the instruction at 0x4005d0, we receive
> el0_dbg.
>
> 4. In the el0_dbg handler we execute kernel code at address
> ffffffc00059a628, so el1_dbg is raised. (I agree here that el0_dbg has
> not been closed properly, which the current entry.S code expects, so we
> will need to fix that if we reach consensus on supporting re-entrant
> software debug exceptions; however, the issue I see seems unrelated, so...)

Up to here, we seem to be doing fine.
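
For anyone following along, the two opcodes in steps 1 and 2 differ only in the BRK immediate. A minimal sketch of how they decode, assuming the standard A64 BRK encoding (immediate in bits [20:5]); the macro and helper names below are made up for illustration and are not the kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 "BRK #imm16": fixed opcode bits 0xD4200000, imm16 in bits [20:5].
 * These names are hypothetical, not taken from the kernel sources. */
#define A64_BRK_FIXED_MASK 0xFFE0001Fu
#define A64_BRK_FIXED_BITS 0xD4200000u

/* True if insn is any BRK instruction, regardless of its immediate. */
static bool a64_is_brk(uint32_t insn)
{
	return (insn & A64_BRK_FIXED_MASK) == A64_BRK_FIXED_BITS;
}

/* Extract the 16-bit immediate from a BRK instruction. */
static uint16_t a64_brk_imm(uint32_t insn)
{
	return (uint16_t)((insn >> 5) & 0xFFFFu);
}

/* Build a BRK instruction carrying the given immediate. */
static uint32_t a64_make_brk(uint16_t imm)
{
	return A64_BRK_FIXED_BITS | ((uint32_t)imm << 5);
}
```

With these helpers, 0xD4200080 (the kprobes opcode above) decodes to immediate 4 and 0xD4200100 (the uprobes opcode) to immediate 8, while the saved a9bf7bfd (stp) is not a BRK at all.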

> 5. Now in el1_dbg, we handle kprobe_breakpoint_handler, where we write
> saved instruction (ie a9bf7bfd stp x29, x30, [sp,#-16]!) to
> the kmalloc allocated address fffffdfffc000004. kprobe code does
> flush_icache_range on this location. regs->pc is set to
> fffffdfffc000004, so elr_el1 is programmed with fffffdfffc000004 during
> kernel_exit. I have cross checked elr_el1 value just before eret is
> executed in kernel_exit, and it is correct.

This is the step I'm concerned about. Can you verify that:

- Replacing the instruction with a nop does/doesn't change behaviour?
- 0xfffffdfffc000004 is mapped at the point of exception return?
- Using __flush_icache_all instead of flush_icache_range makes no
difference?

> So, here we are trying to single step a STP instruction and not BRK
> instruction.
>
> 6. Here I am expecting a single-step exception, but I receive an el1_inv
> with ESR_EL1 = 0x86000007, i.e. EC as "ESR_EL1_EC_IABT_EL1" and IFSC as
> "Translation fault, third level". WHY????

That likely means that 0xfffffdfffc000004 isn't mapped. Looking at the
kprobes code, shouldn't it be using the modules area so that it can
guarantee an executable mapping? If so, that should be below PAGE_OFFSET
which isn't true in your case afaict.
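
To make the ESR reading above explicit, here is a rough sketch of the field split, assuming the ARMv8 ESR_ELx layout (EC in bits [31:26], IL in bit 25, fault status code in ISS bits [5:0]). The macro and function names are illustrative only and do not match the kernel's headers:

```c
#include <stdint.h>

/* ESR_ELx layout per the ARMv8 ARM; names here are hypothetical. */
#define ESR_EC_SHIFT       26
#define ESR_EC_MASK        0x3Fu
#define ESR_IL_SHIFT       25
#define ESR_FSC_MASK       0x3Fu
#define ESR_EC_IABT_CUR_EL 0x21u /* instruction abort taken without a change of EL */

/* Exception class, bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Instruction length bit, bit 25 (1 = 32-bit instruction). */
static unsigned int esr_il(uint32_t esr)
{
	return (esr >> ESR_IL_SHIFT) & 1u;
}

/* Fault status code, ISS bits [5:0] (IFSC for instruction aborts). */
static unsigned int esr_fsc(uint32_t esr)
{
	return esr & ESR_FSC_MASK;
}

/* Name the translation-fault codes; everything else is lumped together. */
static const char *esr_fsc_name(unsigned int fsc)
{
	switch (fsc) {
	case 0x04: return "translation fault, level 0";
	case 0x05: return "translation fault, level 1";
	case 0x06: return "translation fault, level 2";
	case 0x07: return "translation fault, level 3";
	default:   return "other";
	}
}
```

For 0x86000007 this gives EC = 0x21, IL = 1 and FSC = 0x07, which is the "instruction abort at the current EL, third-level translation fault" reading quoted above.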

Will