On 12/01/2014 04:37 AM, Masami Hiramatsu wrote:
(2014/11/29 1:01), Steve Capper wrote:
On 27 November 2014 at 06:07, Masami Hiramatsu
<masami.hiramatsu.pt@xxxxxxxxxxx> wrote:
(2014/11/27 3:59), Steve Capper wrote:
The crash is extremely easy to reproduce.
I've not observed any missed events on a kprobe on an arm64 system
that's still alive.
My (limited!) understanding is that this suggests there could be a
problem with how missed events from a recursive call to memcpy are
being handled.
I think so too. BTW, could you bisect that? :)
I can't bisect, but the following functions look suspicious to me
(again I'm new to kprobes...):
kprobes_save_local_irqflag
kprobes_restore_local_irqflag
I suspect these break somehow when nested (i.e. when reached from a recursive probe).
Agreed. On x86, prev_kprobe has old_flags and saved_flags; here it
must at least have saved_irqflag, saved and restored in
save_previous_kprobe()/restore_previous_kprobe().
What about adding this?
 struct prev_kprobe {
 	struct kprobe *kp;
 	unsigned int status;
+	unsigned long saved_irqflag;
 };
and
 static void __kprobes save_previous_kprobe(struct kprobe_ctlblk *kcb)
 {
 	kcb->prev_kprobe.kp = kprobe_running();
 	kcb->prev_kprobe.status = kcb->kprobe_status;
+	kcb->prev_kprobe.saved_irqflag = kcb->saved_irqflag;
 }

 static void __kprobes restore_previous_kprobe(struct kprobe_ctlblk *kcb)
 {
 	__this_cpu_write(current_kprobe, kcb->prev_kprobe.kp);
 	kcb->kprobe_status = kcb->prev_kprobe.status;
+	kcb->saved_irqflag = kcb->prev_kprobe.saved_irqflag;
 }
I have noticed that with the aarch64 kprobe patches and a recent kernel, I can get the machine stuck printing an endless stream of:
[187694.855843] Unexpected kernel single-step exception at EL1
[187694.861385] Unexpected kernel single-step exception at EL1
[187694.866926] Unexpected kernel single-step exception at EL1
[187694.872467] Unexpected kernel single-step exception at EL1
[187694.878009] Unexpected kernel single-step exception at EL1
[187694.883550] Unexpected kernel single-step exception at EL1
I can reproduce this pretty easily on my machine with functioncallcount.stp from https://sourceware.org/systemtap/examples/profiling/functioncallcount.stp and the following steps:
# stap -p4 -k -m mm_probes -w functioncallcount.stp "*@mm/*.c" -c "sleep 1"
# staprun mm_probes.ko -c "sleep 1"
-Will
That would explain why the interrupts were in an
unexpected state in the crash I reported:
"The point of failure in the panic was:
fs/buffer.c:1257
static inline void check_irqs_on(void)
{
#ifdef irqs_disabled
	BUG_ON(irqs_disabled());
#endif
}
"
This is all new to me so I'm still at the head-scratching stage.
Ah, I see.
Thank you,
David,
Does the above make sense to you? Have you managed to reproduce the crash I get?
Cheers,
--
Steve