Re: [PATCH 3.17-rc4 v7 0/6] arm: Implement arch_trigger_all_cpu_backtrace

From: Daniel Thompson
Date: Thu Oct 16 2014 - 05:23:49 EST


On 14/10/14 23:37, Daniel Drake wrote:
> Hi,
>
> Thanks a lot for working on this!
>
> On Wed, Sep 17, 2014 at 10:10 AM, Daniel Thompson
> <daniel.thompson@xxxxxxxxxx> wrote:
>> Changes *before* v1:
>>
>> * This patchset is a hugely cut-down successor to "[PATCH v11 00/19]
>> arm: KGDB NMI/FIQ support". Thanks to Thomas Gleixner for suggesting
>> the new structure. For historic details see:
>> https://lkml.org/lkml/2014/9/2/227
>
> What's the right way to extend your work in order to get a NMI-like
> watchdog hard lockup detector similar to the one on x86?

There are a few things to get into place for this.

1. Figure out what counter value to program into the PMU to get an
interrupt every 10s and provide the stub functions for the lockup
detector.

2. Modify the current ARM PMU support to make it possible for this code
to run from a FIQ handler. This should be feasible by replicating
the design pattern used on x86. Nevertheless this is a fairly big
chunk of code review and testing.

3. Modify the Linux IRQ support to allow some kind of flag to
hint/demand that an interrupt be treated as NMI-ish in order to
switch (unshared) interrupts into FIQ mode and hook this up in
the GIC.

[Side note: this approach was suggested by Thomas Gleixner in
response to some rather hacky patches from me. My patches are
robust enough but are poorly designed and hard to maintain.
Thus if you want to do any quick prototyping you might skip this
step and dig out my old patches:

https://git.linaro.org/people/daniel.thompson/linux.git/shortlog/refs/heads/dev/kdb-fiq
]
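
To make step 3 a little more concrete, here is a rough sketch of the
register arithmetic involved, assuming a GICv2 where Group 0 interrupts
can be signalled as FIQ (this also needs GICC_CTLR.FIQEn set, and the
group registers are normally only writable from the secure world). The
struct, the helper name and the "shared" argument are hypothetical and
only stand in for whatever flag the IRQ core would eventually provide;
this is not code from the patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICv2 distributor: base offset of the interrupt group registers. */
#define GICD_IGROUPR	0x080u

/* Hypothetical description of the single register write needed. */
struct gic_group_write {
	uint32_t offset;	/* byte offset from the distributor base */
	uint32_t value;		/* new value for that GICD_IGROUPRn */
	bool ok;		/* false if the request was refused */
};

/*
 * Hypothetical helper: compute the write that moves GIC interrupt ID
 * `hwirq` into Group 0. `cur` is the current content of the matching
 * GICD_IGROUPRn register. Shared interrupts are refused because only
 * unshared interrupts are candidates for FIQ.
 */
static struct gic_group_write gic_route_to_fiq(unsigned int hwirq,
					       bool shared, uint32_t cur)
{
	struct gic_group_write w = { 0, cur, false };

	/* IDs 1020-1023 are reserved for special purposes on GICv2. */
	if (shared || hwirq >= 1020)
		return w;

	/* One bit per interrupt, 32 interrupts per 32-bit register. */
	w.offset = GICD_IGROUPR + 4 * (hwirq / 32);
	/* A cleared bit selects Group 0, a set bit selects Group 1. */
	w.value = cur & ~(UINT32_C(1) << (hwirq % 32));
	w.ok = true;
	return w;
}
```

For example, interrupt ID 34 lives in bit 2 of the register at offset
0x084, so moving it to Group 0 means clearing that one bit and leaving
the other 31 interrupts in the register alone.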

Note also that, as a side effect of the above, tools like oprofile would
also get a very significant boost for kernel profiling because they
would no longer attribute time spent in interrupt handlers to interrupt
unmask functions.

At present I've done a little work towards all three of the above but
none are complete (most of the code has never been executed).
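
To give a feel for step 1, here is a rough sketch of the arithmetic,
assuming the 32-bit ARMv7 PMU cycle counter, which counts up and can
raise an interrupt when it overflows, and its optional divide-by-64
mode (PMCR.D). The struct and helper names and the clock rates used
with them are hypothetical, made up for illustration; this is not code
from the patchset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of planning one watchdog period. */
struct pmu_wd_plan {
	bool ok;		/* false if the period cannot be represented */
	bool use_div64;		/* count every 64th cycle (PMCR.D on ARMv7) */
	uint32_t reload;	/* value to preload into the cycle counter */
};

/*
 * Hypothetical helper: plan a period of `secs` seconds for a CPU
 * running at `cpu_hz`. The counter interrupts when it wraps past
 * 2^32 - 1, so it is preloaded with 2^32 minus the number of ticks.
 */
static struct pmu_wd_plan pmu_wd_make_plan(uint64_t cpu_hz, uint32_t secs)
{
	struct pmu_wd_plan p = { false, false, 0 };
	uint64_t ticks = cpu_hz * secs;

	if (ticks == 0)
		return p;

	/* Too many cycles for 32 bits: fall back to the /64 divider. */
	if (ticks > UINT32_MAX) {
		p.use_div64 = true;
		ticks = (ticks + 63) / 64;	/* round up */
	}

	/* Still too long even with the divider: give up. */
	if (ticks > UINT32_MAX)
		return p;

	p.reload = (uint32_t)((UINT64_C(1) << 32) - ticks);
	p.ok = true;
	return p;
}
```

At a hypothetical 1 GHz, 10s is 10^10 cycles, which does not fit in
32 bits, so the sketch selects the divider and counts 156250000 ticks
instead. At 100 MHz the 10^9 cycles fit without the divider.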


> I'm testing your patches on Exynos4412 and I guess in their current
> state they don't go quite this deep, as the only callers of
> trigger_all_cpu_backtrace() are sysrq, hung_task and spinlock debug
> code - none of which seem as fail-safe as a trigger like a
> pre-programmed watchdog NMI interrupt would be.
>
> Do I need to find a way to get CONFIG_FIQ available on this platform
> first? and/or CONFIG_HARDLOCKUP_DETECTOR?

You need CONFIG_FIQ working first. Be aware that this may be impossible
on Exynos unless you control the TrustZone secure world. For this reason
most of my development is on Freescale i.MX6 (because i.MX6 boots in
secure mode).


Daniel.