Re: [PATCH -tip] kgdb, x86: Pull up NMI notifier handler priority

From: Jason Wessel
Date: Thu Mar 31 2011 - 13:41:14 EST


On 03/24/2011 12:24 AM, Cyrill Gorcunov wrote:
> If Jason is ok with such splitting -- I dont mind either ;)
>
> On Thursday, March 24, 2011, Dongdong Deng <libfetion@xxxxxxxxx> wrote:
>> On Thu, Mar 24, 2011 at 4:32 AM, Cyrill Gorcunov <gorcunov@xxxxxxxxxx> wrote:
>>> kgdb needs IPI to be sent and handled before perf
>>> or anything else NMI, otherwise kgdb hangs with bootup
>>> self-tests (found on P4 HT SMP machine). Raise its priority
>>> so that we're called first in a notifier chain.
>>>

I talked with Cyrill outside the mailing list since he pinged me and I will summarize here.

My initial thought about the patch Deng Dongdong posted was that it was really ugly to have kgdb registered in the notifier chain twice. I would be willing to live with this for now if we agree that, once jump labels are merged into the kernel, we make use of them instead.

The jump labels would allow us to invoke the debugger directly when the debugger is active, much like we do when CONFIG_KGDB_LOW_LEVEL_TRAP is set. In fact, the code that is ifdef'ed with CONFIG_KGDB_LOW_LEVEL_TRAP can make use of the same jump label as the NMI entry and no longer needs to be #ifdef'ed once jump labels come to pass.
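As a rough illustration of the idea only, here is a userspace sketch, not kernel code. It assumes a plain boolean (debugger_active) as a stand-in for the jump label; a real jump label would patch the branch in the instruction stream rather than test a flag. All names here (nmi_entry, debugger_entry, notify_chain and the counters) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a jump label: a flag tested at run time. */
static bool debugger_active;

static int direct_calls;  /* times the debugger was entered directly */
static int chain_calls;   /* times the generic notifier chain was walked */

/* Stand-in for the debugger's own entry point. */
static int debugger_entry(int trapnr)
{
    (void)trapnr;
    direct_calls++;
    return 1;   /* event handled by the debugger */
}

/* Stand-in for walking the generic notifier chain. */
static int notify_chain(int trapnr)
{
    (void)trapnr;
    chain_calls++;
    return 0;   /* event left to whoever is on the chain */
}

/* Entry path: bypass the chain entirely while the debugger is active. */
static int nmi_entry(int trapnr)
{
    assert(trapnr >= 0);
    if (debugger_active)
        return debugger_entry(trapnr);
    return notify_chain(trapnr);
}
```

With debugger_active false, nmi_entry() only walks the chain; once it is set, the debugger is entered directly and chain priority no longer matters for that path.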

The very discussion of the patch raised the question of "why not always have the debugger be first?" The answer is that some code needs to run before the debugger to keep the system running, assuming you plan to resume the system after entering the debugger. The generic die notifier is used in lots of circumstances, and the priority the debugger cares about only matters for a select few exception types.

kmmio, mce-inject, and crash_nmi_nb (from reboot.c) are good examples of in-tree code that should run with a higher priority than the debugger, because the debugger doesn't know what to do with these code paths. There it sits last in line, hoping someone else will deal with the exception, and otherwise enters the debugger. For the trap paths the debugger needs to be first in line to deal with the case where a breakpoint is in a notifier, to avoid non-recoverable recursive faults. For NMI it appears we need to run before the perf code, or the perf code will eat an NMI event intended for kgdb and result in a deadlocked system.
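To make the NMI ordering problem concrete, here is a small userspace sketch, not kernel code. It models a notifier chain sorted by descending priority, where a handler that returns NOTIFY_STOP ends the walk. The handlers, the priority values, and deliver_one_nmi() are hypothetical stand-ins; the only assumption carried over from the discussion is that a perf-like handler may consume an NMI that was meant for kgdb.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE 0  /* handler did not consume the event */
#define NOTIFY_STOP 1  /* handler consumed the event; stop walking */

enum { EVENT_NMI = 1 };

struct notifier {
    int (*call)(struct notifier *nb, int event);
    int priority;              /* higher priority runs first */
    struct notifier *next;
};

/* Insert so the list stays sorted by descending priority;
 * equal priorities keep registration order. */
static void chain_register(struct notifier **head, struct notifier *nb)
{
    assert(nb != NULL && nb->call != NULL);
    while (*head != NULL && (*head)->priority >= nb->priority)
        head = &(*head)->next;
    nb->next = *head;
    *head = nb;
}

/* Walk the chain until some handler consumes the event. */
static int chain_call(struct notifier *head, int event)
{
    for (struct notifier *nb = head; nb != NULL; nb = nb->next) {
        if (nb->call(nb, event) == NOTIFY_STOP)
            return NOTIFY_STOP;
    }
    return NOTIFY_DONE;
}

static int kgdb_seen;  /* NMIs that reached the stand-in kgdb handler */
static int perf_seen;  /* NMIs eaten by the stand-in perf handler */

static int fake_kgdb_handler(struct notifier *nb, int event)
{
    (void)nb;
    if (event != EVENT_NMI)
        return NOTIFY_DONE;
    kgdb_seen++;
    return NOTIFY_STOP;
}

/* Models a handler that claims every NMI as its own. */
static int fake_perf_handler(struct notifier *nb, int event)
{
    (void)nb;
    if (event != EVENT_NMI)
        return NOTIFY_DONE;
    perf_seen++;
    return NOTIFY_STOP;
}

/* Build a two-entry chain (perf at priority 0, kgdb at the given
 * priority), deliver one NMI, and report how many reached kgdb. */
static int deliver_one_nmi(int kgdb_priority)
{
    struct notifier perf = { fake_perf_handler, 0, NULL };
    struct notifier kgdb = { fake_kgdb_handler, kgdb_priority, NULL };
    struct notifier *head = NULL;

    kgdb_seen = 0;
    perf_seen = 0;
    chain_register(&head, &perf);
    chain_register(&head, &kgdb);
    chain_call(head, EVENT_NMI);
    return kgdb_seen;
}
```

In this model, deliver_one_nmi(-1) leaves kgdb behind perf, so the NMI never reaches it, while deliver_one_nmi(1) puts kgdb at the head of the chain and the NMI is delivered to it.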

The net result: I'll sign off on the kgdb change and add a TODO item to wait for the jump patching to enter the kernel.

Cyrill, I am assuming this is something we want to aim to merge into 2.6.39 as a regression fix? I'll try to get a version of Deng Dongdong's patch into linux-next as soon as possible in the meantime.

Cheers,
Jason.