Re: [PATCH 1/1] Fix HARD Lockup Firing off while in debugger
From: Jeff Merkey
Date: Mon Dec 14 2015 - 21:59:23 EST
On 12/14/15, Don Zickus <dzickus@xxxxxxxxxx> wrote:
> On Sat, Dec 12, 2015 at 02:08:13PM -0700, Jeff Merkey wrote:
>> The current touch_nmi_watchdog() function in /kernel/watchdog.c does
>> not always catch all cases when a processor is spinning in the nmi
>> handler inside either KGDB, KDB, or MDB. The hrtimer_interrupts_saved
>> count can still end up matching the previous value in some cases,
>> resulting in the hard lockup detector tagging processors inside a
>
> Hi Jeff,
>
> I am confused here. The 'touch_nmi_watchdog()' call was supposed to block
> the check for hrtimer_interrupts from happening. So if the check is still
> being executed _after_ you executed touch_nmi_watchdog(), it would imply
> that about 10 seconds or so elapsed between the touch command and the
> hrtimer check.
>
> So I am not sure how the below patch would fix this, other than by adding
> another 10 second delay (for a total of 20 seconds) to your timeout?
>
>
>> debugger and executing a panic. The patch below corrects this
>> problem. I did not add this to the touch_nmi_function directly
>> because of possible effects on timing issues.
>>
>> I have tested this patch and it fixes the problem for kernel debuggers
>> stopping errant hard lockup events when processors are spinning inside
>> the debugger.
>
> The kernel doesn't normally take patches like this without a corresponding
> user, which I didn't see attached in this patch or a patch series.
>
> Cheers,
> Don
>
I'll resend the patch series properly formatted and clean. There is
a hole in there somewhere that causes this bug. You can reproduce it
by downloading the mdb debugger patch, applying it to linux, removing
the call to this function, building the kernel, and then spinning in
the debugger with a breakpoint on schedule() set from the debugger
console. Without the function I have suggested, the hard lockup
detector fires off in about 20 seconds.
You can download the debugger here.
https://github.com/jeffmerkey/linux-stable/compare/v4.3.2...jeffmerkey:mdb-v4.3.2.diff
Apply this patch to kernel v4.3.2 if you want to reproduce it easily.
Before you build, remove the call to touch_hardlockup_watchdog() in
mdb_watchdogs() in arch/x86/kernel/debug/mdb/mdb-main.c.
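
As a rough illustration of the mechanism discussed above, here is a simplified
user-space model of the check: a touch flag skips one NMI check, and otherwise
an unchanged hrtimer_interrupts count is reported as a hard lockup. This is a
sketch under assumptions, not the kernel's code. The struct, the model_*
functions, and checks_until_lockup() are hypothetical names made up for the
example, and the starting counts are arbitrary. Whether this is the actual hole
has not been established in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's watchdog state (not kernel code). */
struct cpu_model {
	unsigned long hrtimer_interrupts;       /* bumped by the hrtimer tick */
	unsigned long hrtimer_interrupts_saved; /* count seen at the last check */
	bool nmi_touch;                         /* set by a "touch" call */
};

/* Model of a touch: ask for the next NMI check to be skipped. */
static void model_touch(struct cpu_model *c)
{
	c->nmi_touch = true;
}

/* Model of the periodic NMI check; returns true if a lockup is reported. */
static bool model_nmi_check(struct cpu_model *c)
{
	if (c->nmi_touch) {
		/* A touch is consumed by exactly one check. */
		c->nmi_touch = false;
		return false;
	}
	if (c->hrtimer_interrupts_saved == c->hrtimer_interrupts)
		return true; /* no hrtimer ticks since the last check */
	c->hrtimer_interrupts_saved = c->hrtimer_interrupts;
	return false;
}

/*
 * A CPU parked in the debugger: its hrtimer count never advances.
 * Touch either once on entry or before every check, and return the
 * 1-based index of the check that reports a lockup (0 if none does).
 */
static int checks_until_lockup(bool touch_every_period, int max_checks)
{
	/* Assumed start: two ticks arrived since the last saved count. */
	struct cpu_model c = { 7, 5, false };
	int i;

	assert(max_checks > 0);
	for (i = 1; i <= max_checks; i++) {
		if (i == 1 || touch_every_period)
			model_touch(&c);
		if (model_nmi_check(&c))
			return i;
	}
	return 0;
}
```

In this model a single touch on entry only defers the report by one check
period, while a touch before every check suppresses it for as long as the CPU
stays in the debugger.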
I'll format another patch, a clean one this time. I apologize.
Jeff