Re: 4.14.9 with CONFIG_MCORE2 fails to boot

From: Dave Hansen
Date: Fri Dec 29 2017 - 12:32:34 EST


Does anyone have the results of a build that they can share? (vmlinux,
vmlinuz/bzImage, System.map, .config). That, plus a corresponding
serial log with an oops would be helpful.

I tried just adding MCORE2=y to my normal config but it didn't reproduce
this.

If you can't send the entire build like that, just running
scripts/faddr2line on __schedule+0x37f/0x7b0 would be very enlightening.

On 12/29/2017 06:41 AM, Alexander Tsoy wrote:
> [ 0.775461] NMI backtrace for cpu 0
> [ 0.775461] CPU: 0 PID: 114 Comm: modprobe Not tainted 4.15.0-rc5+
...
> [    0.775461] Call Trace:
> [    0.775461]  <#DF>
> [    0.775461]  ? double_fault+0xc/0x30
> [    0.775461]  ? page_fault+0x36/0x60
> [    0.775461]  do_double_fault+0xb/0x130
> [    0.775461]  </#DF>
> [    0.775461] Code: 78 4c 89 7c 24 08 4c 89 74 24 10 4c 89 6c 24 18 4c
> 89 64 24 20 48 89 6c 24 28 48 89 5c 24 30 bb 01 00 00 00 b9 01 01 00 c0
> 0f 32 <85> d2 78 05 0f 01 f8 31 db c3 0f 1f 40 00 66 2e 0f 1f 84 00 00

From the various oopses, it looks like this happens when getting a
double fault while trying to go idle. The CPU is probably trying to
return from the double fault, but since the fault handler didn't do
anything useful, it just keeps faulting. The NMI watchdog can still get
an oops out of it, though.

It doesn't appear to be recursing *too* far because it's not blowing
through the stack and triple faulting.

The several traces all appear to be in paths that might call
safe_halt() (including the kvm async page fault code). It makes me
wonder if we've been taking double faults there for a long time, but the
new trampoline stack somehow ends up being more fragile and can't
recover from the double-fault.

Couple more things:

MCORE2 seems to get one oddball compiler flag (-march=core2):

> cflags-$(CONFIG_MCORE2) += \
> $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))

It would be interesting to see if replacing the above "$(call" with:

$(call cc-option,-mtune=generic)

makes the problem go away the same way as changing the .config option.
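
For anyone trying this, a rough sketch of the edit follows. It assumes
the quoted lines come from the 64-bit section of arch/x86/Makefile
(worth confirming with grep in your own tree). It is meant purely as a
debugging experiment, not as a proposed fix:

```makefile
# Assumed location: arch/x86/Makefile, 64-bit cflags section.
#
# Original, as quoted above:
#   cflags-$(CONFIG_MCORE2) += \
#           $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
#
# Experiment: still build with CONFIG_MCORE2=y, but never pass
# -march=core2 to the compiler; only ask for -mtune=generic.
cflags-$(CONFIG_MCORE2) += $(call cc-option,-mtune=generic)
```

Keep CONFIG_MCORE2=y in the .config for this test, so that the compiler
flag is the only thing that changes between the failing and the
experimental build.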

The MCORE2 config option also sets CONFIG_X86_P6_NOP, which overrides
the normal X86_64 noops, if I'm reading that code correctly. But I
think that's much less likely to be the since there