Re: 4.13-rc7: WARNING at arch/x86/kvm/mmu.c:717 (and a crash thereafter)

From: Mike Galbraith
Date: Mon Aug 28 2017 - 09:53:18 EST


On Mon, 2017-08-28 at 14:26 +0200, Takashi Iwai wrote:
> Hi,
>
> I seem to get a kernel warning when running KVM on Dell desktop with
> IvyBridge like below. As you can see, a bad page BUG is triggered
> after that, too. The problem is not triggered always, but it happens
> occasionally.
>
> I haven't seen this on 4.13-rc4 at all, and IIRC, it started happening
> since rc5. So this might be a regression at rc5. But, as it doesn't
> happen always, I can't be 100% sure about it, and it's quite difficult
> to bisect (the test case isn't reliable), unfortunately.
>
> Any hint for further debugging this?

Maybe a way to make failure more likely.  This is an RT kernel, but
trying to build a fat kernel over NFS from a KVM clone of my
workstation (full topology, half of ram) didn't survive one build.

[ 2583.153312] WARNING: CPU: 7 PID: 9323 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0x82/0x100 [kvm]
[ 2583.153899] WARNING: CPU: 7 PID: 9323 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0x82/0x100 [kvm]
[ 2583.154016] WARNING: CPU: 7 PID: 9323 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0x82/0x100 [kvm]
[ 2583.158810] WARNING: CPU: 7 PID: 9323 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0x82/0x100 [kvm]
[ 2768.419797] BUG: Bad page state in process as pfn:048b3
[ 2768.419932] BUG: Bad page state in process as pfn:04983
[ 2775.097980] BUG: Bad page state in process cc1 pfn:04982
[ 2782.487748] BUG: Bad page state in process cc1 pfn:04980
[ 2782.622636] BUG: Bad page state in process cc1 pfn:048b0
[ 2782.622899] BUG: Bad page state in process cc1 pfn:04984
[ 2782.623053] BUG: Bad page state in process cc1 pfn:04986
[ 2782.673705] BUG: Bad page state in process cc1 pfn:048b4
[ 2782.673903] BUG: Bad page state in process cc1 pfn:048b6
[ 2782.674044] BUG: Bad page state in process cc1 pfn:04989
[ 2782.674185] BUG: Bad page state in process cc1 pfn:0498a
[ 2784.895701] BUG: Bad page state in process cc1 pfn:04990
[ 2784.895921] BUG: Bad page state in process cc1 pfn:04992
[ 2784.896100] BUG: Bad page state in process cc1 pfn:04994
[ 2784.896255] BUG: Bad page state in process cc1 pfn:04996
[ 2784.905232] BUG: Bad page state in process cc1 pfn:0499c
[ 2784.905501] BUG: Bad page state in process cc1 pfn:0499e
[ 2785.762044] BUG: Bad page state in process cc1 pfn:040cb
[ 2787.052976] BUG: Bad page state in process cc1 pfn:048ca
[ 2787.208480] BUG: Bad page state in process kdesu pfn:048a8
[ 2787.208694] BUG: Bad page state in process kdesu pfn:048aa
[ 2787.208862] BUG: Bad page state in process kdesu pfn:048ac
[ 2787.208957] BUG: Bad page state in process kdesu pfn:048ae
[ 2787.211725] BUG: Bad page state in process cc1 pfn:04884
[ 2787.219784] BUG: Bad page state in process kdesu pfn:04888
[ 2787.226212] BUG: Bad page state in process cc1 pfn:049a0
[ 2788.955108] BUG: Bad page state in process cc1 pfn:048e9
[ 2788.959686] BUG: Bad page state in process cc1 pfn:048f1
[ 2788.959882] BUG: Bad page state in process cc1 pfn:048f2
[ 2788.977485] BUG: Bad page state in process cc1 pfn:048fe
[ 2789.295335] BUG: Bad page state in process cc1 pfn:04807
[ 2794.661501] BUG: Bad page state in process cc1 pfn:04819
[ 2794.661658] BUG: Bad page state in process cc1 pfn:0481b
[ 2794.661747] BUG: Bad page state in process cc1 pfn:0481d
[ 2794.680432] BUG: Bad page state in process cc1 pfn:0482a
[ 2794.692849] BUG: Bad page state in process cc1 pfn:04834
[ 2794.705438] BUG: Bad page state in process cc1 pfn:0483c
[ 2794.784882] BUG: Bad page state in process gcc pfn:0485c
[ 2794.785105] BUG: Bad page state in process gcc pfn:0485e
[ 2796.541058] BUG: Bad page state in process Xorg pfn:04011
[ 2808.425625] BUG: Bad page state in process Xorg pfn:04a09
[ 3605.187591] BUG: unable to handle kernel paging request at 000000000001bcf4
[ 3605.202446] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33
[ 3605.203322] BUG: stack guard page was hit at ffffc9000e483ff8 (stack is ffffc9000e484000..ffffc9000e487fff)
[ 3605.279108] BUG: scheduling while atomic: ld/5485/0x00000002