Re: kvm splat in mmu_spte_clear_track_bits

From: Bernhard Held
Date: Mon Aug 28 2017 - 11:26:25 EST


On 08/27/2017 at 02:35 PM, Adam Borowski wrote:
> 4.13-rc5 retested fails
> Crashed only after two hours or so of testing.
>
> 4.13-rc4 apparently works
> It survived several hours of varied tests (like 5 debian-installer runs, a
> win10 point release upgrade, some hurd package building, openbsd, etc),
> all while the host was likewise busy.
>
> Thus: to the best of my knowledge, the problem is between 4.13-rc4 and 4.13-rc5
> but I wouldn't bet my life on it.

I get crashes with Win10 in kvm with 4.13-rc5. 4.13-rc4 works for me. THP seems to make the crash happen sooner, but I'm not 100% sure of that.

There's still no crash after reverting merge 27df70 on 4.13-rc7. There are 21 commits in this merge, 10 of which are mm-related:

$ git log 4e082e9ba7cd..e86b298bebf7 --pretty=oneline --abbrev-commit
e86b298bebf7 userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
f357e345eef7 zram: rework copy of compressor name in comp_algorithm_store()
aac2fea94f7a rmap: do not call mmu_notifier_invalidate_page() under ptl
d041353dc98a mm: fix list corruptions on shmem shrinklist
af54aed94bf3 mm/balloon_compaction.c: don't zero ballooned pages
c0a6a5ae6b5d MAINTAINERS: copy virtio on balloon_compaction.c
b3a81d0841a9 mm: fix KSM data corruption
99baac21e458 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
0a2dd266dd6b mm: make tlb_flush_pending global
56236a59556c mm: refactor TLB gathering API
a9b802500ebb Revert "mm: numa: defer TLB flush for THP migration as long as possible"
0a2c40487f3e mm: migrate: fix barriers around tlb_flush_pending
16af97dc5a89 mm: migrate: prevent racy access to tlb_flush_pending
9eeb52ae712e fault-inject: fix wrong should_fail() decision in task context
4e98ebe5f435 test_kmod: fix small memory leak on filesystem tests
9c56771316ef test_kmod: fix the lock in register_test_dev_kmod()
434b06ae23ba test_kmod: fix bug which allows negative values on two config options
a4afe8cdec16 test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
5af10dfd0afc userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
75dddef32514 mm: ratelimit PFNs busy info message
d507e2ebd2c7 mm: fix global NR_SLAB_.*CLAIMABLE counter reads
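
For anyone who wants to split the list above by subsystem, here is a rough sketch. It groups the commits by the literal prefix of each subject line, which is only an approximation: the helper name `subsystem` is made up for this example, and a prefix such as "rmap" or 'Revert "mm' lands in its own group even though those commits arguably touch mm code as well.

```python
from collections import Counter

# Output of the `git log ... --pretty=oneline --abbrev-commit` command above.
LOG = """\
e86b298bebf7 userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
f357e345eef7 zram: rework copy of compressor name in comp_algorithm_store()
aac2fea94f7a rmap: do not call mmu_notifier_invalidate_page() under ptl
d041353dc98a mm: fix list corruptions on shmem shrinklist
af54aed94bf3 mm/balloon_compaction.c: don't zero ballooned pages
c0a6a5ae6b5d MAINTAINERS: copy virtio on balloon_compaction.c
b3a81d0841a9 mm: fix KSM data corruption
99baac21e458 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
0a2dd266dd6b mm: make tlb_flush_pending global
56236a59556c mm: refactor TLB gathering API
a9b802500ebb Revert "mm: numa: defer TLB flush for THP migration as long as possible"
0a2c40487f3e mm: migrate: fix barriers around tlb_flush_pending
16af97dc5a89 mm: migrate: prevent racy access to tlb_flush_pending
9eeb52ae712e fault-inject: fix wrong should_fail() decision in task context
4e98ebe5f435 test_kmod: fix small memory leak on filesystem tests
9c56771316ef test_kmod: fix the lock in register_test_dev_kmod()
434b06ae23ba test_kmod: fix bug which allows negative values on two config options
a4afe8cdec16 test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
5af10dfd0afc userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
75dddef32514 mm: ratelimit PFNs busy info message
d507e2ebd2c7 mm: fix global NR_SLAB_.*CLAIMABLE counter reads
"""


def subsystem(line):
    """Return the leading subject tag of one oneline log entry (hypothetical helper)."""
    _sha, subject = line.split(" ", 1)
    tag = subject.split(":", 1)[0]   # text before the first colon
    return tag.split("/", 1)[0]      # "mm/balloon_compaction.c" -> "mm"


lines = [l for l in LOG.splitlines() if l.strip()]
counts = Counter(subsystem(l) for l in lines)
mm_commits = [l for l in lines if subsystem(l) == "mm"]

# Print the groups, largest first, then the commits tagged "mm".
for tag, n in counts.most_common():
    print(f"{n:2d}  {tag}")
print()
print("\n".join(mm_commits))
```

Grouped this way, the "mm" prefix accounts for 10 of the 21 commits, which matches the count given above.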

Any hint on what to test first is welcome!

Bernhard