Re: kvm splat in mmu_spte_clear_track_bits
From: Takashi Iwai
Date: Mon Aug 28 2017 - 12:01:11 EST

On Mon, 28 Aug 2017 17:26:05 +0200,
Bernhard Held wrote:
>
> On 08/27/2017 at 02:35 PM, Adam Borowski wrote:
> > 4.13-rc5 retested fails
> > Crashed only after two hours or so of testing.
> >
> > 4.13-rc4 apparently works
> > It survived several hours of varied tests (like 5 debian-installer runs, a
> > win10 point release upgrade, some hurd package building, openbsd, etc),
> > all while the host was likewise busy.
> >
> > Thus: to the best of my knowledge, the problem is between 4.13-rc4 and 4.13-rc5
> > but I wouldn't bet my life on it.
>
> I get crashes with Win10 in kvm with 4.13-rc5. 4.13-rc4 works for me. THP seems to accelerate the crash, but that's not 100% sure.
>
> There's still no crash after reverting merge 27df70 on 4.13-rc7. There are 21 commits in this merge, 10 are mm-related:
>
> $ git log 4e082e9ba7cd..e86b298bebf7 --pretty=oneline --abbrev-commit
> e86b298bebf7 userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage
> f357e345eef7 zram: rework copy of compressor name in comp_algorithm_store()
> aac2fea94f7a rmap: do not call mmu_notifier_invalidate_page() under ptl
> d041353dc98a mm: fix list corruptions on shmem shrinklist
> af54aed94bf3 mm/balloon_compaction.c: don't zero ballooned pages
> c0a6a5ae6b5d MAINTAINERS: copy virtio on balloon_compaction.c
> b3a81d0841a9 mm: fix KSM data corruption
> 99baac21e458 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
> 0a2dd266dd6b mm: make tlb_flush_pending global
> 56236a59556c mm: refactor TLB gathering API
> a9b802500ebb Revert "mm: numa: defer TLB flush for THP migration as long as possible"
> 0a2c40487f3e mm: migrate: fix barriers around tlb_flush_pending
> 16af97dc5a89 mm: migrate: prevent racy access to tlb_flush_pending
> 9eeb52ae712e fault-inject: fix wrong should_fail() decision in task context
> 4e98ebe5f435 test_kmod: fix small memory leak on filesystem tests
> 9c56771316ef test_kmod: fix the lock in register_test_dev_kmod()
> 434b06ae23ba test_kmod: fix bug which allows negative values on two config options
> a4afe8cdec16 test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"
> 5af10dfd0afc userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
> 75dddef32514 mm: ratelimit PFNs busy info message
> d507e2ebd2c7 mm: fix global NR_SLAB_.*CLAIMABLE counter reads
>
> Any hint on what to test first is welcome!

Can you reproduce the crash reliably?

I've been struggling to find a way to trigger it efficiently, but so
far in vain. Memory pressure alone doesn't seem to be enough to
trigger it...

thanks,

Takashi