On Wed, Apr 03, 2024, David Hildenbrand wrote:
> On 03.04.24 02:17, Sean Christopherson wrote:
> > On Tue, Apr 02, 2024, David Hildenbrand wrote:
> > Aha! But try_to_unmap_one() also checks that refcount==mapcount+1, i.e. will
> > also keep the folio if it has been GUP'd. And __remove_mapping() explicitly states
> > that it needs to play nice with a GUP'd page being marked dirty before the
> > reference is dropped.
> >
> >   * Must be careful with the order of the tests. When someone has
> >   * a ref to the folio, it may be possible that they dirty it then
> >   * drop the reference. So if the dirty flag is tested before the
> >   * refcount here, then the following race may occur:
> >
> > So while it's totally possible for KVM to get a W=1,D=0 PTE, if I'm reading the
> > code correctly it's safe/legal so long as KVM either (a) marks the folio dirty
> > while holding a reference or (b) marks the folio dirty before returning from its
> > mmu_notifier_invalidate_range_start() hook, *AND* obviously if KVM drops its
> > mappings in response to mmu_notifier_invalidate_range_start().
>
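To make the ordering argument concrete, here is a minimal userspace sketch in C of
a "refcount first, dirty flag second" test, loosely modeled on what the quoted
comment describes. It is a toy model under stated assumptions, not the kernel
code: toy_folio, toy_ref_freeze(), toy_try_reclaim() and toy_dirty_then_put() are
hypothetical names, and C11 atomics stand in for the real folio refcount and
dirty-flag helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a folio: only the two fields the argument needs. */
struct toy_folio {
	atomic_int refcount;
	atomic_bool dirty;
};

/* Succeeds only if refcount == expected, atomically dropping it to 0. */
static bool toy_ref_freeze(struct toy_folio *f, int expected)
{
	return atomic_compare_exchange_strong(&f->refcount, &expected, 0);
}

/*
 * Hypothetical reclaim check: test the refcount first, the dirty flag second.
 * Anyone who dirtied the folio while holding a reference either still holds
 * that reference (freeze fails) or has already set the dirty flag (seen here).
 */
static bool toy_try_reclaim(struct toy_folio *f, int expected)
{
	if (!toy_ref_freeze(f, expected))
		return false;		/* an extra reference is still held */
	if (atomic_load(&f->dirty)) {
		/* Unfreeze: restore the refcount and keep the folio. */
		atomic_store(&f->refcount, expected);
		return false;
	}
	return true;			/* clean and unreferenced: reclaimable */
}

/* Option (a): mark dirty while still holding the reference, then drop it. */
static void toy_dirty_then_put(struct toy_folio *f)
{
	atomic_store(&f->dirty, true);
	atomic_fetch_sub(&f->refcount, 1);
}
```

In this model, a reclaimer calling toy_try_reclaim(&f, 1) fails while a second
reference is held, and still fails after toy_dirty_then_put(&f) because the dirty
flag is observed after the freeze. It succeeds only once the flag has been cleared
again, which in the toy corresponds to writeback having completed.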
> Yes, I agree that it should work in the context of vmscan. But (b) is
> certainly a bit harder to swallow than "ordinary" (a) :)

Heh, all the more reason to switch KVM x86 from (b) => (a).

> As raised, if having a writable SPTE would imply having a writable+dirty
> PTE, then KVM MMU code wouldn't have to worry about syncing any dirty bits
> ever back to core-mm, so patch #2 would not be required. ... well, it would
> be replaced by an MMU notifier that notifies about clearing the PTE dirty
> bit :)

Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
to be invalidated before consuming dirty status. Isn't the end result essentially
a sane FOLL_TOUCH?