Re: [PATCH v2 1/3] kvm: svm: Add support for additional SVM NPF error codes

From: Brijesh Singh
Date: Tue Aug 01 2017 - 09:36:36 EST

On 07/31/2017 03:05 PM, Paolo Bonzini wrote:

There can be different cases where an L0->L2 shadow nested page table is
marked read only, in particular when a page is read only in L1's nested
page tables. If such a page is accessed by L2 while walking page tables
it will cause a nested page fault (page table walks are write accesses).
However, after kvm_mmu_unprotect_page you will get another page fault,
and again in an endless stream.

Instead, emulation would have caused a nested page fault vmexit, I think.

If possible, could you please give me some pointers on how to create this use
case, so that we can get a definitive answer?

Looking at the code path, it appears that the new code
(the kvm_mmu_unprotect_page call) only runs if vcpu->arch.mmu_page_fault()
returns an indication that the instruction should be emulated. I would not
expect that to be the case in the scenario you described above, since L1 making
a page read-only (this is a page table for L2) is an error and should result in
#NPF being injected into L1.

The flow is:

hardware walks page table; L2 page table points to read only memory
-> pf_interception (code =
-> kvm_handle_page_fault (need_unprotect = false)
-> kvm_mmu_page_fault
-> paging64_page_fault (for example)
   -> try_async_pf
      map_writable set to false
   -> paging64_fetch(write_fault = true, map_writable = false, prefault = false)
      -> mmu_set_spte(speculative = false, host_writable = false, write_fault = true)
         -> set_spte
            mmu_need_write_protect returns true
            return true
         write_fault == true -> set emulate = true
         return true
      return true
   return true
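
As a rough userspace sketch of the tail of that flow (not the kernel code
itself), the decision to emulate can be modelled as below. The model_* names
are hypothetical stand-ins, and the model assumes that set_spte reports true
exactly when mmu_need_write_protect does, as in the trace above.

```c
#include <stdbool.h>

/*
 * Hypothetical model of set_spte(): reports true when the page has to
 * stay write-protected, i.e. when mmu_need_write_protect returned true.
 */
static bool model_set_spte(bool need_write_protect)
{
	return need_write_protect;
}

/*
 * Hypothetical model of mmu_set_spte(): emulation is requested only
 * when the page stays write-protected and the fault was a write.
 */
static bool model_mmu_set_spte(bool write_fault, bool need_write_protect)
{
	bool emulate = false;

	if (model_set_spte(need_write_protect) && write_fault)
		emulate = true;

	return emulate;
}
```

In this model only the combination "write fault on a page that must remain
write-protected" propagates true back up the call chain, which is the case
traced above.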

Without this patch, emulation would have called

-> translate_nested_gpa
-> paging64_gva_to_gpa
-> paging64_walk_addr
-> paging64_walk_addr_generic
set fault (nested_page_fault=true)

and then:

-> nested_svm_inject_npf_exit

Maybe the safer thing, then, would be to qualify the new error_code check with
!mmu_is_nested(vcpu) or something like that. That way it would run for the
L1 guest, and not for the L2 guest. I believe that would restrict it enough to
avoid hitting this case. Are you okay with this change?
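
To make the proposal concrete, here is a minimal standalone sketch of the
qualified check, written as a userspace model rather than as the actual hunk.
struct model_vcpu, model_mmu_is_nested() and model_should_unprotect() are
hypothetical stand-ins, and the PFERR_* values are my assumption about the
error-code layout (present and write bits, plus bit 33 for a fault taken
during the guest page table walk).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed error-code bits; the values are illustrative, see note above. */
#define PFERR_PRESENT_MASK	(1ULL << 0)
#define PFERR_WRITE_MASK	(1ULL << 1)
#define PFERR_GUEST_PAGE_MASK	(1ULL << 33)	/* fault during guest page table walk */

#define PFERR_NESTED_GUEST_PAGE	(PFERR_GUEST_PAGE_MASK | \
				 PFERR_WRITE_MASK | \
				 PFERR_PRESENT_MASK)

/* Hypothetical stand-in for the vcpu state the check needs. */
struct model_vcpu {
	bool nested;	/* true while the vcpu is running an L2 guest */
};

/* Hypothetical stand-in for mmu_is_nested(vcpu). */
static bool model_mmu_is_nested(const struct model_vcpu *vcpu)
{
	return vcpu->nested;
}

/*
 * The new error_code check, qualified so that it only applies to an
 * L1 guest; for an L2 guest the existing emulation path is used.
 */
static bool model_should_unprotect(const struct model_vcpu *vcpu,
				   uint64_t error_code)
{
	return !model_mmu_is_nested(vcpu) &&
	       error_code == PFERR_NESTED_GUEST_PAGE;
}
```

With this shape, a fault taken while running L2 never takes the unprotect
shortcut, so the scenario described above would still go through emulation.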

IIRC, the main place where this check was valuable was when the L1 guest had
a fault (when coming out of the L2 guest) and emulation was not needed.