Re: [PATCH v6 11/12] KVM: introduce readonly memslot

From: Xiao Guangrong
Date: Fri Sep 07 2012 - 06:47:27 EST


On 09/07/2012 06:23 PM, Jan Kiszka wrote:
> On 2012-08-21 05:02, Xiao Guangrong wrote:
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault
>> pfn and async is not allowed, then the vm will crash
>>
>> We introduce readonly memory region to map ROM/ROMD to the guest, read access
>> is happy for readonly memslot, write access on readonly memslot will cause
>> KVM_EXIT_MMIO exit
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx>
>> ---
>> Documentation/virtual/kvm/api.txt | 10 +++-
>> arch/x86/include/asm/kvm.h | 1 +
>> arch/x86/kvm/mmu.c | 9 ++++
>> arch/x86/kvm/x86.c | 1 +
>> include/linux/kvm.h | 6 ++-
>> include/linux/kvm_host.h | 7 +--
>> virt/kvm/kvm_main.c | 96 ++++++++++++++++++++++++++++++-------
>> 7 files changed, 102 insertions(+), 28 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index bf33aaa..b91bfd4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -857,7 +857,8 @@ struct kvm_userspace_memory_region {
>> };
>>
>> /* for kvm_memory_region::flags */
>> -#define KVM_MEM_LOG_DIRTY_PAGES 1UL
>> +#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
>> +#define KVM_MEM_READONLY (1UL << 1)
>>
>> This ioctl allows the user to create or modify a guest physical memory
>> slot. When changing an existing slot, it may be moved in the guest
>> @@ -873,9 +874,12 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
>> be identical. This allows large pages in the guest to be backed by large
>> pages in the host.
>>
>> -The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which
>> +The flags field supports two flag, KVM_MEM_LOG_DIRTY_PAGES, which
>> instructs kvm to keep track of writes to memory within the slot. See
>> -the KVM_GET_DIRTY_LOG ioctl.
>> +the KVM_GET_DIRTY_LOG ioctl. Another flag is KVM_MEM_READONLY when the
>> +KVM_CAP_READONLY_MEM capability, it indicates the guest memory is read-only,
>> +that means, guest is only allowed to read it.
>
> The last sentence requires some refactoring. :) How about: "The
> KVM_CAP_READONLY_MEM capability indicates the availability of the
> KVM_MEM_READONLY flag. When this flag is set for a memory region, KVM
> only allows read accesses."

Undoubtedly, your sentence is far better than mine. :)

By the way, this patchset has been applied on kvm -next branch, would
you mind to post a patch to do these works.

>
>> Writes will be posted to
>> +userspace as KVM_EXIT_MMIO exits.
>>
>> When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory
>> region are automatically reflected into the guest. For example, an mmap()
>> diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
>> index 246617e..521bf25 100644
>> --- a/arch/x86/include/asm/kvm.h
>> +++ b/arch/x86/include/asm/kvm.h
>> @@ -25,6 +25,7 @@
>> #define __KVM_HAVE_DEBUGREGS
>> #define __KVM_HAVE_XSAVE
>> #define __KVM_HAVE_XCRS
>> +#define __KVM_HAVE_READONLY_MEM
>>
>> /* Architectural interrupt line count. */
>> #define KVM_NR_INTERRUPTS 256
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 5548971..8e312a2 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2647,6 +2647,15 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
>>
>> static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
>> {
>> + /*
>> + * Do not cache the mmio info caused by writing the readonly gfn
>> + * into the spte otherwise read access on readonly gfn also can
>> + * caused mmio page fault and treat it as mmio access.
>> + * Return 1 to tell kvm to emulate it.
>> + */
>> + if (pfn == KVM_PFN_ERR_RO_FAULT)
>> + return 1;
>> +
>> if (pfn == KVM_PFN_ERR_HWPOISON) {
>> kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
>> return 0;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 704680d..42bbf41 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2175,6 +2175,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>> case KVM_CAP_GET_TSC_KHZ:
>> case KVM_CAP_PCI_2_3:
>> case KVM_CAP_KVMCLOCK_CTRL:
>> + case KVM_CAP_READONLY_MEM:
>> r = 1;
>> break;
>> case KVM_CAP_COALESCED_MMIO:
>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
>> index 2de335d..d808694 100644
>> --- a/include/linux/kvm.h
>> +++ b/include/linux/kvm.h
>> @@ -106,7 +106,8 @@ struct kvm_userspace_memory_region {
>> * other bits are reserved for kvm internal use which are defined in
>> * include/linux/kvm_host.h.
>> */
>> -#define KVM_MEM_LOG_DIRTY_PAGES 1UL
>> +#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
>> +#define KVM_MEM_READONLY (1UL << 1)
>>
>> /* for KVM_IRQ_LINE */
>> struct kvm_irq_level {
>> @@ -621,6 +622,9 @@ struct kvm_ppc_smmu_info {
>> #define KVM_CAP_PPC_GET_SMMU_INFO 78
>> #define KVM_CAP_S390_COW 79
>> #define KVM_CAP_PPC_ALLOC_HTAB 80
>> +#ifdef __KVM_HAVE_READONLY_MEM
>> +#define KVM_CAP_READONLY_MEM 81
>> +#endif
>
> See the discussion on the userspace patches: The CAP, as userspace API,
> should really be defined unconditionally, kernel code should depend on
> __KVM_HAVE_READONLY_MEM or a to-be-introduced CONFIG_KVM_HAVE_xxx
> switch. This allows for cleaner userspace code.

I see that using __KVM_HAVE_ around CAP in kvm.h is very popular:

$ grep __KVM_HAVE include/linux/kvm.h | wc -l
20

As your suggestion, userspace will always use the CAP even if the CAP
is not really supported. We do not need care the overload on other arches?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/