Re: [PATCH 6/9] KVM: arm64: Provide a PV_TIME device to user space

From: Marc Zyngier
Date: Mon Aug 05 2019 - 12:29:14 EST


On 05/08/2019 17:10, Steven Price wrote:
> On 03/08/2019 13:51, Marc Zyngier wrote:
>> On Fri, 2 Aug 2019 15:50:14 +0100
>> Steven Price <steven.price@xxxxxxx> wrote:
>>
>>> Allow user space to inform the KVM host where in the physical memory
>>> map the paravirtualized time structures should be located.
>>>
>>> A device is created which provides the base address of an array of
>>> Stolen Time (ST) structures, one for each VCPU. There must be (64 *
>>> total number of VCPUs) bytes of memory available at this location.
>>>
>>> The address is given in terms of the physical address visible to
>>> the guest and must be 64 byte aligned. The memory should be marked as
>>> reserved to the guest to stop it allocating it for other purposes.
>>
>> Why? You seem to be allocating the memory from the kernel, so as far as
>> the guest is concerned, this isn't generally usable memory.
>
> I obviously didn't word it very well - that's what I meant. The "memory"
> that represents the stolen time structure shouldn't be shown to the
> guest as normal memory, but "reserved" for the purpose of stolen time.
>
> To be honest it looks like I forgot to rewrite this commit message -
> which 64 byte alignment is all that the guest can rely on (because each
> vCPU has it's own structure), the actual array of structures needs to be
> page aligned to ensure we can safely map it into the guest.
>
>>>
>>> Signed-off-by: Steven Price <steven.price@xxxxxxx>
>>> ---
>>> arch/arm64/include/asm/kvm_mmu.h | 2 +
>>> arch/arm64/include/uapi/asm/kvm.h | 6 +
>>> arch/arm64/kvm/Makefile | 1 +
>>> include/uapi/linux/kvm.h | 2 +
>>> virt/kvm/arm/mmu.c | 44 +++++++
>>> virt/kvm/arm/pvtime.c | 190 ++++++++++++++++++++++++++++++
>>> 6 files changed, 245 insertions(+)
>>> create mode 100644 virt/kvm/arm/pvtime.c

[...]

>>> +static int kvm_arm_pvtime_set_attr(struct kvm_device *dev,
>>> + struct kvm_device_attr *attr)
>>> +{
>>> + struct kvm_arch_pvtime *pvtime = &dev->kvm->arch.pvtime;
>>> + u64 __user *user = (u64 __user *)attr->addr;
>>> + u64 paddr;
>>> + int ret;
>>> +
>>> + switch (attr->group) {
>>> + case KVM_DEV_ARM_PV_TIME_PADDR:
>>> + if (get_user(paddr, user))
>>> + return -EFAULT;
>>> + if (paddr & 63)
>>> + return -EINVAL;
>>
>> You should check whether the device fits into the IPA space for this
>> guest, and whether it overlaps with anything else.
>
> pvtime_map_pages() should fail in the case of overlap. That seems
> sufficient to me - do you think we need something stronger?

Definitely. stage2_set_pte() won't fail for a non-IO overlapping
mapping, and will just treat it as guest memory. If this overlaps with a
memslot, we'll never be able to fault that page in, ending up with
interesting memory corruption... :-/

That's one of the reasons why I think option (2) in your earlier email
is an interesting one, as it sidesteps a whole lot of ugly and hard to
test corner cases.

Thanks,

M.
--
Jazz is not dead, it just smells funny...