Re: [RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches

From: Bandan Das
Date: Wed May 07 2014 - 00:46:05 EST


Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:

> Il 06/05/2014 02:40, Bandan Das ha scritto:
>> On every instruction fetch, kvm_read_guest_virt_helper
>> does the gva to gpa translation followed by the memslot
>> search. Store the gva->hva mapping so that, if there is a
>> match, we can call __copy_from_user() directly.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Signed-off-by: Bandan Das <bsd@xxxxxxxxxx>
>> ---
>> arch/x86/include/asm/kvm_emulate.h | 7 ++++++-
>> arch/x86/kvm/x86.c | 33 +++++++++++++++++++++++----------
>> 2 files changed, 29 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
>> index 085d688..20ccde4 100644
>> --- a/arch/x86/include/asm/kvm_emulate.h
>> +++ b/arch/x86/include/asm/kvm_emulate.h
>> @@ -323,10 +323,11 @@ struct x86_emulate_ctxt {
>> int (*execute)(struct x86_emulate_ctxt *ctxt);
>> int (*check_perm)(struct x86_emulate_ctxt *ctxt);
>> /*
>> - * The following five fields are cleared together,
>> + * The following six fields are cleared together,
>> * the rest are initialized unconditionally in x86_decode_insn
>> * or elsewhere
>> */
>> + bool addr_cache_valid;
>
> You can just set gfn to -1 to invalidate the fetch.
>
>> u8 rex_prefix;
>> u8 lock_prefix;
>> u8 rep_prefix;
>> @@ -348,6 +349,10 @@ struct x86_emulate_ctxt {
>> struct fetch_cache fetch;
>> struct read_cache io_read;
>> struct read_cache mem_read;
>> + struct {
>> + gfn_t gfn;
>> + unsigned long uaddr;
>> + } addr_cache;
>
> This is not used by emulate.c, so it should be directly in struct
> kvm_vcpu. You could invalidate it in init_emulate_ctxt, together with
> emulate_regs_need_sync_from_vcpu.

Ok.

> In the big picture, however, what we really want is to persist the
> cache across multiple instructions, and also cache the linearization
> of the address (where you add RIP and CS.base). This would be a
> bigger source of improvement. If you do that, the cache has to be
> indeed in x86_emulate_ctxt, but on the other hand the implementation
> needs to be in emulate.c.
>
>> };
>>
>> /* Repeat String Operation Prefix */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cf69e3b..7afcfc7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
>> unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
>> int ret;
>> unsigned long uaddr;
>> + gfn_t gfn = addr >> PAGE_SHIFT;
>>
>> - ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
>> - exception, false,
>> - NULL, &uaddr);
>> - if (ret != X86EMUL_CONTINUE)
>> - return ret;
>> + if (ctxt->addr_cache_valid &&
>> + (ctxt->addr_cache.gfn == gfn))
>> + uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) +
>> + offset_in_page(addr);
>> + else {
>> + ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
>> + exception, false,
>> + NULL, &uaddr);
>
> Did you measure the hit rate, and the speedup after every patch? My
> reading of the code is that the fetch is done only once per page, with

Yes, I did. IIRC, patch 3 actually improved the numbers compared to
patch 2. A rough estimate for jmp: patch 2 reduced it to the low 600s,
around 610 I think, but I can get a fresh set of numbers.

So I'm not sure where the speedup is coming from; I will try to find out.

> the speedup coming from the microoptimization that patch 2 provides
> for single-page reads. Single-page reads are indeed a very common
> case for the emulator.
>
> I suggest to start with making patch 2 ready for inclusion.
>
> Paolo
>
>> + if (ret != X86EMUL_CONTINUE)
>> + return ret;
>> +
>> + if (unlikely(kvm_is_error_hva(uaddr))) {
>> + r = X86EMUL_PROPAGATE_FAULT;
>> + return r;
>> + }
>>
>> - if (unlikely(kvm_is_error_hva(uaddr))) {
>> - r = X86EMUL_PROPAGATE_FAULT;
>> - return r;
>> + /* Cache gfn and hva */
>> + ctxt->addr_cache.gfn = addr >> PAGE_SHIFT;
>> + ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT;
>> + ctxt->addr_cache_valid = true;
>> }
>>
>> ret = __copy_from_user(data, (void __user *)uaddr, toread);
>> if (ret < 0) {
>> r = X86EMUL_IO_NEEDED;
>> + /* Where else should we invalidate cache ? */
>> + ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>> return r;
>> }
>>
>> - ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>> -
>> bytes -= toread;
>> data += toread;
>> addr += toread;
>> @@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
>> struct kvm_memory_slot *memslot;
>> gfn_t gfn;
>>
>> + ctxt->addr_cache_valid = false;
>> if (!opaque)
>> return;
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/