Re: [PATCH RFCv2 0/9] kvm/arm64: Support Async Page Fault
From: Mark Rutland
Date: Tue May 26 2020 - 09:09:35 EST
Hi Gavin,
At a high-level I'm rather fearful of this series. I can see many ways
that this can break, and I can also see that even if/when we get things
into a working state, constant vigilance will be requried for any
changes to the entry code.
I'm not keen on injecting non-architectural exceptions in this way, and
I'm also not keen on how deep the PV hooks are injected currently (e.g.
in the ret_to_user path).
I see a few patches have preparator cleanup that I think would be
worthwhile regardless of this series; if you could factor those out and
send them on their own it would get that out of the way and make it
easier to review the series itself. Similarly, there's some duplication
of code from arch/x86 which I think can be factored out to virt/kvm
instead as preparatory work.
Generally, I also think that you need to spend some time on commit
messages and/or documentation to better explain the concepts and
expected usage. I had to reverse-engineer the series by reviewing it in
entirety before I had an idea as to how basic parts of it strung
together, and a more thorough conceptual explanation would make it much
easier to critique the approach rather than the individual patches.
On Fri, May 08, 2020 at 01:29:10PM +1000, Gavin Shan wrote:
> Testing
> =======
> The tests are carried on the following machine. A guest with single vCPU
> and 4GB memory is started. Also, the QEMU process is put into memory cgroup
> (v1) whose memory limit is set to 2GB. In the guest, there are two threads,
> which are memory bound and CPU bound separately. The memory bound thread
> allocates all available memory, accesses and them free them. The CPU bound
> thread simply executes block of "nop".
I appreciate this is a microbenchmark, but that sounds far from
realistic.
Is there a specitic real workload that's expected to be representative
of?
Can you run tests with a real workload? For example, a kernel build
inside the VM?
> The test is carried out for 5 time
> continuously and the average number (per minute) of executed blocks in the
> CPU bound thread is taken as indicator of improvement.
>
> Vendor: GIGABYTE CPU: 224 x Cavium ThunderX2(R) CPU CN9975 v2.2 @ 2.0GHz
> Memory: 32GB Disk: Fusion-MPT SAS-3 (PCIe3.0 x8)
>
> Without-APF: 7029030180/minute = avg(7559625120 5962155840 7823208540
> 7629633480 6170527920)
> With-APF: 8286827472/minute = avg(8464584540 8177073360 8262723180
> 8095084020 8434672260)
> Outcome: +17.8%
>
> Another test case is to measure the time consumed by the application, but
> with the CPU-bound thread disabled.
>
> Without-APF: 40.3s = avg(40.6 39.3 39.2 41.6 41.2)
> With-APF: 40.8s = avg(40.6 41.1 40.9 41.0 40.7)
> Outcome: +1.2%
So this is pure overhead in that case?
I think we need to see a real workload that this benefits. As it stands
it seems that this is a lot of complexity to game a synthetic benchmark.
Thanks,
Mark.
> I also have some code in the host to capture the number of async page faults,
> time used to do swapin and its maximal/minimal values when async page fault
> is enabled. During the test, the CPU-bound thread is disabled. There is about
> 30% of the time used to do swapin.
>
> Number of async page fault: 7555 times
> Total time used by application: 42.2 seconds
> Total time used by swapin: 12.7 seconds (30%)
> Minimal swapin time: 36.2 us
> Maximal swapin time: 55.7 ms
>
> Changelog
> =========
> RFCv1 -> RFCv2
> * Rebase to 5.7.rc3
> * Performance data (Marc Zyngier)
> * Replace IMPDEF system register with KVM vendor specific hypercall (Mark Rutland)
> * Based on Will's KVM vendor hypercall probe mechanism (Will Deacon)
> * Don't use IMPDEF DFSC (0x43). Async page fault reason is conveyed
> by the control block (Mark Rutland)
> * Delayed wakeup mechanism in guest kernel (Gavin Shan)
> * Stability improvement in the guest kernel: delayed wakeup mechanism,
> external abort disallowed region, lazily clear async page fault,
> disabled interrupt on acquiring the head's lock and so on (Gavin Shan)
> * Stability improvement in the host kernel: serialized async page
> faults etc. (Gavin Shan)
> * Performance improvement in guest kernel: percpu sleeper head (Gavin Shan)
>
> Gavin Shan (7):
> kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr()
> kvm/arm64: Detach ESR operator from vCPU struct
> kvm/arm64: Replace hsr with esr
> kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode
> kvm/arm64: Support async page fault
> kernel/sched: Add cpu_rq_is_locked()
> arm64: Support async page fault
>
> Will Deacon (2):
> arm64: Probe for the presence of KVM hypervisor services during boot
> arm/arm64: KVM: Advertise KVM UID to guests via SMCCC
>
> arch/arm64/Kconfig | 11 +
> arch/arm64/include/asm/exception.h | 3 +
> arch/arm64/include/asm/hypervisor.h | 11 +
> arch/arm64/include/asm/kvm_emulate.h | 83 +++--
> arch/arm64/include/asm/kvm_host.h | 47 +++
> arch/arm64/include/asm/kvm_para.h | 40 +++
> arch/arm64/include/uapi/asm/Kbuild | 2 -
> arch/arm64/include/uapi/asm/kvm_para.h | 22 ++
> arch/arm64/kernel/entry.S | 33 ++
> arch/arm64/kernel/process.c | 4 +
> arch/arm64/kernel/setup.c | 35 ++
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/Makefile | 2 +
> arch/arm64/kvm/handle_exit.c | 48 +--
> arch/arm64/kvm/hyp/switch.c | 33 +-
> arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c | 7 +-
> arch/arm64/kvm/inject_fault.c | 4 +-
> arch/arm64/kvm/sys_regs.c | 38 +-
> arch/arm64/mm/fault.c | 434 +++++++++++++++++++++++
> include/linux/arm-smccc.h | 32 ++
> include/linux/sched.h | 1 +
> kernel/sched/core.c | 8 +
> virt/kvm/arm/arm.c | 40 ++-
> virt/kvm/arm/async_pf.c | 335 +++++++++++++++++
> virt/kvm/arm/hyp/aarch32.c | 4 +-
> virt/kvm/arm/hyp/vgic-v3-sr.c | 7 +-
> virt/kvm/arm/hypercalls.c | 37 +-
> virt/kvm/arm/mmio.c | 27 +-
> virt/kvm/arm/mmu.c | 69 +++-
> 29 files changed, 1264 insertions(+), 154 deletions(-)
> create mode 100644 arch/arm64/include/asm/kvm_para.h
> create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
> create mode 100644 virt/kvm/arm/async_pf.c
>
> --
> 2.23.0
>