Re: [PATCH 0/2] Expose KVM API to Linux Kernel
From: Andy Lutomirski
Date: Mon May 18 2020 - 16:45:46 EST
On Mon, May 18, 2020 at 1:50 AM Anastassios Nanos
<ananos@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, May 18, 2020 at 10:50 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On 2020-05-18 07:58, Anastassios Nanos wrote:
> > > To spawn KVM-enabled Virtual Machines on Linux systems, one has to
> > > use QEMU, or some other kind of VM monitor in user-space, to host
> > > the vCPU threads, I/O threads and various other
> > > book-keeping/management mechanisms. This is perfectly fine for a
> > > large number of reasons and use cases: for instance, running generic
> > > VMs, or running general-purpose operating systems that need some
> > > kind of emulation for legacy boot/hardware etc.
> > >
> > > What if we wanted to execute a small piece of code as a guest
> > > instance, without the involvement of user-space? The KVM functions
> > > already do what they should: VM and vCPU setup is already part of
> > > the kernel; the only missing piece is memory handling.
> > >
> > > With this series, (a) we expose to the Linux kernel the bare minimum
> > > of KVM API functions needed to spawn a guest instance without the
> > > intervention of user-space; and (b) we tweak the memory handling
> > > code of KVM-related functions to account for another kind of guest,
> > > spawned in kernel-space.
> > >
> > > PATCH #1 exposes the needed stub functions, whereas PATCH #2
> > > introduces the changes in the KVM memory handling code for x86_64
> > > and aarch64.
> > >
> > > An example of use is provided, based on kvmtest.c.
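
For reference, a kvmtest.c-style launcher drives KVM entirely through
ioctls on /dev/kvm. The sketch below is simplified (error handling,
register setup and the guest code itself are omitted); it shows the
sequence the cover letter refers to, where every ioctl is a user/kernel
mode switch:

/* Minimal kvmtest.c-style flow: create a VM, give it one memory slot,
 * create a vCPU and run it until the first halt exit. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

static int run_tiny_guest(const void *code, size_t len)
{
	int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);

	/* One slot of guest-physical memory, backed by an anonymous
	 * mapping, with the guest code copied in at its start. */
	void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	memcpy(mem, code, len);

	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.guest_phys_addr = 0x1000,
		.memory_size = 0x1000,
		.userspace_addr = (unsigned long)mem,
	};
	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	int run_sz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
	struct kvm_run *run = mmap(NULL, run_sz, PROT_READ | PROT_WRITE,
				   MAP_SHARED, vcpu, 0);

	/* KVM_SET_SREGS/KVM_SET_REGS (entry point at 0x1000 etc.) elided. */

	for (;;) {
		ioctl(vcpu, KVM_RUN, 0);	/* mode switch on every exit */
		if (run->exit_reason == KVM_EXIT_HLT)
			break;
		/* KVM_EXIT_IO, KVM_EXIT_MMIO, ... would be handled here. */
	}
	return 0;
}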
>
> Hi Marc,
>
> thanks for taking the time to check this!
>
> >
> > You don't explain *why* we would want this. What is the overhead of
> > having a userspace if your guest doesn't need any userspace handling?
> > The kvmtest example indeed shows that the KVM userspace API is usable
> > without any form of emulation, hence has almost no cost.
>
> The rationale behind such an approach is two-fold:
> (a) we are able to ditch any user-space involvement in the creation and
> spawning of a KVM guest. This is particularly interesting in use-cases
> where short-lived tasks are spawned on demand. Think of a scenario where
> an ABI-compatible binary is loaded into memory. Spawning it as a guest
> from userspace would incur a number of ioctls. Doing the same from the
> kernel involves the same sequence of operations, but they become direct
> function calls rather than ioctls; additionally, memory handling is
> simplified.
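
To illustrate (a): driven from a kernel module, the sequence above turns
into direct function calls instead of ioctls. The kvmm_*() names below
are hypothetical placeholders for illustration only, not the symbols
actually exported by PATCH #1:

/* Hypothetical in-kernel equivalent of the userspace ioctl sequence;
 * the kvmm_*() helpers are placeholders, not code from this series. */
struct kvm *vm = kvmm_create_vm();
kvmm_set_memory_region(vm, &region);              /* ~ KVM_SET_USER_MEMORY_REGION */
struct kvm_vcpu *vcpu = kvmm_create_vcpu(vm, 0);  /* ~ KVM_CREATE_VCPU */
kvmm_vcpu_run(vcpu);                              /* ~ ioctl(vcpu_fd, KVM_RUN, 0) */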
>
> (b) I agree that the userspace KVM API is usable without emulation for a
> simple task -- a few bytes of machine code that add two registers. But
> what about something more complicated? Something that needs I/O? For
> most use-cases, I/O happens between the guest and some hardware device
> (network/storage etc.). Being in the kernel saves us from doing
> unnecessary mode switches. Of course there are optimizations for
> handling I/O on QEMU/KVM VMs (virtio/vhost), but essentially what they
> do is remove mode switches (and exits) for I/O operations -- so is there
> a good reason not to address that directly? A guest running in the
> kernel exits because of an I/O request, which gets processed and
> forwarded directly to the relevant subsystem *in* the kernel (net/block
> etc.).
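
To sketch the I/O point in (b), again with hypothetical kvmm_*() names
rather than code from this series: once the run loop lives in the
kernel, an I/O exit can be dispatched straight to the relevant in-kernel
subsystem instead of bouncing out to userspace and back:

/* Hypothetical in-kernel run loop; kvmm_*() and the handler are
 * placeholders used only for illustration. */
for (;;) {
	kvmm_vcpu_run(vcpu);
	switch (vcpu->run->exit_reason) {
	case KVM_EXIT_IO:
	case KVM_EXIT_MMIO:
		/* Hand the request directly to the in-kernel net/block
		 * path; no return to userspace, no extra mode switch. */
		kvmm_handle_guest_io(vcpu);
		break;
	case KVM_EXIT_HLT:
	default:
		return 0;
	}
}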
>
> We are working in both directions, with a particular focus on (a) --
> device I/O could also be handled with other mechanisms (VFs, for
> instance).
>
> > Without a clear description of the advantages of your solution, as well
> > as a full featured in-tree use case, I find it pretty hard to support
> > this.
>
> Totally understand that -- please keep in mind that this is a first
> (baby) step towards what we call KVMM (kernel virtual machine monitor).
> We presented the architecture, along with some preliminary I/O results,
> at FOSDEM. Of course, this is WIP and far from being upstreamable --
> hence the kvmmtest example showcasing the potential use-case.
>
> To be honest, my main question is whether there is interest in such an
> approach in the first place; if there is, we can then work on any rough
> edges. As far as I understand, you're not in favor of this approach.
The usual answer here is that the kernel community is not in favor of
adding in-kernel functionality that is not used by the upstream kernel.
If you come up with a real use case -- one that is GPL, has plans for
upstreaming, and has a real benefit (it is dramatically faster than user
code could likely be, it does something new and useful, etc.) -- then it
may well be mergeable.