Re: [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management)

From: Sean Christopherson
Date: Fri Jul 12 2024 - 12:40:06 EST


On Fri, Jul 12, 2024, Steven Rostedt wrote:
> On Fri, 12 Jul 2024 11:32:30 -0400
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> > >>> I was looking at the rseq on request from the KVM call, however it does not
> > >>> make sense to me yet how to expose the rseq area via the Guest VA to the host
> > >>> kernel. rseq is for userspace to kernel, not VM to kernel.
> > >
> > > Any memory that is exposed to host userspace can be exposed to the guest. Things
> > > like this are implemented via "overlay" pages, where the guest asks host userspace
> > > to map the magic page (rseq in this case) at GPA 'x'. Userspace then creates a
> > > memslot that overlays guest RAM to map GPA 'x' to host VA 'y', where 'y' is the
> > > address of the page containing the rseq structure associated with the vCPU (in
> > > pretty much every modern VMM, each vCPU has a dedicated task/thread).
> > >
> > > A that point, the vCPU can read/write the rseq structure directly.
>
> So basically, the vCPU thread can just create a virtio device that
> exposes the rseq memory to the guest kernel?
>
> One other issue we need to worry about is that IIUC rseq memory is
> allocated by the guest/user, not the host kernel. This means it can be
> swapped out. The code that handles this needs to be able to handle user
> page faults.

This is a non-issue, it will Just Work, same as any other memory that is exposed
to the guest and can be reclaimed/swapped/migrated..

If the host swaps out the rseq page, mmu_notifiers will call into KVM and KVM will
unmap the page from the guest. If/when the page is accessed by the guest, KVM
will fault the page back into the host's primary MMU, and then map the new pfn
into the guest.