Re: [RFC PATCH] kvm,x86: Exit to user space in case of page fault error

From: Vivek Goyal
Date: Tue Jun 30 2020 - 14:26:36 EST


On Tue, Jun 30, 2020 at 05:43:54PM +0200, Vitaly Kuznetsov wrote:
> Vivek Goyal <vgoyal@xxxxxxxxxx> writes:
>
> > On Tue, Jun 30, 2020 at 05:13:54PM +0200, Vitaly Kuznetsov wrote:
> >>
> >> > - If you retry in kernel, we will change the context completely that
> >> > who was trying to access the gfn in question. We want to retain
> >> > the real context and retain information who was trying to access
> >> > gfn in question.
> >>
> >> (Just so I understand the idea better) does the guest context matter to
> >> the host? Or, more specifically, are we going to do anything besides
> >> get_user_pages() which will actually analyze who triggered the access
> >> *in the guest*?
> >
> > When we exit to user space, qemu prints bunch of register state. I am
> > wondering what does that state represent. Does some of that traces
> > back to the process which was trying to access that hva? I don't
> > know.
>
> We can get the full CPU state when the fault happens if we need to but
> generally we are not analyzing it. I can imagine looking at CPL, for
> example, but trying to distinguish guest's 'process A' from 'process B'
> may not be simple.
>
> >
> > I think keeping a cache of error gfns might not be too bad from
> > implementation point of view. I will give it a try and see how
> > bad does it look.
>
> Right; I'm only worried about the fact that every cache (or hash) has a
> limited size and under certain circumstances we may overflow it. When an
> overflow happens, we will follow the APF path again and this can go over
> and over.

Sure. But what are the chances of that happening? Say our cache size is
64. That means we need at least 128 processes to do coordinated faults
(all in the error zone) to skip the cache completely all the time. We
only have to hit the cache once. The chances of missing the error gfn
cache completely for a very long time are very slim. And if we miss
it a few times, no harm done. We will just spin a few times and then
exit to qemu.

IOW, chances of spinning infinitely are not zero. But they look so
small that in practice I am not worried about it.
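
To make the cache idea a bit more concrete, here is a rough user-space
sketch of a direct-mapped error gfn cache. This is not KVM code and not
from the patch. All names (error_gfn_cache, error_gfn_cache_add, etc.)
are hypothetical, and the direct-mapped layout is just one assumed way
of organizing the 64 entries mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example size taken from the discussion above; purely illustrative. */
#define ERROR_GFN_CACHE_SIZE 64

typedef uint64_t gfn_t;

/* Marker for an empty slot (assumes no real gfn has this value). */
#define INVALID_GFN (~(gfn_t)0)

/* Hypothetical per-vcpu cache of gfns whose fault ended in an error. */
struct error_gfn_cache {
	gfn_t slots[ERROR_GFN_CACHE_SIZE];
};

static void error_gfn_cache_init(struct error_gfn_cache *c)
{
	for (unsigned int i = 0; i < ERROR_GFN_CACHE_SIZE; i++)
		c->slots[i] = INVALID_GFN;
}

/* Direct-mapped: each gfn maps to exactly one slot. */
static unsigned int error_gfn_slot(gfn_t gfn)
{
	return (unsigned int)(gfn % ERROR_GFN_CACHE_SIZE);
}

/* Record a gfn that hit an error; silently evicts any colliding entry. */
static void error_gfn_cache_add(struct error_gfn_cache *c, gfn_t gfn)
{
	c->slots[error_gfn_slot(gfn)] = gfn;
}

/*
 * Return true if gfn is cached as an error gfn and drop it from the
 * cache. A hit would mean "do not take the async page fault path
 * again, exit to user space instead".
 */
static bool error_gfn_cache_find_and_clear(struct error_gfn_cache *c,
					   gfn_t gfn)
{
	unsigned int slot = error_gfn_slot(gfn);

	if (c->slots[slot] != gfn)
		return false;

	c->slots[slot] = INVALID_GFN;
	return true;
}
```

With a layout like this, an entry is only lost when another error gfn
mapping to the same slot shows up before the retry. In that case the
retry misses the cache and goes through one more round before it gets
another chance to hit.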

> Maybe we can punch a hole in EPT/NPT making the PFN reserved/
> not-present so when the guest tries to access it again we trap the
> access in KVM and, if the error persists, don't follow the APF path?

The cache solution seems simpler than this. Trying to maintain any state
in page tables will invariably be more complex (especially given the
many flavors of paging).

I can start looking in this direction if you really think that it's worth
implementing a page table based solution for this problem. I feel that
we should implement something simpler for now, and if there turn out to
be easy ways to skip error gfns, then replace it with a page table based
solution (this will only require hypervisor changes and no guest
changes).

Vivek