Re: [RFC][Patch v9 2/6] KVM: Enables the kernel to isolate guest free pages
From: Alexander Duyck
Date: Fri Mar 08 2019 - 16:39:51 EST
On Fri, Mar 8, 2019 at 11:39 AM Nitesh Narayan Lal <nitesh@xxxxxxxxxx> wrote:
>
> On 3/8/19 2:25 PM, Alexander Duyck wrote:
> > On Fri, Mar 8, 2019 at 11:10 AM Nitesh Narayan Lal <nitesh@xxxxxxxxxx> wrote:
> >>
> >> On 3/8/19 1:06 PM, Alexander Duyck wrote:
> >>> On Thu, Mar 7, 2019 at 6:32 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> >>>> On Thu, Mar 07, 2019 at 02:35:53PM -0800, Alexander Duyck wrote:
> >>>>> The only other thing I still want to try and see if I can do is to add
> >>>>> a jiffies value to the page private data in the case of the buddy
> >>>>> pages.
> >>>> Actually there's one extra thing I think we should do, and that is make
> >>>> sure we do not leave less than X% off the free memory at a time.
> >>>> This way chances of triggering an OOM are lower.
> >>> If nothing else we could probably look at doing a watermark of some
> >>> sort so we have to have X amount of memory free but not hinted before
> >>> we will start providing the hints. It would just be a matter of
> >>> tracking how much memory we have hinted on versus the amount of memory
> >>> that has been pulled from that pool.
> >> This is to avoid false OOM in the guest?
> > Partially, though it would still be possible. Basically it would just
> > be a way of determining when we have hinted "enough". Basically it
> > doesn't do us much good to be hinting on free memory if the guest is
> > already constrained and just going to reallocate the memory shortly
> > after we hinted on it. The idea is with a watermark we can avoid
> > hinting until we start having pages that are actually going to stay
> > free for a while.
> >
> >>> It is another reason why we
> >>> probably want a bit in the buddy pages somewhere to indicate if a page
> >>> has been hinted or not as we can then use that to determine if we have
> >>> to account for it in the statistics.
> >> The one benefit which I can see of having an explicit bit is that it
> >> will help us to have a single hook away from the hot path within buddy
> >> merging code (just like your arch_merge_page) and still avoid duplicate
> >> hints while releasing pages.
> >>
> >> I still have to check PG_idle and PG_young which you mentioned but I
> >> don't think we can reuse any existing bits.
> > Those are bits that are already there for 64b. I think those exist in
> > the page extension for 32b systems. If I am not mistaken they are only
> > used in VMA mapped memory. What I was getting at is that those are the
> > bits we could think about reusing.
> >
> >> If we really want to have something like a watermark, then can't we use
> >> zone->free_pages before isolating to see how many free pages are there
> >> and put a threshold on it? (__isolate_free_page() does a similar thing
> >> but it does that on per request basis).
> > Right. That is only part of it though since that tells you how many
> > free pages are there. But how many of those free pages are hinted?
> > That is the part we would need to track separately and then then
> > compare to free_pages to determine if we need to start hinting on more
> > memory or not.
> Only pages which are isolated will be hinted, and once a page is
> isolated it will not be counted in the zone free pages.
> Feel free to correct me if I am wrong.
You are correct up to here. When we isolate the page it isn't counted
against the free pages. However after we complete the hint we end up
taking it out of isolation and returning it to the "free" state, so it
will be counted against the free pages.
> If I am understanding it correctly you only want to hint the idle pages,
> is that right?
Getting back to the ideas from our earlier discussion, we had 3 stages
for things. Free but not hinted, isolated due to hinting, and free and
hinted. So what we would need to do is identify the size of the first
pool that is free and not hinted by knowing the total number of free
pages, and then subtract the size of the pages that are hinted and
still free.
> >
> >>>>> With that we could track the age of the page so it becomes
> >>>>> easier to only target pages that are truly going cold rather than
> >>>>> trying to grab pages that were added to the freelist recently.
> >>>> I like that but I have a vague memory of discussing this with Rik van
> >>>> Riel and him saying it's actually better to take away recently used
> >>>> ones. Can't see why would that be but maybe I remember wrong. Rik - am I
> >>>> just confused?
> >>> It is probably to cut down on the need for disk writes in the case of
> >>> swap. If that is the case it ends up being a trade off.
> >>>
> >>> The sooner we hint the less likely it is that we will need to write a
> >>> given page to disk. However the sooner we hint, the more likely it is
> >>> we will need to trigger a page fault and pull back in a zero page to
> >>> populate the last page we were working on. The sweet spot will be that
> >>> period of time that is somewhere in between so we don't trigger
> >>> unnecessary page faults and we don't need to perform additional swap
> >>> reads/writes.
> >> --
> >> Regards
> >> Nitesh
> >>
> --
> Regards
> Nitesh
>