Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO

From: Liang Li
Date: Tue Dec 22 2020 - 09:01:53 EST


> > https://static.sched.com/hosted_files/kvmforum2020/51/The%20Practice%20Method%20to%20Speed%20Up%2010x%20Boot-up%20Time%20for%20Guest%20in%20Alibaba%20Cloud.pdf
> >
> > and the following link is mine:
> > https://static.sched.com/hosted_files/kvmforum2020/90/Speed%20Up%20Creation%20of%20a%20VM%20With%20Passthrough%20GPU.pdf
>
> Thanks for the pointers! I actually did watch your presentation.

You're welcome! And thanks for your time! :)
> >>
> >>>
> >>> Improve guest performance when using VIRTIO_BALLOON_F_REPORTING for
> >>> memory overcommit. With VIRTIO_BALLOON_F_REPORTING, the guest reports
> >>> free pages to the VMM, and the VMM unmaps the corresponding host pages
> >>> for reclaim. When the guest later allocates a page that was just
> >>> reclaimed, the host has to allocate a new page and zero it out for the
> >>> guest. In this case, pre-zeroing free pages helps speed up the fault-in
> >>> process and reduces the performance impact.
> >>
> >> Such faults in the VMM are no different from other faults that occur
> >> when first accessing a page to be populated. Again, I wonder how much
> >> of a difference it actually makes.
> >>
> >
> > I am not just referring to faults in the VMM, I mean the whole process
> > that handles guest page faults.
> > Without VIRTIO_BALLOON_F_REPORTING, pages used by the guest are zeroed
> > out only once by the host. With VIRTIO_BALLOON_F_REPORTING, free pages
> > are reclaimed by the host and may return to the host buddy free list.
> > When the pages are given back to the guest, the host kernel needs to
> > zero them out again. This means that with VIRTIO_BALLOON_F_REPORTING,
> > guest memory performance is degraded by the frequent zeroing on the
> > host side. The degradation is especially noticeable with huge pages.
> > Pre-zeroing free pages can make guest memory performance almost the
> > same as without VIRTIO_BALLOON_F_REPORTING.
>
> Yes, what I am saying is that this fault handling is no different from
> ordinary faults when accessing a virtual memory location for the first
> time and populating a page. The only difference is that it happens
> continuously, not only the first time we touch a page.
>
> And we might be able to improve handling in the hypervisor in the
> future. We have been discussing using MADV_FREE instead of MADV_DONTNEED
> in QEMU for handling free page reporting. Then, guest reported pages
> will only get reclaimed by the hypervisor when there is actual memory
> pressure in the hypervisor (e.g., when about to swap). And zeroing a
> page is an obvious improvement over going to swap. The price for zeroing
> pages has to be paid at some point.
>
> Also note that we've been discussing cache-related things already. If
> you zero out before giving the page to the guest, the page will already
> be in the cache - where the guest directly wants to access it.
>

OK, that's very reasonable and much better. Looking forward to your work.
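For my own understanding, the difference between the two discard
strategies could be sketched roughly as below. This is only an
illustration, not QEMU's actual implementation; 'hva' and 'len' are
placeholders for the host virtual address and length that the VMM would
have computed for the reported guest range:

#include <sys/mman.h>

/*
 * Rough sketch: discard a guest-reported free page range.
 * 'hva'/'len' are placeholders for the host mapping of the range.
 */
static int discard_reported_range(void *hva, size_t len, int lazy)
{
    if (lazy) {
        /*
         * MADV_FREE: the host frees the pages lazily, only under
         * real memory pressure. If the guest reuses the range
         * before that, the existing pages are reused and no new
         * fault-and-zero cycle is needed.
         */
        return madvise(hva, len, MADV_FREE);
    }
    /*
     * MADV_DONTNEED: the pages are dropped immediately. The next
     * guest access faults in fresh pages that the host must zero
     * first, which is the repeated zeroing cost discussed above.
     */
    return madvise(hva, len, MADV_DONTNEED);
}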

> >>>
> >>> Security
> >>> ========
> >>> This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
> >>> boot options", which zeroes out pages in an asynchronous way. For users
> >>> who can't tolerate the impact that 'init_on_alloc=1' or 'init_on_free=1'
> >>> brings, this feature provides another choice.
> >> "we don’t pre zero out all the free pages" so this is of little actual use.
> >
> > OK. It seems none of the reasons listed above is strong enough for
>
> I was rather saying that for security it's of little use IMHO.
> Application/VM startup time might be improved by using huge pages (and
> pre-zeroing these). Free page reporting might be improved by using
> MADV_FREE instead of MADV_DONTNEED in the hypervisor.
>
> > this feature. Of all of them, which one is likely to become the
> > strongest? From the implementation, you will find it is configurable;
> > users who don't want it can turn it off. Is that not an option?
>
> Well, we have to maintain the feature and sacrifice a page flag. For
> example, do we expect someone to explicitly enable the feature just to
> speed up startup time of an app that consumes a lot of memory? I highly
> doubt it.

In our production environment, there are three main applications with such
a requirement. One is QEMU (creating a VM with an SR-IOV passthrough
device); the other two are DPDK-related applications, DPDK OVS and SPDK
vhost. For best performance, they populate memory when starting up. For
SPDK vhost, we make use of the VHOST_USER_GET/SET_INFLIGHT_FD feature for
vhost 'live' upgrade, which is done by killing the old process and
starting a new one with the new binary. In this case, we want the new
process to start as quickly as possible to shorten the service downtime.
We really do enable this feature to speed up startup time for them. :)
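To make the cost concrete, the kind of prepopulation these applications
do at startup looks roughly like the sketch below. The size and flags
are illustrative only, not taken from the QEMU/DPDK/SPDK sources;
MAP_POPULATE makes the kernel fault in (and therefore zero) every page
before mmap() returns, so all of that zeroing cost lands on the startup
path:

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t len = 1UL << 30; /* 1 GiB, illustrative size only */

    /* Prefault the whole region up front; every page is zeroed here. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* ... run the service; startup has already paid the zeroing cost ... */
    munmap(p, len);
    return 0;
}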

> I'd love to hear opinions of other people. (a lot of people are offline
> until the beginning of January, including, well, actually me :) )

OK. I will wait some time for others' feedback. Happy holidays!

thanks!

Liang