Re: use of shrinker in virtio balloon free page hinting

From: Michael S. Tsirkin
Date: Thu Jul 18 2019 - 00:18:38 EST


On Wed, Jul 17, 2019 at 03:46:57PM +0000, Wang, Wei W wrote:
> On Wednesday, July 17, 2019 7:21 PM, Michael S. Tsirkin wrote:
> >
> > Wei, others,
> >
> > ATM virtio_balloon_shrinker_scan will only get registered when deflate on
> > oom feature bit is set.
> >
> > Not sure whether that's intentional.
>
> Yes, we wanted to follow the old oom behavior, which allows the oom notifier
> to deflate pages only when this feature bit has been negotiated.

It makes sense for pages in the balloon (requested by hypervisor).
However free page hinting can freeze up lots of memory for its own
internal reasons. It does not make sense to ask hypervisor
to set flags in order to fix internal guest issues.

> > Assuming it is:
> >
> > virtio_balloon_shrinker_scan will try to locate and free pages that are
> > processed by host.
> > The above seems broken in several ways:
> > - count ignores the free page list completely
>
> Do you mean virtio_balloon_shrinker_count()? It just reports to
> do_shrink_slab the amount of freeable memory that balloon has.
> (vb->num_pages and vb->num_free_page_blocks are all included )

Right. But that does not include the pages in the hint vq,
which could be a significant amount of memory.


> > - if free pages are being reported, pages freed
> > by shrinker will just get re-allocated again
>
> fill_balloon will re-try the allocation after sleeping 200ms once allocation fails.

Even if ballon was never inflated, if shrinker frees some memory while
we are hinting, hint vq will keep going and allocate it back without
sleeping.

>
> > I was unable to make this part of code behave in any reasonable way - was
> > shrinker usage tested? What's a good way to test that?
>
> Please see the example that I tested before : https://lkml.org/lkml/2018/8/6/29
> (just the first one: *1. V3 patches)
>
> What problem did you see?
> I just tried the latest code, and find ballooning reports a #GP (seems caused by
> 418a3ab1e).
> I'll take a look at the details in the office tomorrow.
>
> Best,
> Wei

I saw that VM hangs. Could be the above problem, let me know how it
goes.