Re: Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

From: Roger Pau Monné
Date: Thu Dec 12 2019 - 10:23:26 EST


On Thu, Dec 12, 2019 at 02:39:05PM +0100, SeongJae Park wrote:
> On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné" <roger.pau@xxxxxxxxxx> wrote:
> > > On the slow block device
> > > ------------------------
> > >
> > > max_pgs Min Max Median Avg Stddev
> > > 0 38.7 45.8 38.7 40.12 3.1752165
> > > 1024 38.7 45.8 38.7 40.12 3.1752165
> > > No difference proven at 95.0% confidence
> > >
> > > On the fast block device
> > > ------------------------
> > >
> > > max_pgs Min Max Median Avg Stddev
> > > 0 417 423 420 419.4 2.5099801
> > > 1024 414 425 416 417.8 4.4384682
> > > No difference proven at 95.0% confidence
> >
> > This is intriguing, as it seems to prove that the usage of a cache of
> > free pages is irrelevant performance wise.
> >
> > The pool of free pages was introduced long ago, and it's possible that
> > recent improvements to the balloon driver had made such pool useless,
> > at which point it could be removed instead of worked around.
>
> I guess the grant page allocation overhead in this test scenario is really
> small. In an absence of memory pressure, fragmentation, and NUMA imbalance,
> the latency of the page allocation ('get_page()') is very short, as it will
> success in the fast path.

The allocation of the pool of free pages involves more than get_page,
it uses gnttab_alloc_pages which in the worse case will allocate a
page and balloon it out issuing one hypercall.

> Few years ago, I once measured the page allocation latency on my machine.
> Roughly speaking, it was about 1us in best case, 100us in worst case, and 5us
> in average. Please keep in mind that the measurement was not designed and
> performed in serious way. Thus the results could have profile overhead in it,
> though. While keeping that in mind, let's simply believe the number and ignore
> the latency of the block layer, blkback itself (including the grant
> mapping), and anything else including context switch, cache miss, but the
> allocation. In other words, suppose that the grant page allocation is only one
> source of the overhead. It will be able to achieve 1 million IOPS (4KB *
> 1MIOPS = 4 GB/s) in the best case, 200 thousand IOPS (800 MB/s) in average, and
> 10 thousand IOPS (40 MB/s) in worst case. Based on this coarse calculation, I
> think the test results is reasonable.
>
> This also means that the effect of the blkback's free pages pool might be
> visible under page allocation fast path failure situation. Nevertheless, it
> would be also hard to measure that in micro level unless the measurement is
> well designed and controlled.
>
> >
> > Do you think you could perform some more tests (as pointed out above
> > against the block device to skip the fs overhead) and report back the
> > results?
>
> To be honest, I'm not sure whether additional tests are really necessary,
> because I think the `dd` test and the results explanation already makes some
> sense and provide the minimal proof of the concept. Also, this change is a
> fallback for the memory pressure situation, which is an error path in some
> point of view. Such errorneous situation might not happen frequently and if
> the situation is not solved in short time, something much worse (e.g., OOM kill
> of the user space xen control processes) than temporal I/O performance
> degradation could happen. Thus, I'm not sure whether such detailed performance
> measurement is necessary for this rare error handling change.

Right, my main concern is that we seem to be adding duck tape so
things don't fall apart, but if such cache is really not beneficial
from a performance PoV I would rather see it go away than adding more
stuff to it in order to workaround corner cases like memory
starvation.

Anyway, I guess we can take such change, but long term we need to look
into fixing grants to not use ballooned pages, and figure out if the
blkback free page cache is really useful or not.

Thanks, Roger.