Re: [PATCH v4 10/10] thp: implement refcounting for huge zero page

From: Andrew Morton
Date: Thu Oct 25 2012 - 17:05:17 EST


On Thu, 25 Oct 2012 23:49:59 +0300
"Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:

> On Wed, Oct 24, 2012 at 01:25:52PM -0700, Andrew Morton wrote:
> > On Wed, 24 Oct 2012 22:45:52 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> >
> > > On Wed, Oct 24, 2012 at 12:22:53PM -0700, Andrew Morton wrote:
> > > >
> > > > I'm thinking that such a workload would be the above dd in parallel
> > > > with a small app which touches the huge page and then exits, then gets
> > > > executed again. That "small app" sounds realistic to me. Obviously
> > > > one could exercise the zero page's refcount at higher frequency with a
> > > > tight map/touch/unmap loop, but that sounds less realistic. It's worth
> > > > trying that exercise as well though.
> > > >
> > > > Or do something else. But we should try to probe this code's
> > > > worst-case behaviour, get an understanding of its effects and then
> > > > decide whether any such workload is realistic enough to worry about.
> > >
> > > Okay, I'll try a few memory pressure scenarios.
>
> A test program:
>
> 	while (1) {
> 		posix_memalign((void **)&p, 2 * MB, 2 * MB);
> 		assert(*p == 0);
> 		free(p);
> 	}
>
> With this code running in the background, we have a pretty good chance
> (roughly one in two) that the huge zero page is freeable (refcount == 1)
> when the shrinker callback is called.
>
> A pagecache hog (dd if=hugefile of=/dev/null bs=1M) creates enough pressure
> to get the shrinker callback called, but it was only asked about the cache
> size (nr_to_scan == 0).
> I was not able to get it called with nr_to_scan > 0 in this scenario, so
> the hzp was never freed.

hm. It's odd that the kernel didn't try to shrink slabs in this case.
Why didn't it??

> I also tried another scenario: usemem -n16 100M -r 1000. It creates real
> memory pressure - no easily reclaimable memory. This time the callback was
> called with nr_to_scan > 0 and we freed the hzp. Under pressure we fail to
> allocate the hzp and the code goes to the fallback path, as it is supposed to.
>
> Do I need to check any other scenario?

I'm thinking that if we do hit problems in this area, we could avoid
freeing the hugepage unless the scan_control.priority is high enough.
That would involve adding a magic number or a tunable to set the
threshold.
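
Something like the following userspace model shows the decision I have in
mind. It is an untested sketch, not kernel code: hzp_should_free() and
HZP_FREE_PRIORITY are made-up names, and it assumes the reclaim priority
is somehow made visible to the shrinker callback, which it isn't today.
It also assumes the usual vmscan convention that the priority value
counts down from DEF_PRIORITY (12) towards 0 as reclaim gets more
desperate, so "high enough" means a numerically small value.

```c
#include <stdbool.h>

/* Hypothetical threshold: the magic number or tunable mentioned above. */
#define HZP_FREE_PRIORITY 6

/*
 * Model of the decision only. Returns true when the huge zero page
 * should be freed:
 *  - nr_to_scan == 0 is just a query for the cache size, never free;
 *  - refcount != 1 means someone other than the shrinker still holds it;
 *  - otherwise free only once reclaim has become desperate enough
 *    (assumption: a lower priority value means more pressure).
 */
bool hzp_should_free(int refcount, unsigned long nr_to_scan, int priority)
{
	if (nr_to_scan == 0)
		return false;
	if (refcount != 1)
		return false;
	return priority <= HZP_FREE_PRIORITY;
}
```

With that, a light scan at priority 12 would leave the page alone even
when refcount == 1, and only sustained pressure would free it.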

Also, it would be beneficial if we could monitor this easily. Perhaps
add a counter to /proc/vmstat which tells us how many times that page
has been reallocated? And perhaps another for how many times we tried
to allocate it but failed?
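
On the monitoring side, a test harness could then pick the numbers up
with something like the sketch below. The counter names used in the
usage note are placeholders for whatever we end up calling them, and
vmstat_counter() is a hypothetical helper; the only thing it relies on
is the existing "name value" one-per-line layout of /proc/vmstat.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in an open /proc/vmstat-style stream, where each
 * line is "<name> <value>". Returns the value, or -1 if the name is
 * not present.
 */
long vmstat_counter(FILE *f, const char *name)
{
	char key[64];
	long val;

	rewind(f);	/* always search from the top of the stream */
	while (fscanf(f, "%63s %ld", key, &val) == 2) {
		if (strcmp(key, name) == 0)
			return val;
	}
	return -1;
}
```

A caller would fopen("/proc/vmstat", "r") and ask for, say,
"thp_zero_page_alloc" and "thp_zero_page_alloc_failed" before and after
a test run, then compare the deltas.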

