Re: [PATCH] 2/2 Prezeroing large blocks of pages during allocation

From: Mel Gorman
Date: Sun Mar 06 2005 - 19:39:31 EST


On Mon, 28 Feb 2005, Christoph Lameter wrote:

> On Sun, 27 Feb 2005, Mel Gorman wrote:
>
> > The patch also counts how many blocks of each order were zeroed. This gives
> > a rough indicator if large blocks are frequently zeroed or not. I found
> that order-0 blocks are the most frequently zeroed because of the per-cpu
> caches. This means we rarely win with zeroing in the allocator but the
> > accounting mechanisms are still handy for the scrubber daemon.
>
> Thanks for your efforts in integrating zeroing into your patches to reduce
> fragmentation.

No problem.

> It is true that you do not win with zeroing pages in the
> allocator. However, you may avoid additional zeroing by zeroing higher
> order pages and then breaking them into lower order pages (but this will
> then lead to additional fragmentation).
>

I got around the fragmentation problem by having a userzero and kernzero
pool. I also taught rmqueue_bulk() to allocate memory in as large chunks
as possible and break them up into appropriately sized pieces. This means
that when the per-cpu caches are allocating 16 pages, we can now satisfy
this with one 2**4 allocation rather than 16 2**0 allocations (which is
possibly a win in general, not just for the prezeroing case, but I have
not measured it). The zeroblock counts after a stress test now look
something like this:

Zeroblock count 96968 145994 125787 75329 32553 110 11 73 26 5 175

That is a big improvement as we are no longer zeroing order-0 pages nearly
as often as we were. It also no longer regresses in terms of
fragmentation, which is important. I need to test the patch more but hope
to post a new version by tomorrow evening. It will also need careful
review to make sure I did not miss something with the per-cpu caches.

> > This patch seriously regresses how well fragmentation is handled, making it
> > perform almost as badly as the standard allocator. This is because the
> > fallback ordering for USERZERO has a tendency to clobber the reserved
> > lists because of the mix of allocation types that need to be zeroed.
>
> Having pages of multiple orders in zeroed and not zeroed state invariably
> leads to more fragmentation. I have also observed that with my patches
> under many configurations. Seems that the only solution is to
> intentionally either zero all free pages (which means you can coalesce
> them all but you are zeroing lots of pages that did not need zeroing
> after all), or you disregard the zeroed state during coalescing and
> either ensure that both buddies are zeroed or mark the result as
> unzeroed... both
> solutions introduce additional overhead.
>

My approach is to ignore zero pages during free/coalescing and to treat
kernel allocations for zero pages differently to userspace allocations for
zero pages.

> My favorite solution has been so far to try to zero all
> pages from the highest order downward but only when the system is idle
> (or there is some hardware that does zeroing for us). And maybe we better
> drop the zeroed status if a zeroed and an unzeroed page can be coalesced
> to a higher order page? However, there will still be lots of unnecessary
> zeroing.
>

When splitting, I zero the largest possible block on the assumption that
it makes sense to zero larger blocks. During coalescing, I ignore the
zeroed state altogether as I could not think of a fast way of determining
whether a page was zeroed or not.

> Since most of the requests for zeroed pages are order-0 requests, we could
> do a similar thing to what M$ Windows does
> (http://www.windowsitpro.com/Articles/Index.cfm?ArticleID=3774&pg=2): Keep
> a list of zeroed order 0 pages around, only put things on that list if
> the system is truly idle and pick pages up for order 0 zeroed accesses.
>
> These zero lists would need to be managed more like cpu hotlists and
> not like we currently manage the buddy allocator freelists.
>

I went with a variation of this approach. In my latest tree, I have
pageset and pageset_zero to represent per-CPU caches of normal pages and
of zeroed pages respectively. The buddy free lists for zeroed pages are
only refilled when a large block of pages is split up for a zero-page
allocation.

--
Mel Gorman