Re: PG_zero

From: Andrea Arcangeli
Date: Tue Nov 02 2004 - 21:07:38 EST


On Wed, Nov 03, 2004 at 12:23:12PM +1100, Nick Piggin wrote:
> I see what you mean. You could be correct that it would model cache
> behaviour better to just have the last N freed "hot" pages in LIFO
> order on the list, and allocate cold pages from the other end of it.

I believe this is the more efficient way to use the per-cpu resources
while still providing hot/cold behaviour, though we should verify it.

Plus, I provide hot/cold info for the zerolist in the same way.

With the current model I'd need 4 per-cpu lists.
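
To make it concrete, here is a userspace toy of the single list I have in
mind; it's only a sketch of the idea, not the patch code, and struct pcp,
pcp_free_hot and the other names are invented for illustration:

/*
 * One array works as a deque: pages[count-1] is the hot end,
 * pages[0] the cold end.
 */
#include <stdio.h>

#define PCP_SIZE 8              /* toy capacity; the real list is bigger than the L2 */

struct pcp {
        int pages[PCP_SIZE];    /* stand-in for struct page pointers */
        int count;
};

/* A freed hot page goes on the hot end and is the first to be reused. */
static void pcp_free_hot(struct pcp *p, int page)
{
        p->pages[p->count++] = page;
}

/* A hot allocation pops the most recently freed page (LIFO). */
static int pcp_alloc_hot(struct pcp *p)
{
        return p->count ? p->pages[--p->count] : -1;    /* -1: fall back to the buddy */
}

/* A cold allocation (e.g. a DMA buffer) takes the least recently freed page. */
static int pcp_alloc_cold(struct pcp *p)
{
        int page, i;

        if (!p->count)
                return -1;
        page = p->pages[0];
        for (i = 1; i < p->count; i++)  /* O(n) shift only in this toy model */
                p->pages[i - 1] = p->pages[i];
        p->count--;
        return page;
}

int main(void)
{
        struct pcp p = { .count = 0 };
        int i;

        for (i = 1; i <= 4; i++)
                pcp_free_hot(&p, i);    /* free pages 1..4; 4 is the hottest */

        printf("hot  alloc: page %d\n", pcp_alloc_hot(&p));     /* 4, most recently freed */
        printf("cold alloc: page %d\n", pcp_alloc_cold(&p));    /* 1, least recently freed */
        return 0;
}

One list and one count give LIFO reuse for hot requests and least-recently
freed pages for cold requests.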

> You still don't want cold freeing to pollute this list, *but* you do
> want to still batch up cold freeing to amortise the buddy's lock
> acquisition.
>
> You could do that with just one list, if you gave cold pages a small
> extra allowance to batch freeing if the list is full.

Once the list is full, I free a "batch" from the other end of it. And
then I put the new page into the list.

I agree that if I have to fall back into the buddy allocator to free pages
because the list is full, and it's a __GFP_COLD freeing, I should put the
cold page into the buddy first. Right now I'm putting the cold page into
the list instead, so I'm basically off by one potentially hot page.

I doubt it makes any difference (this only matters when the list is
completely full of only hot pages, which is unlikely since the list size is
bigger than the L2 size and cache isn't only allocated for the freed pages
but also for the still-allocated ones), but I think I can easily change
it to put the page at the cold end first and then batch the freeing. This
will fix this minor performance issue in my code (theoretically it's
more efficient the way you suggested ;). It's more an implementation bug
than anything I did intentionally ;).
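
The change would look roughly like this, continuing the toy model above;
PCP_BATCH, buddy_free() and pcp_free() are again invented names, and this
is only one way to get the ordering, not what the patch literally does:

#define PCP_BATCH 2     /* toy batch size handed back to the buddy in one go */

/* Stand-in for returning one page to the buddy allocator under its lock. */
static void buddy_free(int page)
{
        printf("to buddy: page %d\n", page);
}

/* Push up to n pages from the cold end back to the buddy. */
static void pcp_drain_cold_end(struct pcp *p, int n)
{
        while (n-- && p->count) {
                int i;

                buddy_free(p->pages[0]);
                for (i = 1; i < p->count; i++)
                        p->pages[i - 1] = p->pages[i];
                p->count--;
        }
}

/* Freeing when the list may be full, with the ordering fixed for cold pages. */
static void pcp_free(struct pcp *p, int page, int cold)
{
        if (p->count == PCP_SIZE) {
                if (cold) {
                        /*
                         * The cold page would be the coldest entry anyway,
                         * so hand it to the buddy as part of the batch
                         * instead of letting it evict a potentially hot page.
                         */
                        buddy_free(page);
                        pcp_drain_cold_end(p, PCP_BATCH - 1);
                        return;
                }
                /* Hot free with a full list: make room, then keep the hot page. */
                pcp_drain_cold_end(p, PCP_BATCH);
        }
        pcp_free_hot(p, page);
}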

> I think you want to still take them off the cold end. Taking a

Yes, there is no doubt I want to take cold pages from the other end. But
if no improvement can be measured by always picking hot pages, then of
course picking them from the cold end isn't going to pay off much... ;)

> really cache hot page and having it invalidated is worse than
> having some cachelines out of your combined pool of hot pages
> pushed out when you heat the cold page.

How much worse? I do understand the snooping has a cost, but that DMA is
pretty slow anyway.

Or do you mean it's because the hot pages won't be guaranteed to be
completely cold, but just a few cachelines cold? I prefer the next hot page
to be guaranteed to still be hot even after DMA completes, rather than
avoiding the snooping and clobbering the cache randomly.

If you invalidate the cache of a hot page, you're guaranteed you won't
affect the next hot page: the next hot page is guaranteed to still be
hot even after the DMA has completed.

I'm not sure what's more important but I don't feel reserving is right.

With PG_zero-2-no-zerolist-reserve-1 I went as far as un-reserving
zero pages during non-zero allocations, despite them being in different
lists ;). I need them in different lists to provide PG_zero efficiently
and to track hot/cold info in the zerolist too. But I stopped all
reservations, even for pages that are already zeroed. This way the 200%
boost in the microbenchmark isn't guaranteed anymore, but it still very
often happens.
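
As a sketch of that allocation side, again on top of the toy model above
(pcp_pair and pcp_pair_alloc() are invented names; the real code deals
with struct page and gfp flags):

/*
 * Two per-CPU lists as described: ordinary free pages plus a zerolist
 * of already-zeroed pages, with no reservation between them.
 */
struct pcp_pair {
        struct pcp plain;       /* ordinary hot/cold list, as above */
        struct pcp zero;        /* already-zeroed pages, also kept hot/cold */
};

static int pcp_pair_alloc(struct pcp_pair *pp, int want_zero)
{
        int page;

        if (want_zero) {
                page = pcp_alloc_hot(&pp->zero);        /* hottest pre-zeroed page */
                if (page < 0)
                        page = pcp_alloc_hot(&pp->plain);       /* caller zeroes it */
                return page;
        }

        /* Ordinary allocation: prefer the plain list... */
        page = pcp_alloc_hot(&pp->plain);
        if (page < 0)
                page = pcp_alloc_hot(&pp->zero);        /* ...but zeroed pages are not
                                                           reserved and can be consumed
                                                           here too */
        return page;
}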