Re: [PATCH 0/4] Swap migration V3: Overview

From: Magnus Damm
Date: Wed Oct 26 2005 - 02:04:25 EST

Next message: Ravikiran G Thirumalai: "[rfc] x86_64: Kconfig changes for NUMA"
Previous message: Max Kellermann: "Re: 2.6.14-rc4-mm1 and later: second ata_piix controller is invisible"
In reply to: Marcelo Tosatti: "Re: [PATCH 0/4] Swap migration V3: Overview"
Next in thread: Marcelo Tosatti: "Re: [PATCH 0/4] Swap migration V3: Overview"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi again Marcelo,

On 10/25/05, Marcelo Tosatti <marcelo.tosatti@xxxxxxxxxxxx> wrote:
> On Tue, Oct 25, 2005 at 08:37:52PM +0900, Magnus Damm wrote:
> > DMA: each page has been scanned ~37 times
> > Normal: each page has been scanned ~15 times
> > HighMem: each page has been scanned ~18 times
> >
> > So if your user space page happens to be allocated from the DMA zone,
> > it looks like it is more probable that it will be paged out sooner
> > than if it was allocated from another zone. And this is on a half year
> > old P4 system.
>
> Well the higher relative pressure on a specific zone is a fact you have
> to live with.

Yes, and even if the DMA zone was removed we still would have the same
issue with highmem vs lowmem.

> Even with a global LRU you're going to suffer from the same issue once
> you've got different relative pressure on different zones.

Yep, the per-node LRU will not even out the pressure. But my main
concern is rather the side effect of the pressure difference than the
pressure difference itself.

The side effect is that the "wrong" pages may be paged out in a
per-zone LRU compared to a per-node LRU. This may or may not be a big
deal for performance.

> Thats the reason for the mechanisms which attempt to avoid allocating
> from the lower precious zones (lowmem_reserve and the allocation
> fallback logic).

Exactly. But will this logic always work well? With some memory
configurations the normal zone might be smaller than the DMA zone. And
the same applies for highmem vs normal zone. I'm not sure, but doesn't
the size of the zones somehow relate to the memory pressure?

> > > > There are probably not that many drivers using the DMA zone on a
> > > > modern PC, so instead of bringing performance penalty on the entire
> > > > system I think it would be nicer to punish the evil hardware instead.
> > >
> > > Agreed - the 16MB DMA zone is silly. Would love to see it go away...
> >
> > But is the DMA zone itself evil, or just that we have one LRU per zone...?
>
> I agree that per-zone LRU complicates global page aging (you simply don't have
> global aging).
>
> But how to deal with restricted allocation requirements otherwise?
> Scanning several GB's worth of pages looking for pages in a specific
> small range can't be very promising.

I'm not sure exactly how much of the buddy allocator design that
currently is used by the kernel, but I suspect that 99.9% of all
allocations are 0-order. So it probably makes sense to optimize for
such a case.

Maybe it is possible to scrap the zones and instead use:

0-order pages without restrictions (common case):
Free pages in the node are chained together and either kept on one
list (64 bit system or 32 bit system without highmem) or on two lists;
one for lowmem and one for highmem. Maybe per cpu lists should be used
on top of this too.

Other pages (>0-order, special requirements):
Each node has a bitmap where pages belonging to the node are
represented by one bit each. Each bit is used to determine if the
per-page status. A value of 0 means that the page is used/reserved,
and a 1 means that the page is either free or allocated somehow but it
is possible migrate or page out the data.

So a page marked as 1 may be on the 0-order list, in use on some LRU,
or maybe even migratable SLAB.

The functions in linux/bitmap.h or asm/bitops.h are then used to scan
through the bitmap to find contiguous pages within a certain range of
pages. This allows us to fulfill all sorts of funky requirements such
as alignment or "within N address bits".

The allocator should of course prefer free pages over "used but
migratable", but if no free pages exist to fulfill the requirement,
page migration is used to empty the contiguous range.

The drawback of the idea above is of course the overhead (both memory
and cpu) introduced by the bitmap. But the allocator above may be more
successful for N-order allocations than the buddy allocator since the
pages doesn't have to be aligned. The allocator will probably be even
more successful if page migration is used too.

And then you have a per-node LRU on top of the above. =)

> Hope to be useful comments.

Yes, very useful. Many thanks!

/ magnus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Ravikiran G Thirumalai: "[rfc] x86_64: Kconfig changes for NUMA"
Previous message: Max Kellermann: "Re: 2.6.14-rc4-mm1 and later: second ata_piix controller is invisible"
In reply to: Marcelo Tosatti: "Re: [PATCH 0/4] Swap migration V3: Overview"
Next in thread: Marcelo Tosatti: "Re: [PATCH 0/4] Swap migration V3: Overview"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]