Re: [PATCH 0/4] Swap migration V3: Overview
From: Andrew Morton
Date: Thu Oct 27 2005 - 22:09:28 EST
Marcelo Tosatti <marcelo.tosatti@xxxxxxxxxxxx> wrote:
>
> Hi Andrew!
>
> On Thu, Oct 27, 2005 at 01:43:47PM -0700, Andrew Morton wrote:
> > Marcelo Tosatti <marcelo.tosatti@xxxxxxxxxxxx> wrote:
> > >
> > > The fair approach would be to have the
> > > number of pages to reclaim also be relative to zone size.
> > >
> > > sc->nr_to_reclaim = (zone->present_pages * sc->swap_cluster_max) /
> > > total_memory;
> >
> > You can try it, but that shouldn't matter. SWAP_CLUSTER_MAX is just a
> > batching factor used to reduce CPU consumption. If you make it twice as
> > big, we run DMA-zone reclaim half as often - it should balance out.
>
> But you're not taking the relationship between the DMA and NORMAL
> zones into account?
We need to be careful to differentiate between page allocation and page
reclaim. In some ways they're coupled, but the VM does attempt to make one
independent of the other.
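
For reference, here's roughly what the SWAP_CLUSTER_MAX batching looks
like on the reclaim side (abridged from memory - go read shrink_zone()
in mm/vmscan.c for the real thing; only the inactive side shown, the
active side is analogous):

	zone->nr_scan_inactive += (zone->nr_inactive >> sc->priority) + 1;
	nr_inactive = zone->nr_scan_inactive;
	if (nr_inactive >= sc->swap_cluster_max)
		zone->nr_scan_inactive = 0;	/* a full batch is owed: scan it */
	else
		nr_inactive = 0;		/* not a full batch yet: skip the zone */

Scan credit accumulates in proportion to the zone's LRU size, and the
zone only gets scanned once a whole batch has built up.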
> I suppose that a side effect of such a change is that more allocations
> will be serviced from the NORMAL/HIGHMEM zones ("more intensively
> reclaimed") while fewer allocations will be serviced by the DMA zone
> (whose scan/reclaim pressure should now be _much_ lighter than that of
> the NORMAL zone). I.e. the DMA zone will be much less often "available"
> for GFP_HIGHMEM/GFP_KERNEL allocations, which are the vast majority.
The use of SWAP_CLUSTER_MAX in the reclaim code shouldn't affect the
inter-zone balancing over in the allocation code. Much.
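
If you want to convince yourself, below is a throwaway userspace model
of that accumulate-and-batch arithmetic (entirely made up: it just
mimics the credit scheme above with your zone sizes, uses present pages
as a stand-in for LRU size, and drops the "+ 1" rounding):

	#include <stdio.h>

	#define PASSES		100000
	#define DEF_PRIORITY	12	/* as in the kernel */

	/* Each balancing pass the zone earns scan credit proportional to
	 * its size; reclaim runs only once a full batch has accumulated,
	 * then consumes all of the credit. */
	static unsigned long simulate(unsigned long zone_pages, unsigned long batch)
	{
		unsigned long credit = 0, scanned = 0, pass;

		for (pass = 0; pass < PASSES; pass++) {
			credit += zone_pages >> DEF_PRIORITY;
			if (credit >= batch) {
				scanned += credit;
				credit = 0;
			}
		}
		return scanned;
	}

	int main(void)
	{
		/* zone sizes from the zoneinfo quoted below */
		unsigned long dma = 4096, normal = 225280, batch;

		for (batch = 32; batch <= 128; batch *= 2)
			printf("batch %3lu: DMA %.1f scans/page, Normal %.1f scans/page\n",
			       batch,
			       (double)simulate(dma, batch) / dma,
			       (double)simulate(normal, batch) / normal);
		return 0;
	}

It prints ~24.4 scans/page for both zones at every batch size: the
batch changes how often the small zone gets visited, not how much of
it ends up being scanned.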
> Might be talking BS though.
>
> What else could explain these numbers from Magnus, taking into account
> that a large number of pages in the DMA zone are used for kernel text,
> etc.? This imbalance seems potentially suboptimal (and results in
> unpredictable behaviour depending on which zone pages are allocated
> from):
>
> "$ cat /proc/zoneinfo | grep present
> present 4096
> present 225280
> present 30342
>
> $ cat /proc/zoneinfo | grep tscanned
> tscanned 151352
> tscanned 3480599
> tscanned 541466
>
> "tscanned" counts how many pages that has been scanned in each zone
> since power on. Executive summary assuming that only LRU pages exist
> in the zone:
>
> DMA: each page has been scanned ~37 times
> Normal: each page has been scanned ~15 times
> HighMem: each page has been scanned ~18 times"
Yes, I've noticed that.
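
For the record, the division holds up against the present counts:

	DMA:     151352 / 4096    ~= 37.0
	Normal:  3480599 / 225280 ~= 15.4
	HighMem: 541466 / 30342   ~= 17.8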
> I feel that I'm reaching the point where things should be confirmed
> instead of guessed (on my part!).
Need to check the numbers, but I expect you'll find that ZONE_DMA is
basically never used for either __GFP_HIGHMEM or GFP_KERNEL allocations,
due to the watermark thingies.
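
Something like this toy fallback shows the effect (all invented: the
field names are loosely after pages_low and the lowmem_reserve[] stuff
in mm/page_alloc.c, and the numbers are plucked from the air):

	#include <stdio.h>

	/* Toy model of the zonelist walk for a __GFP_HIGHMEM allocation.
	 * "protection" stands in for the extra headroom a highmem
	 * allocation must leave in a lower zone. */
	struct toy_zone {
		const char *name;
		long free_pages;
		long pages_low;
		long protection;
	};

	static int watermark_ok(const struct toy_zone *z)
	{
		return z->free_pages > z->pages_low + z->protection;
	}

	static const char *toy_alloc(struct toy_zone **zonelist)
	{
		int i;

		for (i = 0; zonelist[i]; i++)
			if (watermark_ok(zonelist[i])) {
				zonelist[i]->free_pages--;
				return zonelist[i]->name;
			}
		return "nowhere - would wake kswapd / enter direct reclaim";
	}

	int main(void)
	{
		struct toy_zone highmem = { "HighMem", 100, 128,   0 };
		struct toy_zone normal  = { "Normal",  900, 256, 784 };
		struct toy_zone dma     = { "DMA",     500,  16, 952 };
		struct toy_zone *zonelist[] = { &highmem, &normal, &dma, NULL };

		/* DMA has free pages, but the highmem allocation can't
		 * clear pages_low + protection, so it bounces off. */
		printf("page came from: %s\n", toy_alloc(zonelist));
		return 0;
	}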
So it's basically just sitting there, being used by GFP_DMA allocations.
And IIRC there _is_ a batch of GFP_DMA allocations early in boot for
block-related stuff(?). It's all hazy ;)
But that would mean that most of the ZONE_DMA pages are used for
unreclaimable purposes, and only a small proportion of them are on the LRU.
That might cause the arithmetic to perform more scanning down there.
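
To put rough numbers on that: Magnus's ~37 assumes all 4096 DMA pages
are on the LRU. If only, say, 1000 of them really are (a number plucked
from the air), it's more like

	151352 / 1000 ~= 151 scans per LRU page

and the "+ 1" in the scan accounting above is a proportionally much
bigger kick for a zone whose size term (nr_inactive >> priority)
rounds to zero.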