Re: [patch 02/11 -mmotm] lowmemorykiller: Don't count free spaceunless it meets the specified limit by itself

From: Mel Gorman
Date: Wed May 13 2009 - 05:43:20 EST


On Tue, May 12, 2009 at 05:27:18PM -0700, Arve Hj?nnev?g wrote:
> On Tue, May 12, 2009 at 2:23 AM, Mel Gorman <mel@xxxxxxxxx> wrote:
> > On Sun, May 10, 2009 at 03:07:10PM -0700, David Rientjes wrote:
> >> From: Arve Hjønnevåg <arve@xxxxxxxxxxx>
> >>
> >> This allows processes to be killed when the kernel evict cache pages in
> >> an attempt to get more contiguous free memory.
> >>
> >
> > I'm not seeing what this patch has to do with contiguous free memory. I
> > see what the patch is doing - lowering the threshold allowing a
> > particular min_adj value to be used, but not why.
> >
>
> If the memory is fragmented, contiguous allocations can cause all
> caches to be freed. The low memory killer would not trigger in this
> case.
>

That still doesn't explain why this patch allows processes to be killed
in an effort to get contiguous free memory other than freeing pages in
general may free up contiguous pages. It seems this patch makes the
killer more agressive but I think I know why.

How about the following as a changelog? It might help me understand the
use case better and how this patch changes it if the changelog was better.
Of course, as this is this driver is isolated to the Android platform and
I'm not developing or own one, you're also welcome to tell me to "shut up,
you don't know what you're talking about" :)

=====
The Android low memory killer is responsible for ensuring there is enough
free memory at configured watermarks stored in a lowmem_minfree[] array. At
each watermark, there is an associated oom_adj score. When the watermark is
not met, processes with a higher oom_adj are considered for OOM killing
so that lower-priority processes are killed before the situation becomes
unmanageable.

The problem is that the driver currently sums NR_FREE_PAGES+NR_FILE_PAGES
as an estimate of how much memory is potentially free. This can lead to
a situation where no memory is really free because it's all in the page
cache and corrective action is taken too late with too many processes being
considered. This patch changes the semantics such that both NR_FREE_PAGES
and NR_FILE_PAGES must be above the lowmem_minfree threshold.
====

?

If this is accurate, it also wouldn't hurt to put the first paragraph into a
comment earlier in the driver so the next poor fool like me isn't scratching
his head wondering what this is for.

> >
> >> Signed-off-by: Arve Hjønnevåg <arve@xxxxxxxxxxx>
> >> ---
> >>  drivers/staging/android/lowmemorykiller.c |   13 +++++++++----
> >>  1 files changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> >> --- a/drivers/staging/android/lowmemorykiller.c
> >> +++ b/drivers/staging/android/lowmemorykiller.c
> >> @@ -58,20 +58,25 @@ static int lowmem_shrink(int nr_to_scan, gfp_t gfp_mask)
> >>       int min_adj = OOM_ADJUST_MAX + 1;
> >>       int selected_tasksize = 0;
> >>       int array_size = ARRAY_SIZE(lowmem_adj);
> >> -     int other_free = global_page_state(NR_FREE_PAGES) + global_page_state(NR_FILE_PAGES);
> >> +     int other_free = global_page_state(NR_FREE_PAGES);
> >> +     int other_file = global_page_state(NR_FILE_PAGES);
> >>       if(lowmem_adj_size < array_size)
> >>               array_size = lowmem_adj_size;
> >>       if(lowmem_minfree_size < array_size)
> >>               array_size = lowmem_minfree_size;
> >>       for(i = 0; i < array_size; i++) {
> >
> > It would appear that lowmem_adj_size is making assumptions on
> > the number of zones that exist in the system. It's not clear why
> > sysctl_lowmem_reserve_ratio[] is not being used or how they differ.
>
> lowmem_adj_size is the number of elements in the array. Why would the
> number of zones in the system affect it?

I misread what the purpose of lowmem_adj[] was. I thought it was
thresholds on a per-zone basis because it looks similar to
lowmem_reserve_ratio[] but that's not what it is at all.

Looking at it closer, this is more about introducing more watermarks to
a zone for the control of the size of the cache - maybe because the OOM
killer takes action too late for you.

Lack of familiarity with the driver and how it's expected to be used is
catching me.

> The goal of the
> lowmemorykiller is to make more memory available for caches. I would
> expect increasing sysctl_lowmem_reserve_ratio to have the opposite
> effect, but on our system with only one zone it has no effect.
>

Ok.

> >
> >> -             if(other_free < lowmem_minfree[i]) {
> >> +             if (other_free < lowmem_minfree[i] &&
> >> +                 other_file < lowmem_minfree[i]) {
> >>                       min_adj = lowmem_adj[i];
> >>                       break;
> >>               }
> >>       }
> >>       if(nr_to_scan > 0)
> >> -             lowmem_print(3, "lowmem_shrink %d, %x, ofree %d, ma %d\n", nr_to_scan, gfp_mask, other_free, min_adj);
> >> -     rem = global_page_state(NR_ACTIVE) + global_page_state(NR_INACTIVE);
> >> +             lowmem_print(3, "lowmem_shrink %d, %x, ofree %d %d, ma %d\n", nr_to_scan, gfp_mask, other_free, other_file, min_adj);
> >> +     rem = global_page_state(NR_ACTIVE_ANON) +
> >> +             global_page_state(NR_ACTIVE_FILE) +
> >> +             global_page_state(NR_INACTIVE_ANON) +
> >> +             global_page_state(NR_INACTIVE_FILE);
> >
> > This looks like it's a compile fix since changes made to 4f98a2fe. This
> > should have been in a separate patch and prioritised.
>
> You are right. These patches were originally on 2.6.27 and I did not
> notice that the first patch was also broken when I fixed this patch
> for 2.6.29.
>

Ok

> >
> >>       if (nr_to_scan <= 0 || min_adj == OOM_ADJUST_MAX + 1) {
> >>               lowmem_print(5, "lowmem_shrink %d, %x, return %d\n", nr_to_scan, gfp_mask, rem);
> >>               return rem;
> >

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/