Re: [RFC][PATCH 0/4] Memory controller soft limit patches

From: Balbir Singh
Date: Wed Jan 07 2009 - 23:00:04 EST


* KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> [2009-01-08 09:30:40]:

> On Thu, 08 Jan 2009 00:11:10 +0530
> Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>
> >
> > Here is v1 of the new soft limit implementation. Soft limits are a new
> > feature for the memory resource controller; something similar has existed
> > in the group scheduler in the form of shares. We'll compare shares and soft
> > limits below. I've had soft limit implementations earlier, but I've
> > discarded those approaches in favour of this one.
> >
> > Soft limits are the most useful feature to have for environments where
> > the administrator wants to overcommit the system, such that only on memory
> > contention do the limits become active. The current soft limits implementation
> > provides a soft_limit_in_bytes interface for the memory controller and not
> > for memory+swap controller. The implementation maintains an RB-Tree of groups
> > that exceed their soft limit and starts reclaiming from the group that
> > exceeds this limit by the maximum amount.
> >
> > This is an RFC implementation and is not meant for inclusion.
> >
> Core implementation seems simple and the feature sounds good.

Thanks!

> But, before reviewing into details, 3 points.
>
> 1. Please fix current bugs in hierarchy management before adding a new feature.
> AFAIK, OOM-Kill under hierarchy is broken. (I have patches but am waiting for
> the merge window to close.)

I've not hit the OOM-kill issue under hierarchy so far. Is the OOM
killer selecting a bad task to kill? I'll try to reproduce and debug
the issue. I am not posting these patches for inclusion; fixing bugs
is definitely the highest priority.

> I wonder whether there are some others. Are the lockdep errors which
> Nishimura reported all fixed now?

I run all my kernels and tests with lockdep enabled, and I did not see
any lockdep errors.

>
> 2. You insert reclaim-by-soft-limit into alloc_pages(). But, to do this,
> you have to pass zonelist to try_to_free_mem_cgroup_pages() and have to modify
> try_to_free_mem_cgroup_pages().
> 2-a) If not, when the memory request is for gfp_mask==GFP_DMA or allocation
> is under a cpuset, memory reclaim will not work correctly.

The idea behind adding the code in alloc_pages() is to detect
contention and trim mem cgroups down if they have grown beyond their
soft limit.
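
As a rough userspace sketch of that idea (not the code from the patches):
the victim is the group that exceeds its soft limit by the largest amount.
struct group, soft_limit_excess() and pick_reclaim_victim() are
hypothetical names made up for illustration, and a linear scan stands in
for the RB-tree of over-limit groups that the patchset maintains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not the kernel's struct. */
struct group {
	unsigned long usage;      /* bytes currently charged */
	unsigned long soft_limit; /* soft limit in bytes */
};

/* Bytes by which a group exceeds its soft limit, 0 if it is within it. */
static unsigned long soft_limit_excess(const struct group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/*
 * Return the index of the group with the largest excess, or -1 if no
 * group is over its soft limit. Linear scan for brevity; an RB-tree
 * keyed on the excess would avoid rescanning every group.
 */
static int pick_reclaim_victim(const struct group *groups, size_t n)
{
	unsigned long max_excess = 0;
	int victim = -1;
	size_t i;

	assert(groups != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned long excess = soft_limit_excess(&groups[i]);

		if (excess > max_excess) {
			max_excess = excess;
			victim = (int)i;
		}
	}
	return victim;
}
```

Under contention, reclaim would start from the group this returns and
leave groups that are within their soft limit alone.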

> 2-b) try_to_free_mem_cgroup_pages() cannot do good work for order > 1 allocation.
>
> Please try fake-numa (or real NUMA machine) and cpuset.

Yes, the order > 1 limitation is documented in the patch, and you can
see it in the code as well. Your suggestion is to look at the gfp_mask
too; I'll do that.

>
> 3. If you want to insert hooks into the "generic" page allocator, it's better to CC
> Rik van Riel and Kosaki Motohiro, at least.

Sure, I'll do that in the next patchset.

>
> To be honest, I myself don't like adding a hook to alloc_pages() directly.
> Can we implement the soft-limit call like kswapd (or in kswapd())?
> i.e. in a moderate way?
>

Yes, that might be another point to experiment with; I'll try that in
the next iteration.


> A happy new year,
>

A very happy new year to you as well.

> -Kame
>

--
Balbir