Re: What can we do to get ready for memory controller merge in 2.6.25

From: Balbir Singh
Date: Sat Dec 01 2007 - 04:50:49 EST


Paul Menage wrote:
> On Nov 29, 2007 6:11 PM, Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
>> And also some
>> results or even anecdotes of where this is going to be used would be
>> interesting...
>
> We want to be able to run multiple isolated jobs on the same machine.
> So being able to limit how much memory each job can consume, in terms
> of anonymous memory and page cache, are useful. I've not had much time
> to look at the patches in great detail, but they seem to provide a
> sensible way to assign and enforce static limits on a bunch of jobs.
>
> Some of our requirements are a bit beyond this, though:
>
> In our experience, users are not good at figuring out how much memory
> they really need. In general they tend to massively over-estimate
> their requirements. So we want some way to determine how much of its
> allocated memory a job is actively using, and how much could be thrown
> away or swapped out without bothering the job too much.
>

One would prefer the kernel provides the mechanism and user space
provides the policy. The algorithms to assign limits can exist in user
space and be supported by a good set of statistics.

> Of course, the definition of "actve use" is tricky - one possibility
> that we're looking at is "has been accessed within the last N
> seconds", where N can be configured appropriately for different jobs
> depending on the job's latency requirements. Active use should also be
> reported for pages that can't be easily freed quickly, e.g. mlocked or
> dirty pages, or anon pages on a swapless system. Inactive pages should
> be easily freeable, and be the first ones to go in the event of memory
> pressure. (From a scheduling point of view we can treat them as free
> memory, and schedule more jobs on the machine)
>

This definition of active comes from the mainline kernel, which in-turn
is derived from our understanding of the working set.

> The existing active/inactive distinction doesn't really capture this,
> since it's relative rather than absolute.
>

Not sure I understand why we need absolute use and not relative use.

> We want to be able to overcommit a machine, so the sums of the cgroup
> memory limits can add up to more than the total machine memory. So we
> need control over what happens when there's global memory pressure,
> and a way to ensure that the low-latency jobs don't get bogged down in
> reclaim (or OOM) due to the activity of batch jobs.
>

I agree, well said. We need Job Isolation.

> Paul


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/