Re: [RFC][0/3] Virtual address space control for cgroups (v2)

From: Paul Menage
Date: Thu Mar 27 2008 - 14:45:31 EST


On Thu, Mar 27, 2008 at 10:50 AM, Balbir Singh
<balbir@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > I was thinking more about that, and I think I found a possibly fatal flaw:
> >
>
> What is the critical flaw?
>

Oops, after I'd written that I decided while describing it that maybe
it wasn't that fatal after all, just fiddly, and so deleted the
description but forgot to delete the preceding sentence. :-)

There were a couple of issues. The first was that if the new owner is
in a different cgroup, we might have to fix up the address space
charges when we pass off the ownership, which would be a bit of a
layer violation but maybe manageable.
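
To make the first issue concrete, here is a toy userspace model of the charge fix-up. It is only a sketch, not kernel code: VaCgroup, Mm, new_mm and change_owner are hypothetical names invented for illustration, and what should happen when the new owner's cgroup has no room is left open.

```python
class VaCgroup:
    """Hypothetical stand-in for a cgroup with a virtual address space limit."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.usage = 0

    def charge(self, nbytes):
        # Refuse the charge if it would push usage over the limit.
        if self.usage + nbytes > self.limit:
            return False
        self.usage += nbytes
        return True

    def uncharge(self, nbytes):
        self.usage -= nbytes


class Mm:
    """Hypothetical stand-in for an mm; cgroup is the group of its owner."""

    def __init__(self, total_vm, cgroup):
        self.total_vm = total_vm
        self.cgroup = cgroup


def new_mm(cgroup, total_vm):
    """Create an mm whose address space is charged to its owner's cgroup."""
    if not cgroup.charge(total_vm):
        raise MemoryError("va limit exceeded in " + cgroup.name)
    return Mm(total_vm, cgroup)


def change_owner(mm, new_cgroup):
    """Pass ownership to a task in new_cgroup, moving the charge with it."""
    if new_cgroup is mm.cgroup:
        return True                    # same group, nothing to fix up
    if not new_cgroup.charge(mm.total_vm):
        return False                   # the fiddly case: no room in the new group
    mm.cgroup.uncharge(mm.total_vm)
    mm.cgroup = new_cgroup
    return True
```

The awkward part is the failing branch: the hand-off happens because the old owner is going away, so refusing it is not obviously an option.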

The other was to do with ensuring that mm->owner remains valid until
after exit_mmap() has been called (so the va limit controller can
deduct from the va usage).
>
> Yes, I've seen some patches there as well. As far as sparse virtual addresses
> are concerned, I find it hard to understand why applications would use sparse
> physical memory and large virtual addresses. Please see my comment on overcommit
> below.

Java (or at least, Sun's JRE) is an example of a common application
that does this. It creates a huge heap mapping at startup, and faults
it in as necessary.
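
The pattern is easy to reproduce from userspace. The following is a minimal sketch, assuming Linux with /proc mounted; vm_pages and reserve_and_touch are made-up helper names, and the 256 MiB size is arbitrary.

```python
import mmap

PAGE = mmap.PAGESIZE


def vm_pages():
    """Return (virtual, resident) size of this process in pages."""
    with open("/proc/self/statm") as f:
        size, resident = f.read().split()[:2]
    return int(size), int(resident)


def reserve_and_touch(length, pages_to_touch):
    """Map `length` bytes anonymously, write to a few pages, report growth.

    Returns (virtual_growth, resident_growth) in bytes.
    """
    v0, r0 = vm_pages()
    # MAP_NORESERVE is not exported by every Python version, so fall back to 0.
    flags = (mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
             | getattr(mmap, "MAP_NORESERVE", 0))
    heap = mmap.mmap(-1, length, flags=flags)
    for i in range(pages_to_touch):
        heap[i * PAGE] = 1             # the first write faults the page in
    v1, r1 = vm_pages()
    heap.close()
    return (v1 - v0) * PAGE, (r1 - r0) * PAGE


if __name__ == "__main__":
    virt, res = reserve_and_touch(256 * 1024 * 1024, 4)
    print("virtual growth: ", virt, "bytes")
    print("resident growth:", res, "bytes")
```

Virtual size grows by the full mapping while resident size grows only by the pages actually written, which is why a limit on the former says little about the latter.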

> > But the problem that I have with this is that mmap() is only very
> > loosely connected with physical memory. If we're trying to help
> > applications avoid swapping, and giving them advance warning that
> > they're running out of physical memory, then we should do exactly
> > that, not try to treat address space as a proxy for physical memory.
>
> Consider why we have the overcommit feature in the Linux kernel. Virtual memory
> limits (decided by the administrator) help us prevent from excessively over
> committing the system.

Well if I don't believe in per-container virtual address space limits,
I'm unlikely to be a big fan of system-wide virtual address space
limits either. So running with vm.overcommit_memory=2 is right out ...
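
For context, in that mode the kernel refuses new mappings once the system-wide commit charge would exceed roughly swap plus a percentage of RAM (vm.overcommit_ratio, 50 by default). A rough sketch of the arithmetic, ignoring hugepages and vm.overcommit_kbytes; the function name and the machine sizes are made up for illustration:

```python
GIB_KB = 1024 * 1024


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Approximate CommitLimit under vm.overcommit_memory=2, in kB."""
    return swap_kb + ram_kb * overcommit_ratio // 100


# Hypothetical machine: 8 GiB of RAM and 2 GiB of swap.
limit = commit_limit_kb(8 * GIB_KB, 2 * GIB_KB)

# A job that maps a 16 GiB private writable heap up front, as a JVM might,
# would be refused even if it only ever touched a small part of it.
heap_fits = 16 * GIB_KB <= limit
```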

I'm certainly not disputing that it's possible to avoid excessive
overcommit by using virtual address space limits.

It's just that for both of the real-world large-scale production
systems I've worked with (a virtual server system for ISPs, and
Google's production datacenters) there were enough cases of apps/jobs
that used far more virtual address space than actual physical memory
that picking a virtual address space ratio/limit that would be useful
for preventing dangerous overcommit while not breaking lots of apps
would be pretty much impossible to do automatically. And specifying
them manually requires either unusually clueful users (most of whom
have enough trouble figuring out how much physical memory they'll
need, and would just set very high virtual address space limits) or
sysadmins with way too much time on their hands ...

As I said, I think focussing on ways to tell apps that they're running
low on physical memory would be much more productive.

Paul