Re: mmap, the language go, problems with the linux kernel

From: Ted Ts'o
Date: Wed Feb 09 2011 - 14:18:07 EST


On Wed, Feb 09, 2011 at 05:30:19PM +0100, Martin Capitanio wrote:
> So, I hope I have now managed to put everyone involved on the cc list.
> Here are the relevant responses I've got from the other ml. I think
> there is still some confusion about what the mmap syscall should
> actually do in the case of PROT_NONE (Data cannot be accessed)
> http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html

Actually, I don't think the confusion has anything to do with
PROT_NONE. The Go designers have themselves said that their intent
was to reserve the virtual address space. So that much is clear.
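A minimal sketch of such a reservation, written in Python rather than Go for brevity and assuming a Linux system with /proc. It maps 1 GiB of private anonymous memory with PROT_NONE (prot=0) and compares VmSize (address space) with VmRSS (resident memory) in /proc/self/status. The helper names are hypothetical, and the Go runtime's actual allocator code is different; this only illustrates the mechanism.

```python
import mmap

GiB = 1 << 30


def status_kb(field: str) -> int:
    """Read a kB-valued field such as VmSize or VmRSS from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])  # e.g. "VmSize:   123456 kB"
    raise KeyError(field)


def reserve_and_measure(length: int = GiB):
    """Reserve `length` bytes with a PROT_NONE mapping.

    Returns (VmSize delta, VmRSS delta) in kB, measured while the
    reservation is still mapped. The region is never touched.
    """
    size0, rss0 = status_kb("VmSize"), status_kb("VmRSS")
    # prot=0 is PROT_NONE on Linux: no access allowed, so nothing is faulted in.
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE, prot=0)
    try:
        size1, rss1 = status_kb("VmSize"), status_kb("VmRSS")
    finally:
        region.close()  # munmap() the reservation
    return size1 - size0, rss1 - rss0
```

On Linux the VmSize delta should come out close to the mapping size while the VmRSS delta stays near zero: the address space is reserved, but no memory is used.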

The real question is what RLIMIT_AS and ulimit -v are supposed to
*do*. The Single Unix Specification (and POSIX, which is where this
comes from) is quite vague: "the maximum size of a process's total
available memory, in bytes". What in the world is "total available
memory"?!? BSD also has RLIMIT_RSS, which was not adopted by POSIX
(not surprising, given that POSIX was dominated by System V folks in
its early days).

AIX and the BSDs don't implement RLIMIT_AS at all. Solaris does, but
its man page just says "total available memory", again without
specifying what that means. Solaris also has an RLIMIT_VMEM, which is
the total amount of virtual address space, so Solaris apparently
considers RLIMIT_VMEM and RLIMIT_AS to be different things.

Linux has interpreted RLIMIT_AS to mean the total amount of virtual
address space for a long, long time. (The interpretation AS ==
"address space" does make sense, although it's not clear that's what
the original definition of RLIMIT_AS was supposed to mean.) Linux
also has an RLIMIT_RSS, probably taken from BSD, but it is not
implemented. (If you are using memory cgroups, you can effectively
get the same result as limiting a process's RSS, albeit via a
different API.)
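Under this interpretation, a PROT_NONE reservation counts against RLIMIT_AS even though it uses no memory. A sketch to check that, assuming Linux and a Unix-only Python (os.fork, the resource module): a forked child lowers its own soft RLIMIT_AS to its current VmSize plus 256 MiB, then tries a 64 MiB and a 1 GiB PROT_NONE reservation. The headroom, the sizes, and the helper names are arbitrary choices for illustration.

```python
import errno
import mmap
import os
import resource

MiB = 1 << 20
GiB = 1 << 30


def vm_size_bytes() -> int:
    """Current VmSize from /proc/self/status (Linux only), in bytes."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) * 1024  # reported in kB
    raise RuntimeError("VmSize not found")


def try_reserve(length: int) -> bool:
    """Try a private PROT_NONE mapping; False if mmap() fails with ENOMEM."""
    try:
        m = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE, prot=0)  # prot=0 == PROT_NONE
    except OSError as e:
        if e.errno == errno.ENOMEM:
            return False
        raise
    m.close()
    return True


def probe_rlimit_as(headroom: int = 256 * MiB):
    """Return (small reservation succeeded, large reservation failed) under a lowered RLIMIT_AS."""
    pid = os.fork()
    if pid == 0:  # child: lowering the limit here does not affect the parent
        code = 0
        try:
            soft = vm_size_bytes() + headroom
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            if hard != resource.RLIM_INFINITY:
                soft = min(soft, hard)
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
            if try_reserve(64 * MiB):
                code |= 1
            if not try_reserve(1 * GiB):
                code |= 2
        finally:
            os._exit(code)  # always leave the child without running parent cleanup
    _, status = os.waitpid(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    return bool(code & 1), bool(code & 2)
```

If Linux counts reservations as described, the 64 MiB mapping fits under the limit and the 1 GiB one is refused with ENOMEM, even though neither ever touches a page.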

Bash defines ulimit -v to mean "total amount of virtual memory"
and implements it via RLIMIT_AS, so it's pretty clear that its intent
was for ulimit -v to mean "virtual address space". (Or maybe it was
documented that way, and the letter 'v' chosen, because that's what
RLIMIT_AS has meant on Linux for a long time.)

The bottom line is that so long as Go's memory management system
intends to reserve virtual address space, there is no real conflict
over what PROT_NONE means. Both Linux and Go intend it to mean
"reserve address space". The better line of argument from the Go
perspective is that RLIMIT_AS shouldn't restrict the virtual address
space, but should mean "something else". But that would mean
changing Linux's behavior, which has been established for many, many
years. And arguably the specification is vague at best. (What does
"available memory" mean, anyway? Does it mean physical memory?
Physical memory plus whatever swap space happens to be available?
Should VM overcommit be taken into account? For example, what if a
debugger attaches to every running copy of the 'ftpd' binary and
modifies every single page?)

Linux has interpreted it to mean "virtual address space", and in fact
it's documented as such in its version of the getrlimit man page.
I'd have to agree with Linus that it's probably way too late to change
what it means (or what Linux thinks it means, anyway).

In any case, the current behavior is deployed on so many machines
that any change would take years to roll out anyway. What I'd
probably recommend to Go developers is to check the value of
RLIMIT_AS via getrlimit(), and if it's too small for what you want,
print a human-readable error or warning message telling the user to
raise RLIMIT_AS, and then either stop or use some alternate
allocation strategy.
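A sketch of that check, again in Python for brevity. The function names, the message wording, and the `needed` size are hypothetical; a real runtime would call getrlimit(2) directly. Note that RLIMIT_AS caps the whole address space, so comparing the limit against the reservation alone is optimistic: it ignores whatever the process has already mapped.

```python
import resource
import sys


def address_space_ok(needed: int, soft_limit: int) -> bool:
    """True if a reservation of `needed` bytes fits under an RLIMIT_AS soft limit.

    Python reports "unlimited" as resource.RLIM_INFINITY, which must be
    checked explicitly rather than compared numerically.
    """
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= needed


def check_rlimit_as(needed: int) -> bool:
    """Warn on stderr and return False if RLIMIT_AS is too small for `needed` bytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    if address_space_ok(needed, soft):
        return True
    print(
        f"warning: RLIMIT_AS (ulimit -v) is {soft // 1024} KiB, but this program "
        f"wants to reserve {needed // 1024} KiB of address space.\n"
        "Raise the limit (for example, `ulimit -v unlimited` in bash) "
        "or the program will fall back to a different allocation strategy.",
        file=sys.stderr,
    )
    return False
```

The caller decides what "stop, or use some alternate allocation strategy" means: for example, exit when check_rlimit_as() returns False, or switch to reserving smaller regions on demand.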

- Ted