Re: OOPS in perf_mmap_close()

From: Peter Zijlstra
Date: Thu May 23 2013 - 15:24:47 EST


On Thu, May 23, 2013 at 05:59:10PM +0000, Christoph Lameter wrote:
> On Thu, 23 May 2013, Peter Zijlstra wrote:
>
> > I know all that, and it's completely irrelevant to the discussion.
>
> What you said in the rest of the email seems to indicate that you
> still do not know that and I am repeating what I have said before here.

I'm very much aware of the difference between locked and pinned pages;
and I don't object to accounting them separately per se. I do however
object to making a joke out of resource limits and silently changing
behaviour.

Your changelog (that you seem to be so proud of that you want me to read
it yet again) completely fails to mention what happens to resource
limits.

> > You now have double the amount of memory you can lose, once to actual
> > mlock() and once through whatever generates pinned -- if it bothers with
> > checking limits at all.
>
> It was already doubled, which was the reason for the patch. The patch
> avoided the doubling that we saw and it allowed us to distinguish between
> mlocked and pinned pages.

So it was double and now it's still double, so your patch fixed what
exactly?

> > Where we had the guarantee that x < y; you did x := x1 + x2; which then
> > should result in: x1 + x2 < y, instead you did: x1 < y && x2 < y, not
> > the same and completely wrong.
>
> We never had that guarantee.

Rlimits say we have; if it was buggy we should've fixed it, not made it
worse. What's the point of pretending we have RLIMIT_MEMLOCK if we then
feel free to ignore it?
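
To make the x1 + x2 < y versus x1 < y && x2 < y point concrete, here is a
toy sketch in C. It is not kernel code: the function names and the page
counts are made up for illustration, and the comparison uses <= on the
assumption that usage exactly at the limit is still allowed.

```c
#include <stdbool.h>

/*
 * Hypothetical helpers, not kernel functions.
 * All values are page counts; 'limit' stands in for RLIMIT_MEMLOCK.
 */

/* One limit applied to the sum: locked + pinned must fit under it. */
bool combined_ok(unsigned long locked, unsigned long pinned,
                 unsigned long limit)
{
    return locked + pinned <= limit;
}

/* Same limit applied to each counter separately after the split. */
bool split_ok(unsigned long locked, unsigned long pinned,
              unsigned long limit)
{
    return locked <= limit && pinned <= limit;
}
```

With a limit of 100 pages, 60 locked plus 60 pinned fails combined_ok()
but passes split_ok(), so under the split check a process could hold up to
twice the limit in unevictable memory.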

> We were accounting many pages twice in
> the same counter. Once for mlocking and once for pinning. Thus the problem
> that the patch addresses. Read the changelog?

Splitting the counter doesn't magically fix it. Who was accounting
double and why not fix those?

Splitting things doesn't fix anything; the sum is still counting double
and you've made resource limits more complex.

We should make mlock skip over the special pinned vmas instead of
pretending pinned pages shouldn't be part of resource limits.
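
As a rough illustration of why the split alone changes nothing, consider
this toy model in C. Everything in it is hypothetical (struct toy_page and
the count_*() helpers are not kernel structures or functions); it only
shows the arithmetic, assuming a page can be both mlocked and pinned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy page descriptor; not a kernel structure. */
struct toy_page {
    bool mlocked;
    bool pinned;
};

/* One shared counter: a page that is mlocked and pinned is charged twice. */
unsigned long count_single(const struct toy_page *p, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += (p[i].mlocked ? 1 : 0) + (p[i].pinned ? 1 : 0);
    return total;
}

/* Two counters: the same pages are still charged twice in the sum. */
unsigned long count_split_sum(const struct toy_page *p, size_t n)
{
    unsigned long locked = 0, pinned = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i].mlocked)
            locked++;
        if (p[i].pinned)
            pinned++;
    }
    return locked + pinned;
}

/* mlock accounting skips pinned pages: every page is charged at most once. */
unsigned long count_skip_pinned(const struct toy_page *p, size_t n)
{
    unsigned long locked = 0, pinned = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i].pinned)
            pinned++;
        else if (p[i].mlocked)
            locked++;
    }
    return locked + pinned;
}
```

For three pages where one is both mlocked and pinned, one is only mlocked
and one is only pinned, the first two helpers both charge 4 pages while
the third charges 3, which is the number of pages actually held.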

> There are other sources that cause pages to be not evictable (like f.e.
> dirtying). Mlock accounting is not accurate in any case. The mlocked page
> limit is per thread

It's per process; see how locked_vm and pinned_vm are part of mm_struct
and modified under mmap_sem.

> which is another issue and so is the vm_pinned
> counter. The pages actually may be shared between many processes and the
> ownership of those pages is not clear. The accounting for mlock and
> pinning also is a bit problematic as a result.

I maintain that you cannot simply split a resource counter without
properly explaining what happens to resource limits.