Re: Should mmap MAP_LOCKED fail if mm_poppulate fails?

From: Linus Torvalds
Date: Tue Apr 28 2015 - 12:02:06 EST


On Tue, Apr 28, 2015 at 5:11 AM, Michal Hocko <mhocko@xxxxxxx> wrote:
>
> The first patch is dumb and straightforward. It should be safe as is and
> also good without the follow up 2 patches which try to handle potential
> allocation failures in the do_munmap path more gracefully. As we still
> do not fail small allocations even the first patch could be simplified
> a bit and the retry loop replaced by a BUG_ON right away.

I think the BUG_ON() is a bad idea in the first place, and is in fact
a good reason to ignore the patch series entirely.

What is the point of that BUG_ON()?

Hell, people add too many of those things. There is *no* excuse for
killing the kernel for things like this (and in certain setups,
BUG_ON() *will* cause the machine to be rebooted). None. It's
completely inexcusable.

Thinking like this must go. BUG_ON() is for things where our internal
data structures are so corrupted that we don't know what to do, and
there's no way to continue. Not for "I want to sprinkle these things
around and this should not happen".

I also think that the whole complex "do_munmap_nofail()" is broken to
begin with, along with the crazy "!fatal_signal_pending()" thing.
There is absolutely no excuse for any of this.

Your code is also fundamentally buggy in that it tries to do unmap()
after it has dropped all locks, and things went wrong. So you may nto
be unmapping some other threads data.

There is no way in hell any of these patches can ever be applied.

There's a reason we don't handle populate failures - it's exactly
because we've dropped the locks etc. After dropping the locks, we
*cannot* clean up any more, because there's no way to know whather
we're cleaning up the right thing. You'd have to hold the write lock
over the whole populate, which has serious problems of its own.

So NAK on this series. I think just documenting the man-page might be
better. I don't think MAP_LOCKED is sanely fixable.

We might improve on MAP_LOCKED by having a heuristic up-front
(*before* actually doing any mmap) to verify that it's *likely* that
it will work. So we could return ENOMEM early if it looks like the
user would hit the resource limits, for example. That wouldn't be any
guarantee (another process might eat up the resource limit anyway),
and in fact it might be overly eager to fail (maybe the
mmap(MAP_LOCKED ends up unmapping an older locked mapping and we deny
it too eagerly), but it would probably work well enough in practice.

That, together with a warning in the man-page about mmap(MAP_LOCKED)
not being able to return "I only locked part of the mapping", if you
want full error handling you need to do mmap()+mlock() and check the
two errors separately.

Hmm? But I really dislike your patch-series as-is.

Linus

Linus

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/