Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers

From: Christian König
Date: Fri Aug 24 2018 - 08:53:58 EST


On 24.08.2018 at 14:33, Michal Hocko wrote:
On Fri 24-08-18 14:18:44, Christian König wrote:
On 24.08.2018 at 14:03, Michal Hocko wrote:
On Fri 24-08-18 13:57:52, Christian König wrote:
On 24.08.2018 at 13:52, Michal Hocko wrote:
On Fri 24-08-18 13:43:16, Christian König wrote:
[...]
That won't work like this; there might be multiple
invalidate_range_start()/invalidate_range_end() pairs open at the same time.
In that case the lock would be taken recursively, and that is illegal for a
rw_semaphore.
I am not sure I follow. Are you saying that one invalidate_range might
trigger another one from the same path?
No, but what can happen is:

invalidate_range_start(A,B);
invalidate_range_start(C,D);
...
invalidate_range_end(C,D);
invalidate_range_end(A,B);

Grabbing the read lock twice would be illegal in this case.
I am sorry but I still do not follow. What is the context the two are
called from?
I don't have the slightest idea.

Can you give me an example? I simply do not see it in the
code, mostly because I am not familiar with it.
Neither am I.

We stumbled over that by pure observation and after discussing the problem
with Jerome came up with this solution.

No idea where exactly that case comes from, but I can confirm that it indeed
happens.
Thinking about it some more, I can imagine that a notifier callback which
performs an allocation might trigger a memory reclaim and that in turn
might trigger a notifier to be invoked and recurse. But notifiers
shouldn't really allocate memory. They are called from deep MM code
paths and this would be extremely deadlock prone. Maybe Jerome can come
up with some more realistic scenario. If not then I would propose to simplify
the locking here. We have lockdep to catch self deadlocks and it is
always better to handle a specific issue rather than having code
without a clear indication of how it can recurse.

Well, I agree that we should probably fix that, but I have some concerns about removing the existing workaround.

See, we added that to get rid of a real problem in a customer environment and I don't want that to show up again.

In the meantime I've sent out a fix to avoid allocating memory while holding the mn_lock.

Thanks for pointing that out,
Christian.