Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

From: Michal Hocko
Date: Thu Aug 10 2017 - 14:51:48 EST


On Thu 10-08-17 20:05:54, Andrea Arcangeli wrote:
> On Thu, Aug 10, 2017 at 10:16:32AM +0200, Michal Hocko wrote:
> > Andrea has proposed an alternative solution [4] which should be
> > equivalent, functionally similar to {ksm,khugepaged}_exit. I have to
> > confess I really don't like that approach but I can live with it if
> > that is the preferred way (to be honest I would like to drop the empty
>
> Well, you added two branches when only one is necessary. It's more or
> less like preferring a rwsem when a mutex is enough, because you're
> more used to using rwsems.
>
> > down_write();up_write() from the other two callers as well). In fact I
> > have asked Andrea to post his patch [5] but that hasn't happened. I do
> > not think we should wait much longer; we should finally merge some fix.
>
> It's already posted in [4] below; I didn't think it was necessary to
> resend it.

It was deep in the email thread, which is exactly why I've asked you
explicitly to repost it.

> The only other improvement I can think of is an unlikely()
> around tsk_is_oom_victim() in exit_mmap, but your patch below would
> need it too, and it would need two of them.

with
diff --git a/mm/mmap.c b/mm/mmap.c
index 822e8860b9d2..9d4a5a488f72 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3002,7 +3002,7 @@ void exit_mmap(struct mm_struct *mm)
 	 * with tsk->mm != NULL checked on !current tasks which synchronizes
 	 * with exit_mm and so we cannot race here.
 	 */
-	if (tsk_is_oom_victim(current)) {
+	if (unlikely(tsk_is_oom_victim(current))) {
 		down_write(&mm->mmap_sem);
 		locked = true;
 	}
@@ -3020,7 +3020,7 @@ void exit_mmap(struct mm_struct *mm)
 	}
 	mm->mmap = NULL;
 	vm_unacct_memory(nr_accounted);
-	if (locked)
+	if (unlikely(locked))
 		up_write(&mm->mmap_sem);
 }

The generated code is identical. But I do not have any objection of
course.
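
For reference, this is what the annotation boils down to (a simplified
sketch; the real definitions live in include/linux/compiler.h and carry
extra branch-profiling machinery around them):

	/*
	 * Branch prediction hints: tell the compiler which way the
	 * condition is expected to go so it can keep the cold path out
	 * of line.  Semantically a no-op.
	 */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

Both branches here only take/release mmap_sem, so gcc apparently lays
them out the same way with or without the hint, hence the identical
object code.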

> > [1] http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@xxxxxxxxxx
> > [2] http://lkml.kernel.org/r/20170725142626.GJ26723@xxxxxxxxxxxxxx
> > [3] http://lkml.kernel.org/r/20170725160359.GO26723@xxxxxxxxxxxxxx
> > [4] http://lkml.kernel.org/r/20170726162912.GA29716@xxxxxxxxxx
> > [5] http://lkml.kernel.org/r/20170728062345.GA2274@xxxxxxxxxxxxxx
> >
> > +	if (tsk_is_oom_victim(current)) {
> > +		down_write(&mm->mmap_sem);
> > +		locked = true;
> > +	}
> >  	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> >  	tlb_finish_mmu(&tlb, 0, -1);
> > 
> > @@ -3005,7 +3018,10 @@ void exit_mmap(struct mm_struct *mm)
> >  		nr_accounted += vma_pages(vma);
> >  		vma = remove_vma(vma);
> >  	}
> > +	mm->mmap = NULL;
> >  	vm_unacct_memory(nr_accounted);
> > +	if (locked)
> > +		up_write(&mm->mmap_sem);
>
> I wouldn't normally repost just to add an unlikely() when I'm not sure
> if it gets merged, but if it gets merged I would immediately tell
> Andrew about the micro-optimization missing there so he can fold it in
> later.
>
> Before reposting about the unlikely() I thought we should agree which
> version to merge: [4] or the above double branch (which buys nothing
> tangible as far as I can tell).
>
> I think down_write();up_write() is the correct thing to do here because
> holding the lock for any additional instruction has zero benefit, and
> if it has zero benefit it only adds confusion and makes the code partly
> senseless; that ultimately hurts readers when they try to understand
> why you're holding the lock for so long when it's not needed.
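
For anyone following along, the shape argued for above is essentially a
lock/unlock pair used purely as a barrier against a reaper that might
still hold mmap_sem, roughly like this (a simplified sketch, not the
exact patch in [4]):

	if (unlikely(tsk_is_oom_victim(current))) {
		/*
		 * Synchronize against a possibly running oom reaper:
		 * once we take (and drop) the write lock it has either
		 * finished or it will notice the mm is being torn down
		 * (e.g. an MMF_OOM_SKIP style check) and back off.
		 * Nothing runs under the lock itself.
		 */
		down_write(&mm->mmap_sem);
		up_write(&mm->mmap_sem);
	}
	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);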

OK, let's agree to disagree. As I've said, I like it when the critical
section is explicit and we _know_ what it protects. In this case it is
clear that we have to protect against the page table tear down and the
vma destruction. But as I've said, I am not going to argue about this
any more. It is more important to finally fix this.
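
Just for completeness, what that critical section synchronizes against
is the reaper walking mm->mmap and unmapping ranges under the read lock;
taking the write lock across free_pgtables() and remove_vma() guarantees
the two cannot overlap. Roughly (a heavily simplified sketch of
__oom_reap_task_mm from mm/oom_kill.c):

	if (!down_read_trylock(&mm->mmap_sem))
		return false;	/* exit_mmap holds it, retry later */

	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		/* skip VM_SHARED mappings, reap private memory only */
		if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
			unmap_page_range(&tlb, vma, vma->vm_start,
					 vma->vm_end, NULL);
	}
	up_read(&mm->mmap_sem);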
--
Michal Hocko
SUSE Labs