Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
From: Michal Hocko
Date: Thu Dec 07 2017 - 11:30:09 EST
On Fri 08-12-17 00:44:11, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > David, could you test with this patch please?
>
> Even if this patch solved David's case, you need to update
>
> * tsk_is_oom_victim() cannot be set from under us
> * either because current->mm is already set to NULL
> * under task_lock before calling mmput and oom_mm is
> * set not NULL by the OOM killer only if current->mm
> * is found not NULL while holding the task_lock.
>
> part as well, for it is the explanation of why
> tsk_is_oom_victim() test was expected to work.
Yes, the same applies for mm_is_oom_victim. I will fixup s@tsk_@mm_@
here.
> Also, do we need to do
>
> set_bit(MMF_OOM_SKIP, &mm->flags);
>
> if mm_is_oom_victim(mm) == false?
I do not think we really need to set MMF_OOM_SKIP if we are not going to
synchronize.
> exit_mmap() is called means that nobody can reach this mm
> except ->signal->oom_mm, and mm_is_oom_victim(mm) == false
> means that this mm cannot be reached by ->signal->oom_mm .
>
> Then, I think we do not need to set MMF_OOM_SKIP on this mm
> at exit_mmap() if mm_is_oom_victim(mm) == false.
yes. I will fold the following in if this turned out to really address
David's issue. But I suspect this will be the case considering the NULL
pmd in the report which would suggest racing with free_pgtable...
Thanks for the review!
---
diff --git a/mm/mmap.c b/mm/mmap.c
index d00a06248ef1..e63b7a576670 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3004,7 +3004,6 @@ void exit_mmap(struct mm_struct *mm)
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
- set_bit(MMF_OOM_SKIP, &mm->flags);
if (unlikely(mm_is_oom_victim(mm))) {
/*
* Wait for oom_reap_task() to stop working on this
@@ -3012,12 +3011,13 @@ void exit_mmap(struct mm_struct *mm)
* calling down_read(), oom_reap_task() will not run
* on this "mm" post up_write().
*
- * tsk_is_oom_victim() cannot be set from under us
+ * mm_is_oom_victim() cannot be set from under us
* either because current->mm is already set to NULL
* under task_lock before calling mmput and oom_mm is
* set not NULL by the OOM killer only if current->mm
* is found not NULL while holding the task_lock.
*/
+ set_bit(MMF_OOM_SKIP, &mm->flags);
down_write(&mm->mmap_sem);
up_write(&mm->mmap_sem);
}
--
Michal Hocko
SUSE Labs