Re: [patch] mm, oom: prevent additional oom kills before memory is freed

From: Michal Hocko
Date: Fri Jun 16 2017 - 07:02:15 EST


On Fri 16-06-17 19:27:19, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 16-06-17 09:54:34, Tetsuo Handa wrote:
> > [...]
> > > And the patch you proposed is broken.
> >
> > Thanks for your testing!
> >
> > > ----------
> > > [ 161.846202] Out of memory: Kill process 6331 (a.out) score 999 or sacrifice child
> > > [ 161.850327] Killed process 6331 (a.out) total-vm:4172kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > [ 161.858503] ------------[ cut here ]------------
> > > [ 161.861512] kernel BUG at mm/memory.c:1381!
> >
> > BUG_ON(addr >= end) suggests our vma has been trimmed. I guess I see what
> > is going on here.
> > __oom_reap_task_mm                    exit_mmap
> >                                         free_pgtables
> >                                         up_write(mm->mmap_sem)
> >   down_read_trylock(&mm->mmap_sem)
> >                                         remove_vma
> >   unmap_page_range
> >
> > So we need to extend the mmap_sem coverage. See the updated diff (not
> > the full proper patch yet).
>
> That diff is still wrong. We need to prevent __oom_reap_task_mm() from calling
> unmap_page_range() when __mmput() has already called exit_mmap(), by
> setting/checking MMF_OOM_SKIP as shown below.

Care to explain why?
[...]

> Since the OOM reaper does not reap hugepages, khugepaged_exit() part could be
> safe.

I think you are mixing up hugetlb and THP pages here. khugepaged_exit is
about the latter, and we do unmap those.

> But ksm_exit() part might interfere.

How?

> If it is guaranteed to be safe,
> what will go wrong if we move uprobe_clear_state()/exit_aio()/ksm_exit() etc.
> to just before mmdrop() (i.e. after setting MMF_OOM_SKIP) ?

I do not see why those matter or why they should be special in any way.
Unless I am missing something, we really only care about page table tear
down and address space modification. They do none of that.

--
Michal Hocko
SUSE Labs