Re: [PATCH] exit: clear TIF_MEMDIE after exit_task_work

From: Michal Hocko
Date: Mon Feb 29 2016 - 13:44:51 EST


On Mon 29-02-16 19:21:31, Michal Hocko wrote:
> On Mon 29-02-16 20:02:09, Vladimir Davydov wrote:
> > An mm_struct may be pinned by a file. An example is a vhost-net device
> > created by qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
> > vhost_dev_set_owner). If such a process gets OOM-killed, the reference to
> > its mm_struct will only be released from exit_task_work -> ____fput ->
> > __fput -> vhost_net_release -> vhost_dev_cleanup, which is called after
> > exit_mmap, where TIF_MEMDIE is cleared. As a result, we can start
> > selecting the next victim before giving the last one a chance to free
> > its memory. In practice, this leads to killing several VMs along with
> > the fattest one.
>
> I am wondering why our PF_EXITING protection hasn't kicked in.

OK, I guess I can see it. exit_mm has already set tsk->mm = NULL, so
oom_scan_process_thread skips over that task without ever checking
PF_EXITING. I will try to think about this some more tomorrow with a
fresh brain.
--
Michal Hocko
SUSE Labs