Re: [PATCH] exit: clear TIF_MEMDIE after exit_task_work

From: Michal Hocko
Date: Tue Mar 01 2016 - 12:18:07 EST


On Tue 01-03-16 18:46:38, Michael S. Tsirkin wrote:
> On Tue, Mar 01, 2016 at 05:35:37PM +0100, Michal Hocko wrote:
> > On Tue 01-03-16 18:22:32, Michael S. Tsirkin wrote:
> > > On Tue, Mar 01, 2016 at 05:08:13PM +0100, Michal Hocko wrote:
> > > > On Tue 01-03-16 17:57:04, Michael S. Tsirkin wrote:
> > > > > On Tue, Mar 01, 2016 at 04:52:12PM +0100, Michal Hocko wrote:
> > > > > > [CCing vhost-net maintainer]
> > > > > >
> > > > > > On Mon 29-02-16 20:02:09, Vladimir Davydov wrote:
> > > > > > > An mm_struct may be pinned by a file. An example is vhost-net device
> > > > > > > created by a qemu/kvm (see vhost_net_ioctl -> vhost_net_set_owner ->
> > > > > > > vhost_dev_set_owner).
> > > > > >
> > > > > > The more I think about that the more I am wondering whether this is
> > > > > > actually OK and correct. Why does the driver have to pin the address
> > > > > > space? Nothing really prevents from parallel tearing down of the address
> > > > > > space anyway so the code cannot expect all the vmas to stay. Would it be
> > > > > > enough to pin the mm_struct only?
> > > > >
> > > > > I'll need to research this. It's a fact that as long as the
> > > > > device is not stopped, vhost can attempt to access
> > > > > the address space.
> > > >
> > > > But does it expect any specific parts of the address space to be mapped?
> > > > E.g. proc needs to keep the mm allocated as well for some files but it
> > > > doesn't pin the address space (mm_users) but rather mm_count (see
> > > > proc_mem_open).
> > >
> > > At a quick glance, it seems that it's needed: it calls
> > > get_user_pages(mm) and that looks like it will not DTRT (or even fail
> > > gracefully) if mm->mm_users == 0 and exit_mmap/etc was already called
> > > (or is in progress).
> >
> > yes it will fail gracefully
>
>
> What makes get_user_pages fail gracefully in this case,
> if it races with task exiting?

Sorry, I could have been more verbose... The code would have to make sure
that the mm is still alive before calling g-u-p by doing
atomic_inc_not_zero(&mm->mm_users) and fail if the user count dropped to
0 in the meantime. See how fs/proc/task_mmu.c does that (proc_mem_open
+ m_start + m_stop).
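
To illustrate the pattern, here is a minimal userspace sketch that uses C11
atomics in place of the kernel's atomic_t. struct mm_sketch, inc_not_zero(),
users_put() and do_one_access() are made-up names for illustration only; this
is not the vhost or procfs code, and the -1 error value is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm_struct; only the two counters matter here. */
struct mm_sketch {
	atomic_int mm_users;	/* address space users; 0 means teardown may run */
	atomic_int mm_count;	/* long-lived reference keeping the struct allocated */
};

/* Userspace analogue of atomic_inc_not_zero(): increment unless already 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure 'old' is reloaded, so the zero check is repeated. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a short-lived address space pin taken by inc_not_zero(). */
static void users_put(struct mm_sketch *mm)
{
	int prev = atomic_fetch_sub(&mm->mm_users, 1);

	assert(prev > 0);
}

/* Hypothetical operation: pin the address space only while it is in use. */
static int do_one_access(struct mm_sketch *mm)
{
	if (!inc_not_zero(&mm->mm_users))
		return -1;	/* address space already gone, fail the operation */

	/* ... the g-u-p style access would happen here ... */

	users_put(mm);
	return 0;
}
```

The holder keeps only the mm_count style reference across operations, so the
last real user dropping mm_users is never blocked; an access attempted after
that point simply fails instead of touching a torn down address space.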

The biggest advantage would be that the address space would be pinned
only for the duration of the particular operation. Not sure whether that
is possible in the driver though. Anyway, pinning the mm for a potentially
unbounded amount of time doesn't sound too nice.
--
Michal Hocko
SUSE Labs