Re: [PATCH] mm: drop mmap_sem before calling balance_dirty_pages() in write fault

From: Kirill A. Shutemov
Date: Thu Sep 26 2019 - 09:49:26 EST


On Tue, Sep 24, 2019 at 05:43:37PM -0400, Johannes Weiner wrote:
> On Tue, Sep 24, 2019 at 01:46:08PM -0700, Matthew Wilcox wrote:
> > On Tue, Sep 24, 2019 at 03:42:38PM -0400, Johannes Weiner wrote:
> > > > I'm not a fan of moving file_update_time() to _before_ the
> > > > balance_dirty_pages call.
> > >
> > > Can you elaborate why? If the filesystem has a page_mkwrite op, it
> > > will have already called file_update_time() before this function is
> > > entered. If anything, this change makes the sequence more consistent.
> >
> > Oh, that makes sense. I thought it should be updated after all the data
> > was written, but it probably doesn't make much difference.
> >
> > > > Also, this is now the third place that needs
> > > > maybe_unlock_mmap_for_io, see
> > > > https://lore.kernel.org/linux-mm/20190917120852.x6x3aypwvh573kfa@box/
> > >
> > > Good idea, I moved the helper to internal.h and converted to it.
> > >
> > > I left the shmem site alone, though. It doesn't require the file
> > > pinning, so it shouldn't pointlessly bump the file refcount and
> > > suggest such a dependency - that could cost somebody later quite a bit
> > > of time trying to understand the code.
> >
> > The problem for shmem is this:
> >
> > spin_unlock(&inode->i_lock);
> > schedule();
> >
> > spin_lock(&inode->i_lock);
> > finish_wait(shmem_falloc_waitq, &shmem_fault_wait);
> > spin_unlock(&inode->i_lock);
> >
> > While scheduled, the VMA can go away and the inode be reclaimed, making
> > this a use-after-free. The initial suggestion was an increment on
> > the inode refcount, but since we already have a pattern which involves
> > pinning the file, I thought that was a better way to go.
>
> I completely read over the context of that email you linked - that
> there is a bug in the existing code - and looked at it as mere
> refactoring patch. My apologies.
>
> Switching that shmem site to maybe_unlock_mmap_for_io() to indirectly
> pin the inode (in a separate bug fix patch) indeed makes sense to me.

The patch on top of this one is below. Please post them together if you are
going to resend yours.
>
> > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> >
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>

Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>