Re: [PATCH 2/3] xfs: add kmem_alloc_io()

From: Dave Chinner
Date: Thu Aug 22 2019 - 09:46:17 EST


On Thu, Aug 22, 2019 at 02:19:04PM +0200, Vlastimil Babka wrote:
> On 8/22/19 2:07 PM, Dave Chinner wrote:
> > On Thu, Aug 22, 2019 at 01:14:30PM +0200, Vlastimil Babka wrote:
> >
> > No, the problem is this (using kmalloc as a general term for
> > allocation, whether it be kmalloc, kmem_cache_alloc, alloc_page, etc)
> >
> >   some random kernel code
> >    kmalloc(GFP_KERNEL)
> >     reclaim
> >     PF_MEMALLOC
> >     shrink_slab
> >      xfs_inode_shrink
> >       XFS_ILOCK
> >        xfs_buf_allocate_memory()
> >         kmalloc(GFP_KERNEL)
> >
> > And so locks on inodes in reclaim are seen below reclaim. Then
> > somewhere else we have:
> >
> >   some high level read-only xfs code like readdir
> >    XFS_ILOCK
> >     xfs_buf_allocate_memory()
> >      kmalloc(GFP_KERNEL)
> >       reclaim
> >
> > And this one throws false positive lockdep warnings because we
> > called into reclaim with XFS_ILOCK held and GFP_KERNEL alloc
>
> OK, and what exactly makes this positive a false one? Why can't it continue like
> the first example where reclaim leads to another XFS_ILOCK, thus deadlock?

Because above reclaim we only operate on referenced inodes, and
below reclaim we only operate on unreferenced inodes. We never lock
the same inode both above and below reclaim at the same time.

IOWs, an operation above reclaim cannot see, access or lock
unreferenced inodes, except in inode write clustering, and that uses
trylocks so it cannot deadlock with reclaim.

An operation below reclaim cannot see, access or lock referenced
inodes, except during inode write clustering, and that uses trylocks
so it cannot deadlock with code above reclaim.
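
To make the trylock argument concrete, here is a minimal userspace
sketch using pthreads. It is not XFS code: struct demo_inode and
demo_cluster_flush() are hypothetical names made up for illustration,
and the mutex merely stands in for XFS_ILOCK. The point it shows is
that a clustering pass which only ever uses trylock skips any inode
whose lock is already held instead of waiting on it, so it cannot
block on a lock taken on the other side of reclaim.

```c
/* Userspace illustration only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_inode {
	pthread_mutex_t ilock;	/* stands in for XFS_ILOCK */
	bool dirty;		/* needs writeback */
};

/*
 * Walk a cluster of inodes and "write back" the dirty ones.
 * Each lock is taken with trylock, so an inode that is already
 * locked is skipped rather than waited on. Returns the number
 * of inodes that were flushed.
 */
static size_t demo_cluster_flush(struct demo_inode **cluster, size_t n)
{
	size_t flushed = 0;

	for (size_t i = 0; i < n; i++) {
		struct demo_inode *ip = cluster[i];
		int rc;

		/* Non-blocking attempt; nonzero means the lock is held. */
		if (pthread_mutex_trylock(&ip->ilock) != 0)
			continue;

		if (ip->dirty) {
			ip->dirty = false;	/* pretend to write it back */
			flushed++;
		}

		rc = pthread_mutex_unlock(&ip->ilock);
		assert(rc == 0);
		(void)rc;
	}
	return flushed;
}
```

If the caller already holds the lock on one inode in the cluster,
demo_cluster_flush() leaves that inode dirty and only flushes the
others; a blocking lock in the same place would hang forever.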

FWIW, I'm currently trying to remove inode writeback clustering from
reclaim, so even that possibility is going away soon.
That will change everything to trylocks in reclaim context, so
lockdep is going to stop tracking it entirely.

Hmmm - maybe we're getting to the point where we actually
don't need GFP_NOFS/PF_MEMALLOC_NOFS at all in XFS anymore.....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx