Re: [PATCH] mm: Use GFP_KERNEL allocation for the page cache in page_cache_read

From: Michal Hocko
Date: Mon Mar 30 2015 - 04:22:30 EST


On Fri 27-03-15 08:43:54, Dave Chinner wrote:
> On Thu, Mar 26, 2015 at 10:53:02AM +0100, Michal Hocko wrote:
> > On Fri 20-03-15 14:48:20, Dave Chinner wrote:
> > > On Thu, Mar 19, 2015 at 01:44:41PM +0100, Michal Hocko wrote:
> > [...]
> > > > Or did I miss your point? Are you concerned about some fs overloading
> > > > filemap_fault and do some locking before delegating to filemap_fault?
> > >
> > > The latter:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/dgc/linux-xfs.git/commit/?h=xfs-mmap-lock&id=de0e8c20ba3a65b0f15040aabbefdc1999876e6b
> >
> > Hmm. I am completely unfamiliar with the xfs code but my reading of
> > 964aa8d9e4d3..723cac484733 is that the newly introduced lock should be
> > OK from the reclaim recursion POV. It protects against truncate and
> > punch hole, right? Or are there any internal paths which I am missing
> > and would cause problems if we do GFP_FS with XFS_MMAPLOCK_SHARED held?
>
> It might be OK, but you're only looking at the example I gave you,
> not the fundamental issue it demonstrates. That is: filesystems may
> have *internal dependencies that are unknown to the page cache or mm
> subsystem*. Hence the page cache or mm allocations cannot
> arbitrarily ignore allocation constraints the filesystem assigns to
> mapping operations....

I fully understand that. I am just trying to understand what are the
real requirements from filesystems wrt filemap_fault. mapping gfp mask is
not usable much for that because e.g. xfs has GFP_NOFS set for the whole
inode life AFAICS. And it seems that this context is not really required
even after the recent code changes.
We can add gfp_mask into struct vm_fault and initialize it to
mapping_gfp_mask | GFP_IOFS and .fault() callback might overwrite it. This
would be cleaner than unconditional gfp manipulation (the patch below).

But we are in a really hard position if the GFP_NOFS context is really
required here. We shouldn't really trigger OOM killer because that could
be premature and way too disruptive. We can retry the page fault or the
allocation but that both sound suboptimal to me.

Do you have any other suggestions?

This hasn't been tested yet it just shows the idea mentioned above.
---