Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

From: Dave Chinner
Date: Fri Apr 29 2016 - 19:40:18 EST


On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> > On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > > [Trim the CC list]
> > > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > > [...]
> > > > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > > > lockdep gets very unhappy about the same functions being called with
> > > > different reclaim contexts. e.g. directory block mapping might
> > > > occur from readdir (no transaction context) or within transactions
> > > > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > > > stop lockdep emitting false positive warnings....
> > >
> > > As already said in other email, I have tried to revert the above
> > > commit and tried to run it with some fs workloads but didn't manage
> > > to hit any lockdep splats (after I fixed my bug in the patch 1.2). I
> > > have tried to find reports which led to this commit but didn't succeed
> > > much. Everything is from much earlier or later. Do you happen to
> > > remember which loads triggered them, what they looked like or have an
> > > idea what to try to reproduce them? So far I was trying heavy parallel
> > > fs_mark, kernbench inside a tiny virtual machine so any of those have
> > > triggered direct reclaim all the time.
> >
> > Most of those issues were reported by users and not reproducible by
> > any obvious means.
>
> I would really appreciate a reference to some of those (my google-fu has
> failed me) or at least a pattern of those splats

If you can't find them with google, then I won't. Google is mostly
useless as a patch/mailing list search tool these days. You can try
looking through this list:

https://www.google.com.au/search?q=XFS+lockdep+site:oss.sgi.com+-splice

but I'm not seeing anything particularly relevant in that list -
there isn't a single reclaim related lockdep report in that...

> - was it
> "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
> or a different class reports?

Typically that was involved, but quite often there'd be a number
of locks and sometimes even interrupt stacks in an interaction
between 5 or 6 different processes. Lockdep covers all sorts of
stuff now (like fs freeze annotations as well as locks and memory
reclaim) so sometimes the only thing we can do is remove the
reclaim context from the stack and see if that makes it go away...
>
> > They may have been fixed since, but I'm sceptical
> > of that because, generally speaking, developer testing only catches
> > the obvious lockdep issues. i.e. it's users that report all the
> > really twisty issues, and they are generally not reproducible except
> > under their production workloads...
> >
> > IOWs, the absence of reports in your testing does not mean there
> > isn't a problem, and that is one of the biggest problems with
> > lockdep annotations - we have no way of ever knowing if they are
> > still necessary or not without exposing users to regressions and
> > potential deadlocks.....
>
> I understand your points here but if we are sure that those lockdep
> reports are just false positives then we should rather provide an api to
> silence lockdep for those paths

I agree with this - please provide such infrastructure before we
need it...

> than abusing GFP_NOFS which a) hurts
> the overall reclaim healthiness

Which doesn't actually seem to be a problem for the vast majority of
users.

> and b) works around a non-existing
> problem with lockdep disabled which is the vast majority of
> configurations.

But the moment we have a lockdep problem, we get bug reports from
all over the place and people complaining about it, so we are
*required* to silence them one way or another. And, like I said,
when the choice is simply adding GFP_NOFS or spending a week or two
completely reworking complex code that has functioned correctly for
15 years, the risk/reward *always* falls on the side of "just add
GFP_NOFS".

Please keep in mind that there is as much code in fs/xfs as there is
in the mm/ subsystem, and XFS has twice that in userspace as well.
I say this because we only have 3-4 full time developers to do
all the work required on this code base, unlike the mm/ subsystem
which had 30-40 full time MM developers attending LSFMM. This is why
I push back on suggestions that require significant redesign of
subsystem code to handle memory allocation/reclaim quirks - most
subsystems simply don't have the resources available to do such
work, and so will always look for the quick 2 minute fix when it is
available....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx