Re: [PATCH v2] xfs: remove kmem_zalloc_greedy

From: Luis R. Rodriguez
Date: Wed Mar 15 2017 - 11:44:56 EST


On Wed, Mar 15, 2017 at 09:35:29AM +0100, Michal Hocko wrote:
> On Wed 15-03-17 01:14:27, Luis R. Rodriguez wrote:
> > On Tue, Mar 14, 2017 at 11:07:38AM -0700, Darrick J. Wong wrote:
> > > On Tue, Mar 14, 2017 at 05:57:45PM +0100, Luis R. Rodriguez wrote:
> > > > On Tue, Mar 07, 2017 at 04:35:28PM -0800, Darrick J. Wong wrote:
> > > > > The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
> > > > > it to grab 1-4 pages for staging of inobt records. The infinite loop in
> > > > > the greedy allocation function is causing hangs[1] in generic/269, so
> > > > > just get rid of the greedy allocator in favor of kmem_zalloc_large.
> > > > > This makes bulkstat somewhat more likely to ENOMEM if there are really no
> > > > > pages to spare, but eliminates a source of hangs.
> > > > >
> > > > > [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > ---
> > > > > v2: remove single-page fallback
> > > > > ---
> > > >
> > > > Since this fixes a hang, how about *at the very least* a corresponding Fixes tag?
> > > > This fixes an existing hang, so what are the stable considerations here? I
> > > > realize the answer is not easy but figured it's worth asking.
> > >
> > > I didn't think it was appropriate to "Fixes: 77e4635ae1917" since we're
> > > not fixing _greedy so much as we are killing it. The patch fixes an
> > > infinite retry hang when bulkstat tries a memory allocation that cannot
> > > be satisfied; and having done that, realizes there are no remaining
> > > callers of _greedy and garbage collects it. The code that was there
> > > before also seems capable of sleeping forever, I think.
> > >
> > > So the minimally invasive fix is to apply the allocation conversion in
> > > bulkstat, and if there aren't any other callers of _greedy then you can
> > > get rid of it too.
> >
> > For the sake of stable XFS users, then, why not do the less invasive change
> > first, Cc stable, and then move on to the less backward-portable solution?
>
> The thing is that the permanent failures for vmalloc were so unlikely
> prior to 5d17a73a2ebe ("vmalloc: back off when the current task is
> killed") that this was basically a non-issue before this (4.11) merge
> window.

I see. This seems like critical information to add to the commit log.
Also, will this at least be pushed to v4.11?

Luis