Re: vmalloc with GFP_NOFS

From: Mikulas Patocka
Date: Tue Apr 24 2018 - 13:05:34 EST




On Tue, 24 Apr 2018, Michal Hocko wrote:

> On Tue 24-04-18 12:46:55, Mikulas Patocka wrote:
> >
> >
> > On Tue, 24 Apr 2018, Michal Hocko wrote:
> >
> > > Hi,
> > > it seems that we still have few vmalloc users who perform GFP_NOFS
> > > allocation:
> > > drivers/mtd/ubi/io.c
> > > fs/ext4/xattr.c
> > > fs/gfs2/dir.c
> > > fs/gfs2/quota.c
> > > fs/nfs/blocklayout/extent_tree.c
> > > fs/ubifs/debug.c
> > > fs/ubifs/lprops.c
> > > fs/ubifs/lpt_commit.c
> > > fs/ubifs/orphan.c
> > >
> > > Unfortunatelly vmalloc doesn't suppoer GFP_NOFS semantinc properly
> > > because we do have hardocded GFP_KERNEL allocations deep inside the
> > > vmalloc layers. That means that if GFP_NOFS really protects from
> > > recursion into the fs deadlocks then the vmalloc call is broken.
> > >
> > > What to do about this? Well, there are two things. Firstly, it would be
> > > really great to double check whether the GFP_NOFS is really needed. I
> > > cannot judge that because I am not familiar with the code. It would be
> > > great if the respective maintainers (hopefully get_maintainer.sh pointed
> > > me to all relevant ones). If there is not reclaim recursion issue then
> > > simply use the standard vmalloc (aka GFP_KERNEL request).
> > >
> > > If the use is really valid then we have a way to do the vmalloc
> > > allocation properly. We have memalloc_nofs_{save,restore} scope api. How
> > > does that work? You simply call memalloc_nofs_save when the reclaim
> > > recursion critical section starts (e.g. when you take a lock which is
> > > then used in the reclaim path - e.g. shrinker) and memalloc_nofs_restore
> > > when the critical section ends. _All_ allocations within that scope
> > > will get GFP_NOFS semantic automagically. If you are not sure about the
> > > scope itself then the easiest workaround is to wrap the vmalloc itself
> > > with a big fat comment that this should be revisited.
> > >
> > > Does that sound like something that can be done in a reasonable time?
> > > I have tried to bring this up in the past but our speed is glacial and
> > > there are attempts to do hacks like checking for abusers inside the
> > > vmalloc which is just too ugly to live.
> > >
> > > Please do not hesitate to get back to me if something is not clear.
> > >
> > > Thanks!
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > I made a patch that adds memalloc_noio/fs_save around these calls a year
> > ago: http://lkml.iu.edu/hypermail/linux/kernel/1707.0/01376.html
>
> Yeah, and that is the wrong approach.

It is crude, but it fixes the deadlock possibility. Then, the maintainers
will have a lot of time to refactor the code and move these
memalloc_noio_save calls to the proper scope.

> Let's try to fix this properly
> this time. As the above outlines, the worst case we can end up mid-term
> would be to wrap vmalloc calls with the scope api with a TODO. But I am
> pretty sure the respective maintainers can come up with a better
> solution. I am definitely willing to help here.
> --
> Michal Hocko
> SUSE Labs

Mikulas