Re: [RFC PATCH] mm: introduce kv[mz]alloc helpers

From: Michal Hocko
Date: Fri Dec 09 2016 - 01:22:34 EST

On Fri 09-12-16 02:00:17, Al Viro wrote:
> On Fri, Dec 09, 2016 at 12:44:17PM +1100, Dave Chinner wrote:
> > On Thu, Dec 08, 2016 at 11:33:00AM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@xxxxxxxx>
> > >
> > > Using kmalloc with the vmalloc fallback for larger allocations is a
> > > common pattern in the kernel code. Yet we do not have any common helper
> > > for that and so users have invented their own helpers. Some of them are
> > > really creative when doing so. Let's just add kv[mz]alloc and make sure
> > > it is implemented properly. This implementation makes sure to not make
> > > a large memory pressure for > PAGE_SIZE requests (__GFP_NORETRY) and also
> > > to not warn about allocation failures. This also rules out the OOM
> > > killer as the vmalloc is a more appropriate fallback than a disruptive
> > > user visible action.
> > >
> > > This patch also changes some existing users and removes helpers which
> > > are specific for them. In some cases this is not possible (e.g.
> > > ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seem to be
> > > broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
> > > in general (note that the page table allocation is GFP_KERNEL). Those
> > > need to be fixed separately.
> >
> > See fs/xfs/kmem.c::kmem_zalloc_large(), which is XFS's version of
> > kvmalloc() that is GFP_NOFS/GFP_NOIO safe. Any generic API for this
> > functionality will have to play these memalloc_noio_save/
> > memalloc_noio_restore games to ensure they are GFP_NOFS safe....
> Easier to handle those in vmalloc() itself.

I think there were some attempts in the past but some of the code paths
are buried too deep and adding gfp_mask all the way down there seemed
like major surgery.

> The problem I have with these
> helpers is that different places have different cutoff thresholds for
> switch from kmalloc to vmalloc; has anyone done an analysis of those?

Yes, I have noticed some creativity as well. Some of them didn't bother
to kmalloc at all for size > PAGE_SIZE. Some were playing tricks with
PAGE_ALLOC_COSTLY_ORDER. I believe the right thing to do is simply not
to hammer the system for size > PAGE_SIZE, which means __GFP_NORETRY for
those requests and a fall back to vmalloc on failure (basically what
seq_buf_alloc did). I cannot offer any numbers but at least
seq_buf_alloc has proven to do the right thing over time.
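
To make the policy above concrete, here is a rough userspace sketch. It is
not the code from the patch. Every sketch_* / SKETCH_* name, the flag
values and the sketch_fragmented knob are made up for illustration, and
malloc() stands in for both the physically contiguous and the virtually
contiguous allocator. The choice not to fall back for requests that fit
into a single page is an assumption of this sketch as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel constants; the values are illustrative. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_GFP_KERNEL  0x1u
#define SKETCH_GFP_NORETRY 0x2u  /* fail quickly instead of retrying hard */
#define SKETCH_GFP_NOWARN  0x4u  /* do not warn when the attempt fails */

/* Test knob: pretend multi-page contiguous memory is unavailable. */
static bool sketch_fragmented;

enum sketch_source { SKETCH_NONE, SKETCH_KMALLOC, SKETCH_VMALLOC };

/* Requests larger than one page must not hammer the system. */
static unsigned int sketch_kmalloc_flags(size_t size)
{
    unsigned int flags = SKETCH_GFP_KERNEL;

    if (size > SKETCH_PAGE_SIZE)
        flags |= SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;
    return flags;
}

/* Mock contiguous allocator; the flags are accepted but ignored here. */
static void *sketch_kmalloc(size_t size, unsigned int flags)
{
    (void)flags;
    if (sketch_fragmented && size > SKETCH_PAGE_SIZE)
        return NULL;
    return malloc(size);
}

/* Mock virtually contiguous allocator. */
static void *sketch_vmalloc(size_t size)
{
    return malloc(size);
}

/* Try the cheap contiguous path first, then fall back for large requests. */
static void *sketch_kvmalloc(size_t size, enum sketch_source *src)
{
    void *p = sketch_kmalloc(size, sketch_kmalloc_flags(size));

    if (p) {
        *src = SKETCH_KMALLOC;
        return p;
    }

    /* Assumption of this sketch: no fallback for single-page requests. */
    if (size <= SKETCH_PAGE_SIZE) {
        *src = SKETCH_NONE;
        return NULL;
    }

    p = sketch_vmalloc(size);
    *src = p ? SKETCH_VMALLOC : SKETCH_NONE;
    return p;
}
```

A caller would pass the size and a pointer to an enum sketch_source,
and release the buffer with free() in this userspace model. With
sketch_fragmented set, a two-page request is reported as coming from
the vmalloc stand-in; a 128 byte request always comes from the kmalloc
stand-in.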

Michal Hocko