Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC

From: Oscar Salvador
Date: Thu Aug 23 2018 - 06:51:37 EST


On Tue, Aug 21, 2018 at 01:51:59PM -0700, Andrew Morton wrote:
> On Tue, 21 Aug 2018 14:30:24 +0200 Oscar Salvador <osalvador@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> > > We do have CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> > > time (around SLE11-SP3 AFAICS).
> > >
> > > Anyway, isn't NODES_ALLOC over engineered a bit? Does actually even do
> > > larger than 1024 NUMA nodes? This would be 128B and from a quick glance
> > > it seems that none of those functions are called in deep stacks. I
> > > haven't gone through all of them but a patch which checks them all and
> > > removes NODES_ALLOC would be quite nice IMHO.
> >
> > No, maximum we can get is 1024 NUMA nodes.
> > I checked this when writing another patch [1], and since having gone
> > through all archs Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.
> >
> > NODEMASK_ALLOC gets only called from:
> >
> > - unregister_mem_sect_under_nodes() (not anymore after [1])
> > - __nr_hugepages_store_common (This does not seem to have a deep stack, we could use a normal nodemask_t)
> >
> > But is also used for NODEMASK_SCRATCH (mainly used for mempolicy):
> >
> > struct nodemask_scratch {
> > nodemask_t mask1;
> > nodemask_t mask2;
> > };
> >
> > that would make 256 bytes in case CONFIG_NODES_SHIFT=10.
>
> And that sole site could use an open-coded kmalloc.

It is not really one single place, but four:

- do_set_mempolicy()
- do_mbind()
- kernel_migrate_pages()
- mpol_shared_policy_init()

They get called in:

- do_set_mempolicy()
- From set_mempolicy syscall
- From numa_policy_init()
- From numa_default_policy()

* All above do not look like they have a deep stack, so it should
be possible to get rid of NODEMASK_SCRATCH there.

- do_mbind
- From mbind syscall

* Should be feasible here as well.

- kernel_migrate_pages()

- From migrate_pages syscall

* Again, this should be doable.

- mpol_shared_policy_init()

- From hugetlbfs_alloc_inode()
- shmem_get_inode()

* Seems doable for hugetlbfs_alloc_inode as well.
I only got to check hugetlbfs_alloc_inode, because shmem_get_inode


So it seems that this can be done in most of the places.
The only tricky function might be mpol_shared_policy_init because of shmem_get_inode.
But in that case, we could use an open-coded kmalloc there.

Thanks
--
Oscar Salvador
SUSE L3