Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

From: Muchun Song
Date: Mon Feb 15 2021 - 23:36:18 EST


On Tue, Feb 16, 2021 at 3:39 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 16-02-21 02:19:20, Muchun Song wrote:
> > On Tue, Feb 16, 2021 at 1:48 AM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 16, 2021 at 12:28 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > >
> > > > On Mon 15-02-21 23:36:49, Muchun Song wrote:
> > > > [...]
> > > > > > There shouldn't be any real reason why the memory allocation for
> > > > > > vmemmaps, or handling vmemmap in general, has to be done from within the
> > > > > > hugetlb lock and therefore requiring a non-sleeping semantic. All that
> > > > > > can be deferred to a more relaxed context. If you want to make a
> > > > >
> > > > > > Yeah, you are right. We can move the hugetlb freeing routine to a
> > > > > > workqueue, just as I did in the previous versions (before v13) of
> > > > > > the patch. I will pick up those patches.
> > > >
> > > > I haven't seen your v13 and I will unlikely have time to revisit that
> > > > version. I just wanted to point out that the actual allocation doesn't
> > > > have to happen from under the spinlock. There are multiple ways to go
> > > > around that. Dropping the lock would be one of them. Preallocation
> > > > before the spin lock is taken is another. WQ is certainly an option but
> > > > I would take it as the last resort when other paths are not feasible.
> > > >
> > >
> > > "Dropping the lock" and "Preallocation before the spin lock" both limit
> > > put_page to non-atomic context. I am not sure whether a page is put
> > > somewhere under an atomic context, e.g. in compaction.
> > > I am not an expert on this.
> >
> > Using GFP_KERNEL will also use the current task's cpuset to allocate
> > memory. Do we have an interface to ignore the current task's cpuset? If
> > not, WQ may be the only option, and it also will not limit the context
> > of put_page. Right?
>
> Well, GFP_KERNEL is constrained to the task cpuset only if the said
> cpuset is hardwalled IIRC. But I do not see why this is a problem.

I mean the case where the system has more than one node but the
current task's cpuset allows only one of them. If that node has no
free memory while the other nodes have plenty, we can fail to
allocate the vmemmap pages, even though it would be perfectly fine
to allocate them from the other nodes.
Right?
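
To make the case concrete, here is a toy userspace model of it. It is
not kernel code and says nothing about how the page allocator or cpusets
are really implemented. struct node_model, model_alloc() and the
respect_cpuset flag are all made up for illustration, and the model
simply assumes that the cpuset restriction does apply to the allocation
(i.e. the hardwalled case you mention).

```c
/* Toy userspace model, NOT kernel code; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

struct node_model {
	unsigned long free_pages;	/* pages still available on this node */
};

/*
 * Try to take nr pages from a single node, visiting nodes in index order.
 * When respect_cpuset is true, nodes outside allowed_mask are skipped.
 * Returns the node id used, or -1 when no eligible node has enough memory.
 */
int model_alloc(struct node_model *nodes, size_t nr_nodes,
		unsigned long allowed_mask, bool respect_cpuset,
		unsigned long nr)
{
	assert(nr_nodes <= MAX_NODES);

	for (size_t nid = 0; nid < nr_nodes; nid++) {
		bool allowed = (allowed_mask & (1UL << nid)) != 0;

		if (respect_cpuset && !allowed)
			continue;	/* node is outside the task's cpuset */
		if (nodes[nid].free_pages < nr)
			continue;	/* node cannot satisfy the request */
		nodes[nid].free_pages -= nr;
		return (int)nid;
	}
	return -1;
}
```

With node 0 empty, node 1 holding 100 free pages and allowed_mask = 0x1,
model_alloc(nodes, 2, 0x1, true, 7) returns -1, while the same call with
respect_cpuset = false returns 1. That is the failure I am worried about.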

Thanks.

> --
> Michal Hocko
> SUSE Labs