Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and there is concurrent user

From: Davidlohr Bueso
Date: Mon Dec 09 2013 - 11:36:38 EST


On Mon, 2013-09-30 at 16:47 +0900, Joonsoo Kim wrote:
> On Mon, Sep 16, 2013 at 10:09:09PM +1000, David Gibson wrote:
> > > >
> > > > > + *do_dequeue = false;
> > > > > spin_unlock(&hugetlb_lock);
> > > > > page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> > > > > if (!page) {
> > > >
> > > > I think the counter also needs to be incremented in the case where we
> > > > call alloc_buddy_huge_page() from alloc_huge_page(). Even though it's
> > > > new, it gets added to the hugepage pool at this point and could still
> > > > be a contended page for the last allocation, unless I'm missing
> > > > something.
> > >
> > > Your comment makes a reasonable point, but I have a different opinion.
> > >
> > > As I already mentioned, the point is that we want to avoid the race
> > > which kills legitimate users of hugepages by running out of resources.
> > > I increase 'h->nr_dequeue_users' when a hugepage allocated by the
> > > administrator is dequeued, because the hugepages I want to protect
> > > from the race are the ones the administrator allocated via kernel
> > > parameter or the /proc interface. The administrator may already know
> > > how many hugepages his application needs, so he can set nr_hugepages
> > > to a reasonable value. I want to guarantee that these hugepages can be
> > > used by his application without any race, since he assumes the
> > > application will work fine with them.
> > >
> > > Protecting hugepages returned from alloc_buddy_huge_page() from the
> > > race is a different matter to me. Although such a page is added to the
> > > hugepage pool, this doesn't make any particular application's success
> > > more likely. If an application's success depends on winning the race
> > > for this new hugepage, its death in the race doesn't matter, since
> > > nobody assumed it would work fine.
> >
> > Hrm. I still think this path should be included, although I'll agree
> > that failing in this case is less bad.
> >
> > However, it can still lead to a situation where, with two processes or
> > threads faulting on exactly the same shared page, one succeeds and the
> > other fails. That's strange behaviour, and I think we want to avoid it
> > in this case too.
>
> Hello, David.
>
> I don't think it is strange behaviour. A similar situation can occur
> even when we use the mutex: hugepage allocation can fail for the first
> process while the second process is blocked on the mutex. The second
> process then enters the fault handler, and at that point it can
> succeed. So the result is still that one succeeds and the other fails.
>
> It is slightly different from the case you mentioned, but I think the
> effect for the user is the same. We cannot avoid this kind of race
> completely, and I think that avoiding it for the administrator-managed
> hugepage pool is good enough.
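(For anyone following along: the retry scheme being debated above can be
sketched as a small single-threaded userspace model. The names below --
pool_free, nr_dequeue_users as a plain counter, buddy_alloc_ok -- are
illustrative stand-ins, not the actual hugetlb implementation, and the
locking is elided.)

```c
#include <stdbool.h>

/* Toy model of the dequeue-counter retry scheme (names hypothetical). */
static int  pool_free        = 0;     /* pages left in the reserved pool */
static long nr_dequeue_users = 0;     /* bumped on every pool dequeue */
static bool buddy_alloc_ok   = false; /* simulates buddy allocator success */

static bool alloc_hugepage(void)
{
	long seen;
retry:
	seen = nr_dequeue_users;
	if (pool_free > 0) {
		/* Fast path: dequeue from the administrator-managed pool. */
		pool_free--;
		nr_dequeue_users++;
		return true;
	}
	/* Pool empty: fall back to the buddy allocator. */
	if (buddy_alloc_ok)
		return true;
	/*
	 * The failure may be transient: if a concurrent user dequeued a
	 * pool page in the meantime, that page may be freed back soon,
	 * so retry rather than failing a legitimate pool user.
	 */
	if (nr_dequeue_users != seen)
		goto retry;
	return false;
}
```

In this model, the retry branch is what replaces the serializing mutex:
instead of blocking concurrent faulters, a failed allocation re-checks
whether the counter moved and loops back only in that case.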

What was the final decision on this issue? Is Joonsoo's approach to
removing this mutex viable, or are we stuck with it?

Thanks,
Davidlohr
