Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and there is concurrent user

From: Joonsoo Kim
Date: Mon Sep 30 2013 - 03:47:12 EST


On Mon, Sep 16, 2013 at 10:09:09PM +1000, David Gibson wrote:
> > >
> > > > + *do_dequeue = false;
> > > > spin_unlock(&hugetlb_lock);
> > > > page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
> > > > if (!page) {
> > >
> > > I think the counter also needs to be incremented in the case where we
> > > call alloc_buddy_huge_page() from alloc_huge_page(). Even though it's
> > > new, it gets added to the hugepage pool at this point and could still
> > > be a contended page for the last allocation, unless I'm missing
> > > something.
> >
> > Your comment makes a reasonable point, but I have a different opinion.
> >
> > As I already mentioned, the point is that we want to avoid the race
> > which kills legitimate users of hugepages by running them out of resources.
> > I increase 'h->nr_dequeue_users' when a hugepage allocated by the
> > administrator is dequeued, because the hugepages I want to protect
> > from the race are the ones allocated by the administrator via a kernel
> > parameter or the /proc interface. The administrator may already know how
> > many hugepages are needed for their application and so may set nr_hugepages
> > to a reasonable value. I want to guarantee that these hugepages can be used
> > for that application without any race, since the administrator assumes that
> > the application will work fine with these hugepages.
> >
> > Protecting hugepages returned from alloc_buddy_huge_page() from the race
> > is a different matter to me. Although such a page will be added to the
> > hugepage pool, this doesn't make any particular application's success more
> > certain. If an application's success depends on winning the race for this
> > new hugepage, its death from losing the race doesn't matter, since nobody
> > assumed it would work fine.
>
> Hrm. I still think this path should be included. Although I'll agree
> that failing in this case is less bad.
>
> However, it can still lead to a situation where, with two processes or
> threads faulting on exactly the same shared page, we have one succeed
> and the other fail. That's strange behaviour and I think we want to
> avoid it in this case too.

Hello, David.

I don't think it is strange behaviour. A similar situation can occur
even when we use the mutex. Hugepage allocation can fail for the first
process while the second process is blocked on the mutex. The second
process then goes into the fault handler, and at that point its
allocation can succeed. So the result is that one succeeds and the
other fails.
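
To illustrate what I mean, here is a toy userspace sketch. This is not
kernel code; the pool counter, the mutex and the forced ordering are all
simplifications made up for the example. Even though every "fault" takes
the same lock, whether it succeeds depends only on whether a page has
come back to the pool by the time it gets its turn:

/* toy demo, not kernel code: compile with "gcc -pthread demo.c" */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static int free_pages;		/* the hugepage pool is empty at first */

static void *fault_handler(void *name)
{
	int ok;

	pthread_mutex_lock(&pool_lock);
	ok = free_pages > 0;
	if (ok)
		free_pages--;	/* "dequeue" a hugepage */
	pthread_mutex_unlock(&pool_lock);

	printf("%s: allocation %s\n", (char *)name,
	       ok ? "succeeded" : "failed");
	return NULL;
}

int main(void)
{
	pthread_t t;

	/* first process faults while the pool is empty: it fails */
	pthread_create(&t, NULL, fault_handler, "process 1");
	pthread_join(t, NULL);

	/* meanwhile somebody frees a hugepage back to the pool */
	pthread_mutex_lock(&pool_lock);
	free_pages++;
	pthread_mutex_unlock(&pool_lock);

	/* second process, which had been waiting, now runs: it succeeds */
	pthread_create(&t, NULL, fault_handler, "process 2");
	pthread_join(t, NULL);

	return 0;
}

The real fault path is of course more involved, but the timing is the
same: serializing the two faults with a mutex does not prevent the first
one failing while the second one succeeds.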

It is slightly different from the case you mentioned, but I think the
effect for the user is the same. We cannot avoid this kind of race
completely, and I think that avoiding the race for the
administrator-managed hugepage pool is good enough.

Thanks.