Re: [RFC][PATCH] make global bitlock waitqueues per-node

From: Mel Gorman
Date: Tue Dec 20 2016 - 07:58:33 EST


On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote:
> On Mon, 19 Dec 2016 16:20:05 -0800
> Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> > On 12/19/2016 03:07 PM, Linus Torvalds wrote:
> > > +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > +{
> > > + const int __maybe_unused nid = page_to_nid(virt_to_page(word));
> > > +
> > > + return __bit_waitqueue(word, bit, nid);
> > >
> > > No can do. Part of the problem with the old code was that it did that
> > > virt_to_page() crud. That doesn't work with the virtually mapped stack.
> >
> > Ahhh, got it.
> >
> > So, what did you have in mind? Just redirect bit_waitqueue() to the
> > "first_online_node" waitqueues?
> >
> > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > {
> > return __bit_waitqueue(word, bit, first_online_node);
> > }
> >
> > We could do some fancy stuff like only do virt_to_page() for things in
> > the linear map, but I'm not sure we'll see much of a gain for it. None
> > of the other waitqueue users look as pathological as the 'struct page'
> > ones. Maybe:
> >
> > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > {
> > > int nid;
> > > if ((unsigned long)word >= VMALLOC_START) /* all addrs not in linear map */
> > nid = first_online_node;
> > else
> > nid = page_to_nid(virt_to_page(word));
> > return __bit_waitqueue(word, bit, nid);
> > }
>
> I think he meant just make the page_waitqueue do the per-node thing
> and leave bit_waitqueue as the global bit.
>

I'm pressed for time but at a glance, that might require a separate
structure of wait_queues for page waitqueue. Most users of bit_waitqueue
are not operating with pages. The first user is based on a word inside
a block_device for example. All non-page users could assume node-0. It
shrinks the available hash table space but as before, maybe collisions
are not common enough to actually matter. That would be worth checking
out. Alternatively, careful auditing to pick a node when it's known it's
safe to call virt_to_page may work but it would be fragile.
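
As a rough illustration of the split discussed above, here is a userspace
sketch, not kernel code. It assumes one global table for plain
bit_waitqueue() users and a separate per-node table used only for pages.
Every sketch_* name, the table size, the node count and the hash are
hypothetical stand-ins, and the stand-in page carries its node id directly
so that no virt_to_page() lookup is needed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_NODES  4                       /* hypothetical node count */
#define SKETCH_TABLE_BITS 8                       /* hypothetical table size */
#define SKETCH_TABLE_SIZE (1u << SKETCH_TABLE_BITS)

/* Stand-in for wait_queue_head_t; the real type holds a lock and a list. */
struct sketch_wq_head {
	int waiters;
};

/* Stand-in for struct page that records its node id directly. */
struct sketch_page {
	unsigned long flags;
	int nid;
};

/* One table shared by every non-page caller, whatever the address type. */
static struct sketch_wq_head global_bit_table[SKETCH_TABLE_SIZE];

/* Separate per-node tables, used only when waiting on a page. */
static struct sketch_wq_head page_table[SKETCH_MAX_NODES][SKETCH_TABLE_SIZE];

/* Multiplicative hash of an address plus bit number into a table index. */
static unsigned int sketch_hash(const void *word, int bit)
{
	uint64_t key = (uint64_t)(uintptr_t)word + (uint64_t)bit;

	key *= 0x9E3779B97F4A7C15ull;
	return (unsigned int)(key >> (64 - SKETCH_TABLE_BITS));
}

/* Non-page users: no node lookup, so stack and vmalloc addresses are fine. */
struct sketch_wq_head *sketch_bit_waitqueue(const void *word, int bit)
{
	return &global_bit_table[sketch_hash(word, bit)];
}

/* Page users: pick the table belonging to the node the page lives on. */
struct sketch_wq_head *sketch_page_waitqueue(const struct sketch_page *page)
{
	assert(page->nid >= 0 && page->nid < SKETCH_MAX_NODES);
	return &page_table[page->nid][sketch_hash(page, 0)];
}
```

In this arrangement the generic path never needs to map an address back to
a page, which is the part that breaks with virtually mapped stacks. Whether
the smaller shared table leads to collisions that matter is the open
question above and is not something this sketch can answer.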

Unfortunately I won't be able to review or test any patches until January
3rd after I'm back online properly. Right now, I have intermittent internet
access at best. During the next 4 days, I know I definitely will not have
any internet access.

The last time around, there were three patch sets to avoid the overhead for
pages in particular. One was dropped (mine, based on Nick's old work) as
it was too complicated. Peter had some patches but after enough hammering
it failed due to a missed wakeup that I didn't pin down before having to
travel to a conference.

I hadn't tested Nick's prototype although it looked fine because others
reviewed it before I looked and I was waiting for another version to
appear. If one appears, I'll take a closer look and bash it across a few
machines to see if it has any lost wakeup problems.

--
Mel Gorman
SUSE Labs