RE: [PATCH 1/2] sched/wait: Break up long wake list walk

From: Liang, Kan
Date: Fri Aug 18 2017 - 09:06:29 EST

> On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan <kan.liang@xxxxxxxxx> wrote:
> >
> > Here is the call stack of wait_on_page_bit_common when the queue is
> > long (entries >1000).
> >
> > # Overhead Trace output
> > # ........ ..................
> > #
> > 100.00% (ffffffff931aefca)
> > |
> > ---wait_on_page_bit
> > __migration_entry_wait
> > migration_entry_wait
> > do_swap_page
> > __handle_mm_fault
> > handle_mm_fault
> > __do_page_fault
> > do_page_fault
> > page_fault
>
> Hmm. Ok, so it does seem to very much be related to migration. Your
> wake_up_page_bit() profile made me suspect that, but this one seems to
> pretty much confirm it.
>
> So it looks like that wait_on_page_locked() thing in __migration_entry_wait(),
> and what probably happens is that your load ends up triggering a lot of
> migration (or just migration of a very hot page), and then *every* thread
> ends up waiting for whatever page that ended up getting migrated.
>
> And so the wait queue for that page grows hugely long.
>
> Looking at the other profile, the thing that is locking the page (that everybody
> then ends up waiting on) would seem to be
> migrate_misplaced_transhuge_page(), so this is _presumably_ due to NUMA
> balancing.
>
> Does the problem go away if you disable the NUMA balancing code?
>

Yes, the problem goes away when NUMA balancing is disabled.

Thanks,
Kan