Re: [PATCH] mm: Do not stall in synchronous compaction for THP allocations
From: Andrea Arcangeli
Date: Wed Nov 16 2011 - 08:31:30 EST
On Wed, Nov 16, 2011 at 05:13:50AM +0100, Andrea Arcangeli wrote:
> After checking my current thp vmstat I think Andrew was right and we
> backed out for a good reason before. I'm getting significantly worse
> success rate, not sure why it was a small reduction in success rate
> but hey I cannot exclude I may have broke something with some other
> patch. I've been running it together with a couple more changes. If
> it's this change that reduced the success rate, I'm afraid going
> always async is not ok.

I wonder if the high failure rate when shutting off "sync compaction"
and forcing only "async compaction" for THP (your patch queued in -mm)
is also caused by ISOLATE_CLEAN being set in compaction since commit
39deaf8. Because ISOLATE_CLEAN skips PageDirty pages, all tmpfs/anon
pages added to swapcache (or removed from swapcache, which sets the
dirty bit on the page because the pte may be mapped clean) are skipped
entirely by async compaction for no good reason. That can't possibly
be ok, because those pages don't require any I/O or blocking to be
migrated. PageDirty implies a "blocking/IO" operation only for
file-backed pages. So I think we must revert 39deaf8, instead of
cleaning it up with my cleanup posted in Message-Id
20111115020831.GF4414@xxxxxxxxxx .
ISOLATE_CLEAN still looks right for may_writepage: for reclaim, the
dirty bit set on the page is an I/O event; for migrate it's not if the
page is tmpfs/anon.
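
To make the distinction above concrete, here is a minimal userspace
sketch. It is not kernel code: struct model_page and both helper names
are hypothetical, and the only thing modelled is the claim in this
message that a dirty tmpfs/anon page can be migrated without I/O while
a dirty file-backed page cannot.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a page; not struct page. */
struct model_page {
	bool dirty;        /* stands in for PageDirty */
	bool file_backed;  /* true: regular page cache; false: tmpfs/anon */
};

/* Behaviour attributed to ISOLATE_CLEAN above: skip every dirty page. */
static bool skip_any_dirty(const struct model_page *p)
{
	return p->dirty;
}

/*
 * Alternative suggested by the argument above: skip a dirty page only
 * when migrating it could require I/O, i.e. when it is file-backed.
 */
static bool skip_dirty_file_backed(const struct model_page *p)
{
	return p->dirty && p->file_backed;
}
```

Under this model a dirty swapcache page is skipped by the first check
but not by the second, which is the loss of async compaction
effectiveness described above.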

Did you run your compaction tests with some swap activity?
Reducing async compaction effectiveness while there's swap activity
also leads to running sync compaction and page reclaim more often
than needed.

I'm hopeful however that by running just 2 passes of the migrate_pages
main loop with the "avoid overwork in migrate sync mode" patch, we can
fix the excessive hanging. If that works, the number of passes could
actually be a tunable, and setting it to 1 (instead of 2) would then
provide 100% "async compaction" behavior again. And if somebody
prefers to stick to 10 he can... so then he can do trylock in pass 0,
lock_page in pass 1, wait_writeback in pass 2, wait for pins in pass 3,
and finally migrate in pass 4 (something 2 passes alone won't allow).
So making the migrate passes/force-threshold tunable (maybe only for
the new sync=2 migration mode) could be a good idea. Or we could just
return to sync true/false and have the migration tunable affect
everything, but that would alter the reliability of sys_move_pages and
other NUMA things too, where I guess 10 passes are ok. This is why I
added a sync=2 mode for migrate.
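
The staged-passes idea can be sketched as a small userspace model. This
is only one reading of the proposal, not the actual patch: struct
model_page, try_migrate_once() and migrate_with_passes() are
hypothetical names, and each "wait" is modelled as simply clearing the
blocker. The pass number decides how much blocking is allowed, and
max_passes plays the role of the proposed tunable. Because a pass here
continues to the migration once its wait finishes, pass numbers can be
one lower than in the wording above.

```c
#include <stdbool.h>

/* Hypothetical userspace model of what can block migration of a page. */
struct model_page {
	bool locked;     /* someone else holds the page lock */
	bool writeback;  /* page is under writeback */
	bool pinned;     /* page has an extra pin */
};

/*
 * One migration attempt. Pass 0 never blocks (trylock only); each later
 * pass is allowed one more kind of wait, modelled by clearing the flag.
 */
static bool try_migrate_once(struct model_page *p, int pass)
{
	if (p->locked) {
		if (pass < 1)
			return false;   /* trylock failed, do not sleep */
		p->locked = false;      /* models sleeping in lock_page */
	}
	if (p->writeback) {
		if (pass < 2)
			return false;
		p->writeback = false;   /* models waiting for writeback */
	}
	if (p->pinned) {
		if (pass < 3)
			return false;
		p->pinned = false;      /* models waiting for the pin to drop */
	}
	return true;
}

/* Returns the pass on which migration succeeded, or -1 if it never did. */
static int migrate_with_passes(struct model_page *p, int max_passes)
{
	for (int pass = 0; pass < max_passes; pass++)
		if (try_migrate_once(p, pass))
			return pass;
	return -1;
}
```

In this model max_passes = 1 never blocks, which corresponds to the
"100% async" setting; max_passes = 2 additionally allows sleeping on
the page lock, and only larger values reach the writeback and pin
waits.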