Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
From: Vlastimil Babka
Date: Thu Aug 07 2014 - 09:04:59 EST
On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>:
Ah, right. I thought that everything going to pcp lists would be through
freeing which would already observe the isolate migratetype and skip
pcplist. I forgot about the direct filling of pcplists from buddy list.
You're right that we don't want extra hooks there.
Still, couldn't this be solved in a simpler way via another pcplist drain
after the pages are moved from normal to isolate buddy list? Should be even
faster because instead of disable - drain - enable (5 all-cpu kicks, since
each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
it's true that pageset_update is single-zone operation, I guess we would
easily benefit from having a single-zone drain operation as well.
I hope so, but, it's not possible. Consider following situation.
Page A: on pcplist of CPU2 and it is on isolate pageblock.
CPU 1 CPU 2
drain pcplist
wait IPI finished move A to normal buddy list
finish IPI
A is moved to pcplist by allocation request
move doesn't catch A,
because it is on pcplist.
drain pcplist
wait IPI finished move A to normal buddy list
finish IPI
A is moved to pcplist by allocation request
repeat!!
It could happen infinitely, though, low possibility.
Hm I see. Not a correctness issue, but still a failure to isolate.
Probably not impossible with enough CPU's and considering the fact that
after pcplists are drained, the next allocation request will try to
refill them. And during the drain, the pages are added to the beginning
of the free_list AFAICS, so they will be in the first refill batch.
OK, another attempt for alternative solution proposal :) It's not that I
would think disabling pcp would be so bad, just want to be sure there is
no better alternative.
What if the drain operation had a flag telling it to recheck pageblock
migratetype and don't assume it's on the correct pcplist. Then the
problem would go away I think? Would it be possible to do without
affecting the normal drain-pcplist-when-full path? So that the cost is
only applied to isolation, but lower cost than pcplist disabling.
Actually I look that free_pcppages_bulk() doesn't consider migratetype
of the pcplist, but uses get_freepage_migratetype(page). So the pcplist
drain could first scan the pcplists and rewrite the freepage_migratetype
according to pageblock_migratetype. Then the free_pcppages_bulk()
operation would be unchanged for normal operation.
Or is this too clumsy? We could be also smart and have an alternative to
free_pcppages_bulk() which would omit the round-robin stuff (not needed
for this kind of drain), and have a pfn range to limit its operation to
pages that we are isolating.
Hm I guess with this approach some pages might still escape us if they
were moving between normal buddy list and pcplist through rmqueue_bulk()
and free_pcppages_bulk() (and not through our drain) at the wrong
moments, but I guess that would require a really specific workload
(alternating between burst of allocations and deallocations) and
consistently unlucky timing.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/