Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs

From: Vlastimil Babka
Date: Thu Aug 07 2014 - 09:04:59 EST

Next message: Hannes Reinecke: "Re: [PATCH] scsi: Check if WSR already defined"
Previous message: caesar: "Re: [PATCH v4 2/2] pwm: rockchip: Added to support for RK3288 SoC"
In reply to: Joonsoo Kim: "Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs"
Next in thread: Joonsoo Kim: "Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 08/07/2014 02:26 PM, Joonsoo Kim wrote:

2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>:

Ah, right. I thought that everything going to pcp lists would be through
freeing which would already observe the isolate migratetype and skip
pcplist. I forgot about the direct filling of pcplists from buddy list.
You're right that we don't want extra hooks there.

Still, couldn't this be solved in a simpler way via another pcplist drain
after the pages are moved from normal to isolate buddy list? Should be even
faster because instead of disable - drain - enable (5 all-cpu kicks, since
each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
it's true that pageset_update is single-zone operation, I guess we would
easily benefit from having a single-zone drain operation as well.

I hope so, but, it's not possible. Consider following situation.

Page A: on pcplist of CPU2 and it is on isolate pageblock.

CPU 1 CPU 2
drain pcplist
wait IPI finished move A to normal buddy list
finish IPI
A is moved to pcplist by allocation request

move doesn't catch A,
because it is on pcplist.

drain pcplist
wait IPI finished move A to normal buddy list
finish IPI
A is moved to pcplist by allocation request

repeat!!

It could happen infinitely, though, low possibility.

Hm I see. Not a correctness issue, but still a failure to isolate. Probably not impossible with enough CPU's and considering the fact that after pcplists are drained, the next allocation request will try to refill them. And during the drain, the pages are added to the beginning of the free_list AFAICS, so they will be in the first refill batch.

OK, another attempt for alternative solution proposal :) It's not that I would think disabling pcp would be so bad, just want to be sure there is no better alternative.

What if the drain operation had a flag telling it to recheck pageblock migratetype and don't assume it's on the correct pcplist. Then the problem would go away I think? Would it be possible to do without affecting the normal drain-pcplist-when-full path? So that the cost is only applied to isolation, but lower cost than pcplist disabling.

Actually I look that free_pcppages_bulk() doesn't consider migratetype of the pcplist, but uses get_freepage_migratetype(page). So the pcplist drain could first scan the pcplists and rewrite the freepage_migratetype according to pageblock_migratetype. Then the free_pcppages_bulk() operation would be unchanged for normal operation.

Or is this too clumsy? We could be also smart and have an alternative to free_pcppages_bulk() which would omit the round-robin stuff (not needed for this kind of drain), and have a pfn range to limit its operation to pages that we are isolating.

Hm I guess with this approach some pages might still escape us if they were moving between normal buddy list and pcplist through rmqueue_bulk() and free_pcppages_bulk() (and not through our drain) at the wrong moments, but I guess that would require a really specific workload (alternating between burst of allocations and deallocations) and consistently unlucky timing.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Hannes Reinecke: "Re: [PATCH] scsi: Check if WSR already defined"
Previous message: caesar: "Re: [PATCH v4 2/2] pwm: rockchip: Added to support for RK3288 SoC"
In reply to: Joonsoo Kim: "Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs"
Next in thread: Joonsoo Kim: "Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]