Re: [RFC PATCH 3/3] mm: support active anti-fragmentation algorithm

From: Vlastimil Babka
Date: Tue May 12 2015 - 05:02:13 EST

On 04/28/2015 09:45 AM, Joonsoo Kim wrote:

On Mon, Apr 27, 2015 at 09:29:23AM +0100, Mel Gorman wrote:

On Mon, Apr 27, 2015 at 04:23:41PM +0900, Joonsoo Kim wrote:

We already have an anti-fragmentation policy in the page allocator. It works
well when system memory is sufficient, but it doesn't work well when system
memory is insufficient, because memory is already highly fragmented and the
fallback/steal mechanism cannot get a whole pageblock. If there is a heavy
unmovable allocation requestor like zram, the problem could get worse.

CPU: 8
RAM: 512 MB with zram swap
WORKLOAD: kernel build with -j12
OPTION: page owner is enabled to measure fragmentation
After finishing the build, check fragmentation by 'cat /proc/pagetypeinfo'

* Before
Number of blocks type (movable)
DMA32: 207

Number of mixed blocks (movable)
DMA32: 111.2

A mixed block is a movable pageblock that contains one or more pages
allocated for an unmovable/reclaimable allocation. The results show that
more than half of the movable pageblocks are tainted by allocations of
another migratetype.
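
To make the definition concrete, here is a minimal sketch of how a mixed
block could be counted. It is a toy model, not kernel code: the string labels,
the four-page blocks and the is_mixed()/count_mixed() helpers are all
hypothetical and exist only for illustration.

```python
# Toy model only: these labels are hypothetical, not kernel identifiers.
MOVABLE, UNMOVABLE, RECLAIMABLE, FREE = "movable", "unmovable", "reclaimable", "free"


def is_mixed(block_type, pages):
    """A movable pageblock is mixed if at least one page in it is
    allocated for an unmovable or reclaimable request."""
    if block_type != MOVABLE:
        return False
    return any(p in (UNMOVABLE, RECLAIMABLE) for p in pages)


def count_mixed(blocks):
    """Count mixed blocks in a list of (block_type, pages) tuples."""
    return sum(is_mixed(block_type, pages) for block_type, pages in blocks)


# Four tiny pageblocks of four pages each (real pageblocks are much larger).
blocks = [
    (MOVABLE, [MOVABLE, FREE, MOVABLE, FREE]),          # clean movable block
    (MOVABLE, [MOVABLE, UNMOVABLE, FREE, FREE]),        # tainted by unmovable
    (MOVABLE, [RECLAIMABLE, FREE, FREE, FREE]),         # tainted by reclaimable
    (UNMOVABLE, [UNMOVABLE, UNMOVABLE, FREE, FREE]),    # not a movable block
]
mixed = count_mixed(blocks)
```

In this toy input two of the three movable blocks count as mixed; the
unmovable block is ignored because only movable pageblocks are considered.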

To mitigate this fragmentation, this patch implements an active
anti-fragmentation algorithm. The idea is really simple. When an
unmovable/reclaimable steal happens from a movable pageblock, we try to
migrate out the other migratable pages in this pageblock and use the
resulting free pages for further allocation requests of the
corresponding migratetype.

Once an unmovable allocation taints a movable pageblock, it cannot easily
recover. Instead of praying that it gets restored, turning it into an
unmovable pageblock as far as possible and using it for further unmovable
requests would be a more reasonable approach.
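
The idea above can be sketched as a toy simulation. This is not the patch
itself: the dict-based pageblocks, the steal_and_evacuate() helper and the
single spare block used as migration target are assumptions made purely for
illustration, and the real work would go through the kernel's page migration
machinery.

```python
# Toy model only: labels and data layout are hypothetical, not kernel code.
MOVABLE, UNMOVABLE, FREE = "movable", "unmovable", "free"


def steal_and_evacuate(block, spare):
    """Model one unmovable steal from a movable pageblock.

    1. The unmovable request takes one free page from `block`.
    2. Remaining movable pages are migrated into free slots of `spare`.
    3. If no movable page is left, `block` is retyped as unmovable so that
       later unmovable requests can be served from it.
    """
    pages = block["pages"]
    pages[pages.index(FREE)] = UNMOVABLE        # the steal itself

    for i, page in enumerate(pages):
        if page != MOVABLE:
            continue
        if FREE not in spare["pages"]:
            break                               # nowhere left to migrate to
        spare["pages"][spare["pages"].index(FREE)] = MOVABLE
        pages[i] = FREE                         # page migrated out

    if MOVABLE not in pages:
        block["type"] = UNMOVABLE
    return block


block = {"type": MOVABLE, "pages": [MOVABLE, FREE, MOVABLE, FREE]}
spare = {"type": MOVABLE, "pages": [FREE, FREE, FREE, FREE]}
steal_and_evacuate(block, spare)
```

After the call, the tainted block holds only the unmovable page plus free
pages and is marked unmovable, while the movable pages have been packed into
the spare block. If the spare block runs out of free pages, the tainted block
simply stays mixed.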

Below is the result of this idea.

* After
Number of blocks type (movable)
DMA32: 208.2

Number of mixed blocks (movable)
DMA32: 55.8

The result shows that the number of non-mixed blocks increases by 59% in this case.
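
For reference, the 59% figure can be reproduced from the DMA32 numbers quoted
above, taking "non-mixed" to mean movable blocks minus mixed movable blocks
(that reading is an assumption, but it matches the stated percentage). The
variable names are illustrative only.

```python
# DMA32 numbers as quoted in the mail.
before_total, before_mixed = 207.0, 111.2
after_total, after_mixed = 208.2, 55.8

# Non-mixed movable blocks = all movable blocks minus the mixed ones.
before_clean = before_total - before_mixed   # about 95.8
after_clean = after_total - after_mixed      # about 152.4

# Relative increase of non-mixed blocks, in percent.
increase_pct = (after_clean / before_clean - 1) * 100

# Share of movable blocks that were mixed before the patch, in percent.
mixed_share_before = before_mixed / before_total * 100

print(f"non-mixed blocks: {before_clean:.1f} -> {after_clean:.1f} "
      f"(+{increase_pct:.0f}%)")
print(f"mixed share before: {mixed_share_before:.1f}%")
```

The same numbers also back the earlier statement that more than half of the
movable pageblocks were mixed before the patch.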

Interesting. I tested a patch prototype like this too (although the work wasn't offloaded to a kthread, I wanted to see benefits first) and it yielded no significant difference. But admittedly I was using stress-highalloc for huge page sized allocations and a 4GB memory system...

So with these results it seems definitely worth pursuing, taking Mel's comments into account. We should think about coordination with khugepaged, which is another source of compaction. See my patchset from yesterday "Outsourcing page fault THP allocations to khugepaged" (sorry I didn't CC you). I think ideally this "antifrag" or maybe "kcompactd" thread would be one per NUMA node and serve both for the pageblock antifragmentation requests (with higher priority) and then THP allocation requests. Then khugepaged would do just the scanning for collapses, which might be later moved to task_work context and khugepaged killed. We could also remove compaction from kswapd balancing and let it wake up kcompactd instead.
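
To illustrate the ordering I have in mind, here is a minimal sketch of a
per-node request queue where antifragmentation requests are served before THP
allocation requests. Everything in it is hypothetical (the
NodeCompactionQueue class, the numeric priorities, the string payloads); it
only models the prioritization, not how a real kcompactd thread would be
woken or scheduled.

```python
import heapq
import itertools

# Assumed encoding for this sketch: lower number means higher priority.
ANTIFRAG, THP = 0, 1


class NodeCompactionQueue:
    """Toy per-node queue: antifrag requests first, then THP requests."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # keeps FIFO order within a priority

    def submit(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._seq), payload))

    def next_request(self):
        if not self._heap:
            return None
        kind, _, payload = heapq.heappop(self._heap)
        return kind, payload


# One queue per NUMA node (two nodes assumed for the example).
queues = {node: NodeCompactionQueue() for node in range(2)}
queues[0].submit(THP, "thp-A")
queues[0].submit(ANTIFRAG, "pageblock-7")
queues[0].submit(THP, "thp-B")

order = [queues[0].next_request() for _ in range(3)]
```

Here the pageblock request is handled first even though it was submitted after
thp-A, and the two THP requests keep their submission order.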
