Re: [RFC] mm: Don't use radix tree writeback tags for pages in swap cache

From: Huang, Ying
Date: Tue Aug 09 2016 - 13:00:32 EST


Hi, Dave,

Dave Hansen <dave.hansen@xxxxxxxxx> writes:

> On 08/09/2016 09:17 AM, Huang, Ying wrote:
>> File pages use a set of radix tree tags (DIRTY, TOWRITE, WRITEBACK) to
>> accelerate finding the pages with a specific tag in the radix tree
>> during writing back an inode. But for anonymous pages in swap cache,
>> there is no inode-based writeback. So there is no need to find the
>> pages with some writeback tags in the radix tree. It is not necessary to
>> touch radix tree writeback tags for pages in swap cache.
>
> Seems simple enough. Do we do any of this unnecessary work for the
> other radix tree tags? If so, maybe we should just fix this once and
> for all. Could we, for instance, WARN_ONCE() in radix_tree_tag_set() if
> it sees a swap mapping get handed in there?

Good idea! I will do that and try to catch other places if any.

> In any case, I think the new !PageSwapCache(page) check either needs
> commenting, or a common helper for the two sites that you can comment.

Sure. I will add that.
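
As a rough illustration of what such a commented common helper could
look like, here is a userspace sketch only. The names struct mock_page
and page_needs_writeback_tags() are hypothetical, made up for this
example; they are not kernel APIs and not the code from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct page; illustration only. */
struct mock_page {
	bool in_swap_cache;	/* plays the role of PageSwapCache(page) */
};

/*
 * Pages in swap cache have no inode-based writeback, so nothing ever
 * looks them up by writeback tag in the radix tree. Both call sites
 * could ask this one helper whether the tags need to be touched.
 */
static bool page_needs_writeback_tags(const struct mock_page *page)
{
	return !page->in_swap_cache;
}
```

Keeping the check in one place means the explanation lives in a single
comment instead of being repeated at both sites.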

>> With this patch, the swap out bandwidth improved 22.3% in vm-scalability
>> swap-w-seq test case with 8 processes on a Xeon E5 v3 system, because of
>> reduced contention on swap cache radix tree lock. To test sequential swap
>> out, the test case uses 8 processes that sequentially allocate and write to
>> anonymous pages until RAM and part of the swap device are used up.
>
> What was the swap device here, btw? What is the actual bandwidth
> increase you are seeing? Is it 1MB/s -> 1.223MB/s? :)

The swap device here is a DRAM-simulated persistent memory block device
(pmem).

1207402 ± 7% +22.3% 1476578 ± 6% vmstat.swap.so

The actual bandwidth increase is from 1.21 GB/s -> 1.48 GB/s. This is
lower than the bandwidth of an NVMe disk, so the bottleneck is in the swap
subsystem instead of the block subsystem and device.
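
The arithmetic behind these numbers can be checked with a small sketch.
It assumes vmstat.swap.so is reported in KB/s and uses decimal units
(1 GB = 10^6 KB), which matches the rounded figures above. The function
names are made up for this example.

```c
/* Relative change from the base value to the patched value, in percent. */
static double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Convert KB/s to GB/s, assuming decimal units (1 GB = 10^6 KB). */
static double kb_to_gb_per_sec(double kb_per_sec)
{
	return kb_per_sec / 1e6;
}
```

Calling percent_change(1207402, 1476578) gives about 22.3, and
kb_to_gb_per_sec() maps the two values to about 1.21 and 1.48.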

Best Regards,
Huang, Ying