Re: [PATCH v5 3/9] mm/swap: Split swap cache into 64MB trunks
From: Andrew Morton
Date: Wed Jan 11 2017 - 18:09:51 EST
On Wed, 11 Jan 2017 09:55:13 -0800 Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> wrote:
> This patch improves the scalability of swap out/in by using
> fine-grained locks for the swap cache. In the current kernel, one
> address space is used for each swap device, and in the common
> configuration the number of swap devices is very small (one is
> typical). This causes heavy lock contention on the radix tree of
> the address space if multiple tasks swap out/in concurrently. But in
> fact, there is no dependency between pages in the swap cache, so
> we can split the one shared address space for each swap device into
> several address spaces to reduce the lock contention. In the patch, the
> shared address space is split into 64MB trunks. 64MB is chosen to
> balance memory usage against the reduction in lock contention.
>
> The size of struct address_space on x86_64 is 408B, so with the
> patch, 6528B more memory (16 trunks x 408B) will be used for every
> 1GB of swap space on x86_64.
>
> One address space is still shared by the swap entries in the same 64MB
> trunk. To avoid lock contention during the first round of swap space
> allocation, the order of the swap clusters in the initial free cluster
> list is changed: the swap space distance between consecutive swap
> clusters in the free cluster list is at least 64MB. After the first
> round of allocation, the swap clusters are expected to be freed
> randomly, so the lock contention should be reduced effectively.
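The reordering of the initial free cluster list described above can be sketched roughly as follows. This is an illustration only, not code from the patch: interleaved_order() and the constants are hypothetical names, and it assumes 4KB pages and 256-page swap clusters, so that one 64MB trunk holds 64 clusters.

```c
#include <stddef.h>

/* Assumed for illustration: 4KB pages, 256-page swap clusters. */
#define CLUSTER_PAGES 256UL
#define CHUNK_PAGES   16384UL                        /* 64MB / 4KB */
#define CLUSTER_COLS  (CHUNK_PAGES / CLUSTER_PAGES)  /* 64 clusters per trunk */

/*
 * Hypothetical helper: fill order[] with every cluster index in
 * [0, nr_clusters), walking column by column so that successive
 * entries within a column are CLUSTER_COLS clusters (one 64MB
 * trunk) apart. Returns the number of entries written.
 */
static size_t interleaved_order(unsigned long nr_clusters,
                                unsigned long *order)
{
    size_t n = 0;
    unsigned long rows = (nr_clusters + CLUSTER_COLS - 1) / CLUSTER_COLS;

    for (unsigned long col = 0; col < CLUSTER_COLS; col++) {
        for (unsigned long row = 0; row < rows; row++) {
            unsigned long idx = row * CLUSTER_COLS + col;

            /* The last trunk may be only partially populated. */
            if (idx < nr_clusters)
                order[n++] = idx;
        }
    }
    return n;
}
```

With two trunks (128 clusters) this yields 0, 64, 1, 65, ..., so back-to-back allocations from the head of the list land in different trunks and therefore take different address space locks.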
Switching from a single radix-tree to an array of radix-trees to reduce
contention seems a bit hacky. That we can do this and have everything
continue to work tells me that we're simply using an inappropriate data
structure to hold this info.
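
For reference, the "array of radix-trees" amounts to picking an address space by shifting the swap offset, and the 6528B/GB figure follows from the same arithmetic. A minimal userspace sketch, assuming 4KB pages; chunk_index(), nr_chunks(), overhead_bytes() and the constants are hypothetical names, not identifiers from the patch, and 408B is the x86_64 size quoted above:

```c
#include <stddef.h>

/* Assumed for illustration: 4KB pages, so 2^14 pages make one 64MB trunk. */
#define CHUNK_SHIFT         14
#define CHUNK_PAGES         (1UL << CHUNK_SHIFT)
#define ADDRESS_SPACE_BYTES 408UL  /* sizeof(struct address_space) quoted for x86_64 */

/* Index of the per-trunk address space covering a swap page offset. */
static unsigned long chunk_index(unsigned long swap_offset)
{
    return swap_offset >> CHUNK_SHIFT;
}

/* Number of trunks needed to cover nr_pages of swap, rounded up. */
static unsigned long nr_chunks(unsigned long nr_pages)
{
    return (nr_pages + CHUNK_PAGES - 1) / CHUNK_PAGES;
}

/* Memory taken by the per-trunk address spaces, in bytes. */
static unsigned long overhead_bytes(unsigned long nr_pages)
{
    return nr_chunks(nr_pages) * ADDRESS_SPACE_BYTES;
}
```

1GB of swap is 262144 pages of 4KB, so nr_chunks() gives 16 and overhead_bytes() gives 16 x 408 = 6528, matching the number in the changelog. Offsets 0..16383 map to index 0, and offset 16384 is the first page of index 1.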