[PATCH v3 3/4] mm: swap: Simplify ssd behavior when scanner steals entry

From: Ryan Roberts
Date: Wed Oct 25 2023 - 10:46:13 EST


When a CPU fails to reserve a cluster (due to free list exhaustion), we
revert to the scanner to find a free entry somewhere in the swap file.
This might cause an entry to be stolen from another CPU's reserved
cluster. Upon noticing this, the CPU with the stolen entry would
previously scan forward to the end of the cluster trying to find a free
entry to use. If there were none, it would try to reserve a new pre-cpu
cluster and allocate from that.

This scanning behavior does not scale well to high-order allocations,
which will be introduced in a future patch since would need to scan for
a contiguous area that was naturally aligned. Given stealing is a rare
occurrence, let's remove the scanning behavior from the ssd allocator
and simply drop the cluster and try to allocate a new one. Given the
purpose of the per-cpu cluster is to ensure a given task's pages are
sequential on disk to aid readahead, allocating a new cluster at this
point makes most sense.

Furthermore, si->max will always be greater than or equal to the end of
the last cluster because any partial cluster will never be put on the
free cluster list. Therefore we can simplify this logic too.

These changes make it simpler to generalize
scan_swap_map_try_ssd_cluster() to handle any allocation order.

Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
---
mm/swapfile.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 617e34b8cdbe..94f7cc225eb9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -639,27 +639,24 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,

/*
* Other CPUs can use our cluster if they can't find a free cluster,
- * check if there is still free entry in the cluster
+ * check if the expected entry is still free. If not, drop it and
+ * reserve a new cluster.
*/
- max = min_t(unsigned long, si->max,
- ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER);
- if (tmp < max) {
- ci = lock_cluster(si, tmp);
- while (tmp < max) {
- if (!si->swap_map[tmp])
- break;
- tmp++;
- }
+ ci = lock_cluster(si, tmp);
+ if (si->swap_map[tmp]) {
unlock_cluster(ci);
- }
- if (tmp >= max) {
*cpu_next = SWAP_NEXT_NULL;
goto new_cluster;
}
+ unlock_cluster(ci);
+
*offset = tmp;
*scan_base = tmp;
+
+ max = ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER;
tmp += 1;
*cpu_next = tmp < max ? tmp : SWAP_NEXT_NULL;
+
return true;
}

--
2.25.1