[PATCH v4] mm: swap: properly update readahead statistics in unuse_pte_range()

From: Andrea Righi
Date: Sat Apr 18 2020 - 04:47:16 EST


In unuse_pte_range() we blindly swap-in pages without checking if the
swap entry is already present in the swap cache.

By doing this, the hit/miss ratio used by the swap readahead heuristic
is not properly updated and this leads to non-optimal performance during
swapoff.

Tracing the distribution of the readahead size returned by the swap
readahead heuristic during swapoff shows that a small readahead size is
used most of the time as if we had only misses (this happens both with
cluster and vma readahead), for example:

r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
COUNT EVENT
36948 $retval = 8
44151 $retval = 4
49290 $retval = 1
527771 $retval = 2

Checking if the swap entry is present in the swap cache, instead, allows
to properly update the readahead statistics and the heuristic behaves in
a better way during swapoff, selecting a bigger readahead size:

r::swapin_nr_pages(unsigned long offset):unsigned long:$retval
COUNT EVENT
1618 $retval = 1
4960 $retval = 2
41315 $retval = 4
103521 $retval = 8

In terms of swapoff performance the result is the following:

Testing environment
===================

- Host:
CPU: 1.8GHz Intel Core i7-8565U (quad-core, 8MB cache)
HDD: PC401 NVMe SK hynix 512GB
MEM: 16GB

- Guest (kvm):
8GB of RAM
virtio block driver
16GB swap file on ext4 (/swapfile)

Test case
=========
- allocate 85% of memory
- `systemctl hibernate` to force all the pages to be swapped-out to the
swap file
- resume the system
- measure the time that swapoff takes to complete:
# /usr/bin/time swapoff /swapfile

Result (swapoff time)
======
5.6 vanilla 5.6 w/ this patch
----------- -----------------
cluster-readahead 22.09s 12.19s
vma-readahead 18.20s 15.33s

Conclusion
==========

The specific use case this patch is addressing is to improve swapoff
performance in cloud environments when a VM has been hibernated, resumed
and all the memory needs to be forced back to RAM by disabling swap.

This change allows to better exploits the advantages of the readahead
heuristic during swapoff and this improvement allows to to speed up the
resume process of such VMs.

Reviewed-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Signed-off-by: Andrea Righi <andrea.righi@xxxxxxxxxxxxx>
---
Changes in v4:
- added a "conclusion" section in the patch description to better
explain the importance of this change (no functional changes)

mm/swapfile.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 5871a2aa86a5..f8bf926c9c8f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1937,10 +1937,14 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,

pte_unmap(pte);
swap_map = &si->swap_map[offset];
- vmf.vma = vma;
- vmf.address = addr;
- vmf.pmd = pmd;
- page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, &vmf);
+ page = lookup_swap_cache(entry, vma, addr);
+ if (!page) {
+ vmf.vma = vma;
+ vmf.address = addr;
+ vmf.pmd = pmd;
+ page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
+ &vmf);
+ }
if (!page) {
if (*swap_map == 0 || *swap_map == SWAP_MAP_BAD)
goto try_next;
--
2.25.1