Re: [PATCH -mm -v3] mm, swap: Sort swap entries before free

From: Minchan Kim
Date: Thu Apr 27 2017 - 00:44:41 EST


On Wed, Apr 26, 2017 at 08:42:10PM +0800, Huang, Ying wrote:
> Minchan Kim <minchan@xxxxxxxxxx> writes:
>
> > On Fri, Apr 21, 2017 at 08:29:30PM +0800, Huang, Ying wrote:
> >> "Huang, Ying" <ying.huang@xxxxxxxxx> writes:
> >>
> >> > Minchan Kim <minchan@xxxxxxxxxx> writes:
> >> >
> >> >> On Wed, Apr 19, 2017 at 04:14:43PM +0800, Huang, Ying wrote:
> >> >>> Minchan Kim <minchan@xxxxxxxxxx> writes:
> >> >>>
> >> >>> > Hi Huang,
> >> >>> >
> >> >>> > On Fri, Apr 07, 2017 at 02:49:01PM +0800, Huang, Ying wrote:
> >> >>> >> From: Huang Ying <ying.huang@xxxxxxxxx>
> >> >>> >>
> >> >>> >> void swapcache_free_entries(swp_entry_t *entries, int n)
> >> >>> >> {
> >> >>> >> struct swap_info_struct *p, *prev;
> >> >>> >> @@ -1075,6 +1083,10 @@ void swapcache_free_entries(swp_entry_t *entries, int n)
> >> >>> >>
> >> >>> >> prev = NULL;
> >> >>> >> p = NULL;
> >> >>> >> +
> >> >>> >> + /* Sort swap entries by swap device, so each lock is only taken once. */
> >> >>> >> + if (nr_swapfiles > 1)
> >> >>> >> + sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL);
> >> >>> >
> >> >>> > Let's think on other cases.
> >> >>> >
> >> >>> > There are two swaps and they are configured by priority so a swap's usage
> >> >>> > would be zero unless other swap used up. In case of that, this sorting
> >> >>> > is pointless.
> >> >>> >
> >> >>> > As well, nr_swapfiles is never decreased so if we enable multiple
> >> >>> > swaps and then disable until a swap is remained, this sorting is
> >> >>> > pointelss, too.
> >> >>> >
> >> >>> > How about lazy sorting approach? IOW, if we found prev != p and,
> >> >>> > then we can sort it.
> >> >>>
> >> >>> Yes. That should be better. I just don't know whether the added
> >> >>> complexity is necessary, given the array is short and sort is fast.
> >> >>
> >> >> Huh?
> >> >>
> >> >> 1. swapon /dev/XXX1
> >> >> 2. swapon /dev/XXX2
> >> >> 3. swapoff /dev/XXX2
> >> >> 4. use only one swap
> >> >> 5. then, always pointless sort.
> >> >
> >> > Yes. In this situation we will do unnecessary sorting. What I don't
> >> > know is whether the unnecessary sorting will hurt performance in real
> >> > life. I can do some measurement.
> >>
> >> I tested the patch with 1 swap device and 1 process to eat memory
> >> (remove the "if (nr_swapfiles > 1)" for test). I think this is the
> >> worse case because there is no lock contention. The memory freeing time
> >> increased from 1.94s to 2.12s (increase ~9.2%). So there is some
> >> overhead for some cases. I change the algorithm to something like
> >> below,
> >>
> >> void swapcache_free_entries(swp_entry_t *entries, int n)
> >> {
> >> struct swap_info_struct *p, *prev;
> >> int i;
> >> + swp_entry_t entry;
> >> + unsigned int prev_swp_type;
> >>
> >> if (n <= 0)
> >> return;
> >>
> >> + prev_swp_type = swp_type(entries[0]);
> >> + for (i = n - 1; i > 0; i--) {
> >> + if (swp_type(entries[i]) != prev_swp_type)
> >> + break;
> >> + }
> >
> > That's really what I want to avoid. For many swap usecases,
> > it adds unnecessary overhead.
> >
> >> +
> >> + /* Sort swap entries by swap device, so each lock is only taken once. */
> >> + if (i)
> >> + sort(entries, n, sizeof(entries[0]), swp_entry_cmp, NULL);
> >> prev = NULL;
> >> p = NULL;
> >> for (i = 0; i < n; ++i) {
> >> - p = swap_info_get_cont(entries[i], prev);
> >> + entry = entries[i];
> >> + p = swap_info_get_cont(entry, prev);
> >> if (p)
> >> - swap_entry_free(p, entries[i]);
> >> + swap_entry_free(p, entry);
> >> prev = p;
> >> }
> >> if (p)
> >>
> >> With this patch, the memory freeing time increased from 1.94s to 1.97s.
> >> I think this is good enough. Do you think so?
> >
> > What I mean is as follows(I didn't test it at all):
> >
> > With this, sort entries if we found multiple entries in current
> > entries. It adds some condition checks for non-multiple swap
> > usecase but it would be more cheaper than the sorting.
> > And it adds a [un]lock overhead for multiple swap usecase but
> > it should be a compromise for single-swap usecase which is more
> > popular.
> >
>
> How about the following solution? It can avoid [un]lock overhead and
> double lock issue for multiple swap user case and has good performance
> for one swap user case too.

How worse with approach I suggested compared to as-is?
Unless it's too bad, let's not add more complicated thing to just
enhance the minor usecase in such even *slow* path.
It adds code size/maintainance overead.
With your suggestion, it might enhance a bit with speicific benchmark
but not sure it's really worth for real practice.

>
> Best Regards,
> Huang, Ying
>
> From 7bd903c42749c448ef6acbbdee8dcbc1c5b498b9 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Thu, 23 Feb 2017 13:05:20 +0800
> Subject: [PATCH -v5] mm, swap: Sort swap entries before free
>
> To reduce the lock contention of swap_info_struct->lock when freeing
> swap entry. The freed swap entries will be collected in a per-CPU
> buffer firstly, and be really freed later in batch. During the batch
> freeing, if the consecutive swap entries in the per-CPU buffer belongs
> to same swap device, the swap_info_struct->lock needs to be
> acquired/released only once, so that the lock contention could be
> reduced greatly. But if there are multiple swap devices, it is
> possible that the lock may be unnecessarily released/acquired because
> the swap entries belong to the same swap device are non-consecutive in
> the per-CPU buffer.
>
> To solve the issue, the per-CPU buffer is sorted according to the swap
> device before freeing the swap entries. Test shows that the time
> spent by swapcache_free_entries() could be reduced after the patch.
>
> With the patch, the memory (some swapped out) free time reduced
> 13.6% (from 2.59s to 2.28s) in the vm-scalability swap-w-rand test
> case with 16 processes. The test is done on a Xeon E5 v3 system. The
> swap device used is a RAM simulated PMEM (persistent memory) device.
> To test swapping, the test case creates 16 processes, which allocate
> and write to the anonymous pages until the RAM and part of the swap
> device is used up, finally the memory (some swapped out) is freed
> before exit.
>
> Signed-off-by: Huang Ying <ying.huang@xxxxxxxxx>
> Acked-by: Tim Chen <tim.c.chen@xxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Shaohua Li <shli@xxxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
>
> v5:
>
> - Use a smarter way to determine whether sort is necessary.
>
> v4:
>
> - Avoid unnecessary sort if all entries are from one swap device.
>
> v3:
>
> - Add some comments in code per Rik's suggestion.
>
> v2:
>
> - Avoid sort swap entries if there is only one swap device.
> ---
> mm/swapfile.c | 43 ++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 71890061f653..10e75f9e8ac1 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -37,6 +37,7 @@
> #include <linux/swapfile.h>
> #include <linux/export.h>
> #include <linux/swap_slots.h>
> +#include <linux/sort.h>
>
> #include <asm/pgtable.h>
> #include <asm/tlbflush.h>
> @@ -1065,20 +1066,52 @@ void swapcache_free(swp_entry_t entry)
> }
> }
>
> +static int swp_entry_cmp(const void *ent1, const void *ent2)
> +{
> + const swp_entry_t *e1 = ent1, *e2 = ent2;
> +
> + return (int)(swp_type(*e1) - swp_type(*e2));
> +}
> +
> void swapcache_free_entries(swp_entry_t *entries, int n)
> {
> struct swap_info_struct *p, *prev;
> - int i;
> + int i, m;
> + swp_entry_t entry;
> + unsigned int prev_swp_type;
>
> if (n <= 0)
> return;
>
> prev = NULL;
> p = NULL;
> - for (i = 0; i < n; ++i) {
> - p = swap_info_get_cont(entries[i], prev);
> - if (p)
> - swap_entry_free(p, entries[i]);
> + m = 0;
> + prev_swp_type = swp_type(entries[0]);
> + for (i = 0; i < n; i++) {
> + entry = entries[i];
> + if (likely(swp_type(entry) == prev_swp_type)) {
> + p = swap_info_get_cont(entry, prev);
> + if (likely(p))
> + swap_entry_free(p, entry);
> + prev = p;
> + } else if (!m)
> + m = i;
> + }
> + if (p)
> + spin_unlock(&p->lock);
> + if (likely(!m))
> + return;
> +
> + /* Sort swap entries by swap device, so each lock is only taken once. */
> + sort(entries + m, n - m, sizeof(entries[0]), swp_entry_cmp, NULL);
> + prev = NULL;
> + for (i = m; i < n; i++) {
> + entry = entries[i];
> + if (swp_type(entry) == prev_swp_type)
> + continue;
> + p = swap_info_get_cont(entry, prev);
> + if (likely(p))
> + swap_entry_free(p, entry);
> prev = p;
> }
> if (p)
> --
> 2.11.0
>