Re: [RFC PATCH v1 13/13] mm: vmscan, swap, zswap: Compress batching of folios in shrink_folio_list().
From: Joel Granados
Date: Mon Oct 28 2024 - 10:41:54 EST
On Thu, Oct 17, 2024 at 11:41:01PM -0700, Kanchana P Sridhar wrote:
> This patch enables the use of Intel IAA hardware compression acceleration
> to reclaim a batch of folios in shrink_folio_list(). This results in
> reclaim throughput and workload/sys performance improvements.
>
> The earlier patches on compress batching deployed multiple IAA compress
> engines for compressing up to SWAP_CRYPTO_SUB_BATCH_SIZE pages within a
> large folio that is being stored in zswap_store(). This patch further
> propagates the efficiency improvements demonstrated with IAA "batching
> within folios", to vmscan "batching of folios" which will also use
> batching within folios using the extensible architecture of
> the __zswap_store_batch_core() procedure added earlier, that accepts
> an array of folios.
...
> +static inline void zswap_store_batch(struct swap_in_memory_cache_cb *simc)
> +{
> +}
> +
> static inline bool zswap_store(struct folio *folio)
> {
> return false;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 79e6cb1d5c48..b8d6b599e9ae 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2064,6 +2064,15 @@ static struct ctl_table vm_table[] = {
> .extra1 = SYSCTL_ZERO,
> .extra2 = (void *)&page_cluster_max,
> },
> + {
> + .procname = "compress-batchsize",
> + .data = &compress_batchsize,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
Why not use proc_douintvec_minmax? These are the reasons I think you
should use it (please correct me if I misread your patch):
1. Your range is [1,32] -> so no negative values
2. You are comparing the value with an unsigned int
(simc->nr_folios) in your `struct swap_in_memory_cache_cb`. So
instead of going from int to uint, you should just use uint all
around. No?
3. Using proc_douintvec_minmax will automatically error out on negative
input without even considering your range, so less code is
executed in the end.
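
To make points 2 and 3 concrete, here is a small userspace sketch. It is
not kernel code and not taken from the patch: batch_full_mixed() and
parse_uint_minmax() are hypothetical helpers that only model the behaviour
I am describing, assuming the usual C conversion rules and a handler that
rejects a leading '-' before looking at the range.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Point 2 (hypothetical helper): comparing an int with an unsigned int
 * converts the int to unsigned. The cast below spells out what the
 * compiler would do implicitly, so a negative batchsize becomes a huge
 * positive value and the batch never looks full.
 */
static bool batch_full_mixed(int batchsize, unsigned int nr_folios)
{
	return nr_folios >= (unsigned int)batchsize;
}

/*
 * Point 3 (hypothetical stand-in for an unsigned min/max handler):
 * negative input is rejected up front, before the [min, max] range is
 * ever consulted.
 */
static int parse_uint_minmax(const char *s, unsigned int min,
			     unsigned int max, unsigned int *out)
{
	char *end;
	unsigned long v;

	while (*s == ' ' || *s == '\t')
		s++;
	if (*s == '-')
		return -EINVAL;	/* negative: no range check needed */
	if (!isdigit((unsigned char)*s))
		return -EINVAL;	/* not a number at all */

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || (*end != '\0' && *end != '\n') || v > UINT_MAX)
		return -EINVAL;
	if (v < min || v > max)
		return -EINVAL;	/* outside the allowed range */

	*out = (unsigned int)v;
	return 0;
}
```

With the [1,32] range from your patch, parse_uint_minmax("-1", 1, 32, &v)
fails on the sign check alone, while "0" and "33" fail on the range check.
Keeping the variable unsigned end to end also avoids the conversion that
batch_full_mixed() illustrates.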
> + .extra1 = SYSCTL_ONE,
> + .extra2 = (void *)&compress_batchsize_max,
> + },
> {
> .procname = "dirtytime_expire_seconds",
> .data = &dirtytime_expire_interval,
> diff --git a/mm/page_io.c b/mm/page_io.c
> index a28d28b6b3ce..065db25309b8 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -226,6 +226,131 @@ static void swap_zeromap_folio_clear(struct folio *folio)
> }
> }
...
Best
--
Joel Granados