Re: [PATCH 11/11] btrfs: add zstd compression level support
From: Nikolay Borisov
Date: Tue Jan 29 2019 - 02:26:03 EST
On 28.01.19 Ð. 23:24 Ñ., Dennis Zhou wrote:
> Zstd compression requires different amounts of memory for each level of
> compression. The prior patches implemented indirection to allow for each
> compression type to manage their workspaces independently. This patch
> uses this indirection to implement compression level support for zstd.
>
> As mentioned above, a requirement that differs zstd from zlib is that
> higher levels of compression require more memory. To manage this, each
> compression level has its own queue of workspaces. A global LRU is used
> to help with reclaim. To guarantee forward progress, a max level
> workspace is preallocated and hidden from the LRU.
>
> When getting a workspace, it uses a bitmap to identify the levels that
> are populated and scans up. If it finds a workspace that is greater than
> it, it uses it, but does not update the last_used time and the
> corresponding place in the LRU. This provides a mechanism to decrease
> memory utilization as we only keep around workspaces that are sized
> appropriately for the in use compression levels.
>
> By knowing which compression levels have available workspaces, we can
> recycle rather than always create new workspaces as well as take
> advantage of the preallocated max level for forward progress. If we hit
> memory pressure, we sleep on the max level workspace. We continue to
> rescan in case we can use a smaller workspace, but eventually should be
> able to obtain the max level workspace or allocate one again should
> memory pressure subside. The memory requirement for decompression is the
> same as level 1, and therefore can use any of available workspace.
>
> The number of workspaces is bound by an upper limit of the workqueue's
> limit which currently is 2 (percpu limit). Second, a reclaim timer is
> used to free inactive/improperly sized workspaces. The reclaim timer is
> set to 67s to avoid colliding with transaction commit (every 30s) and
> attempts to reclaim any unused workspace older than 45s.
>
> Repeating the experiment from v2 [1], the Silesia corpus was copied to a
> btrfs filesystem 10 times and then read back after dropping the caches.
> The btrfs filesystem was on an SSD.
>
> Level Ratio Compression (MB/s) Decompression (MB/s)
> 1 2.658 438.47 910.51
> 2 2.744 364.86 886.55
> 3 2.801 336.33 828.41
> 4 2.858 286.71 886.55
> 5 2.916 212.77 556.84
> 6 2.363 119.82 990.85
> 7 3.000 154.06 849.30
> 8 3.011 159.54 875.03
> 9 3.025 100.51 940.15
> 10 3.033 118.97 616.26
> 11 3.036 94.19 802.11
> 12 3.037 73.45 931.49
> 13 3.041 55.17 835.26
> 14 3.087 44.70 716.78
> 15 3.126 37.30 878.84
>
> [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@xxxxxx/
>
> Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx>
> Cc: Nick Terrell <terrelln@xxxxxx>
> Cc: Omar Sandoval <osandov@xxxxxx>
> ---
> fs/btrfs/super.c | 6 +-
> fs/btrfs/zstd.c | 229 +++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 226 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index b28dff207383..0ecc513cb56c 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -544,9 +544,13 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
> btrfs_clear_opt(info->mount_opt, NODATASUM);
> btrfs_set_fs_incompat(info, COMPRESS_LZO);
> no_compress = 0;
> - } else if (strcmp(args[0].from, "zstd") == 0) {
> + } else if (strncmp(args[0].from, "zstd", 4) == 0) {
> compress_type = "zstd";
> info->compress_type = BTRFS_COMPRESS_ZSTD;
> + info->compress_level =
> + btrfs_compress_str2level(
> + BTRFS_COMPRESS_ZSTD,
> + args[0].from + 4);
> btrfs_set_opt(info->mount_opt, COMPRESS);
> btrfs_clear_opt(info->mount_opt, NODATACOW);
> btrfs_clear_opt(info->mount_opt, NODATASUM);
> diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
> index a951d4fe77f7..ce9b466c197f 100644
> --- a/fs/btrfs/zstd.c
> +++ b/fs/btrfs/zstd.c
> @@ -6,20 +6,27 @@
> */
>
> #include <linux/bio.h>
> +#include <linux/bitmap.h>
> #include <linux/err.h>
> #include <linux/init.h>
> #include <linux/kernel.h>
> #include <linux/mm.h>
> +#include <linux/sched/mm.h>
> #include <linux/pagemap.h>
> #include <linux/refcount.h>
> #include <linux/sched.h>
> #include <linux/slab.h>
> #include <linux/zstd.h>
> #include "compression.h"
> +#include "ctree.h"
>
> #define ZSTD_BTRFS_MAX_WINDOWLOG 17
> #define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
> #define ZSTD_BTRFS_DEFAULT_LEVEL 3
> +#define ZSTD_BTRFS_MAX_LEVEL 15
> +#define ZSTD_BTRFS_RECLAIM_NS (45 * NSEC_PER_SEC)
> +/* 67s to avoid clashing with transaction commit (every 30s) */
> +#define ZSTD_BTRFS_RECLAIM_JIFFIES (67 * HZ)
This is valid provided that transaction commit time is not overriden by
Opt_commit_interval. If it is such a problem to not clash with trans
commit maybe this should be calculated upon mount?
<snip>