Re: [RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE
From: Minchan Kim
Date: Sun Feb 21 2016 - 19:25:46 EST
On Sun, Feb 21, 2016 at 10:27:54PM +0900, Sergey Senozhatsky wrote:
> From: Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx>
>
> The existing limit of at most 4 pages per zspage forces a wide
> range of ->huge classes, which results in increased memory consumption.
>
> On x86_64, PAGE_SHIFT 12, the ->huge class_size range is 3280-4096.
> The problem with ->huge classes is that in most cases they
> waste memory, because each ->huge zspage has only one order-0 page
> and can store only one object.
>
> For instance, we store 3408-byte objects as PAGE_SIZE objects,
> while in fact each of those objects has 4096 - 3408 = 688 bytes
> of spare space, so after storing 5 objects we have accumulated
> enough spare space to store a 6th object without requesting a new
> order-0 page. In general, turning a ->huge class into a normal one
> saves PAGE_SIZE bytes every time a "PAGE_SIZE/(PAGE_SIZE - CLASS_SIZE)"-th
> object is stored.
>
> The maximum number of order-0 pages in a zspage is limited by
> ZS_MAX_ZSPAGE_ORDER (a zspage can consist of up to 1<<ZS_MAX_ZSPAGE_ORDER
> pages). Increasing ZS_MAX_ZSPAGE_ORDER permits us to have fewer ->huge
> classes, because some of them can now form a 'normal' zspage consisting
> of several order-0 pages.
>
> We can't increase ZS_MAX_ZSPAGE_ORDER on every platform: 32-bit
> PAE/LPAE and PAGE_SHIFT 16 kernels don't have enough bits left in
> OBJ_INDEX_BITS. Other than that, we can increase ZS_MAX_ZSPAGE_ORDER
> to 4. This changes the ->huge classes range (on PAGE_SHIFT 12
> systems) from 3280-4096 to 3856-4096, which increases density
> and reduces memory wastage.
I was tempted to do this several times, for the same reason you point
out. But my worry was that if we increase ZS_MAX_ZSPAGE_ORDER, zram can
consume more memory, because we need a chain of several pages to populate
just one object. Also, at that time we didn't have a compaction scheme,
so fragmentation of objects within a zspage was a huge pain and wasted
memory. Now that we have the compaction facility, object fragmentation
might not be a severe problem, but it is still painful to allocate 16
pages to store 3408 bytes. So, if we want to increase ZS_MAX_ZSPAGE_ORDER,
I think we should first prepare dynamic creation of the sub-pages of a
zspage, plus smarter compaction to minimize wasted memory.
>
> TESTS (ZS_MAX_ZSPAGE_ORDER 4)
> =============================
>
> showing only bottom of /sys/kernel/debug/zsmalloc/zram0/classes
>
> class size almost_full almost_empty obj_allocated obj_used pages_used
> ========================================================================
>
> 1) compile glibc -j8
>
> BASE
> ...
> 168 2720 0 14 4500 4479 3000
> 190 3072 0 15 3016 2986 2262
> 202 3264 2 2 70 61 56
> 254 4096 0 0 40213 40213 40213
>
> Total 63 247 155676 153957 74955
>
> PATCHED
> ...
> 191 3088 1 1 130 116 100
> 192 3104 1 1 119 103 91
> 194 3136 1 1 260 254 200
> 197 3184 0 3 522 503 406
> 199 3216 2 3 350 320 275
> 200 3232 0 2 114 93 90
> 202 3264 2 2 210 202 168
> 206 3328 1 5 464 418 377
> 207 3344 1 2 121 108 99
> 208 3360 0 3 153 119 126
> 211 3408 2 4 360 341 300
> 212 3424 1 2 133 112 112
> 214 3456 0 2 182 170 154
> 217 3504 0 4 217 200 186
> 219 3536 0 3 135 108 117
> 222 3584 0 3 144 132 126
> 223 3600 1 1 51 35 45
> 225 3632 1 2 108 99 96
> 228 3680 0 2 140 129 126
> 230 3712 0 3 110 94 100
> 232 3744 1 2 132 113 121
> 234 3776 1 2 143 128 132
> 235 3792 0 3 112 81 104
> 236 3808 0 2 75 62 70
> 238 3840 0 2 112 91 105
> 254 4096 0 0 36112 36112 36112
>
> Total 127 228 158342 154050 73884
>
> == Consumed 74955 - 73884 = 1071 fewer order-0 pages.
>
> 2) copy linux-next directory (with object files, 2.5G)
>
> BASE
> ...
> 190 3072 0 1 9092 9091 6819
> 202 3264 0 0 240 240 192
> 254 4096 0 0 360304 360304 360304
>
> Total 34 83 687545 686443 480962
>
> PATCHED
> ...
> 191 3088 0 1 455 449 350
> 192 3104 1 0 425 421 325
> 194 3136 1 0 936 935 720
> 197 3184 0 1 1539 1532 1197
> 199 3216 0 1 1148 1142 902
> 200 3232 0 1 570 560 450
> 202 3264 1 0 1245 1244 996
> 206 3328 0 1 2896 2887 2353
> 207 3344 0 0 825 825 675
> 208 3360 0 1 850 845 700
> 211 3408 0 1 2694 2692 2245
> 212 3424 0 1 931 922 784
> 214 3456 1 0 1924 1923 1628
> 217 3504 0 0 2968 2968 2544
> 219 3536 0 1 2220 2209 1924
> 222 3584 0 1 3120 3114 2730
> 223 3600 0 1 1088 1081 960
> 225 3632 0 1 2133 2130 1896
> 228 3680 0 1 3340 3334 3006
> 230 3712 0 1 2035 2025 1850
> 232 3744 0 1 1980 1972 1815
> 234 3776 0 1 2015 2009 1860
> 235 3792 0 1 1022 1013 949
> 236 3808 1 0 960 958 896
> 238 3840 0 0 1968 1968 1845
> 254 4096 0 0 319370 319370 319370
>
> Total 71 137 687877 684436 471265
>
> == Consumed 480962 - 471265 = 9697 fewer order-0 pages.
>
> 3) Run a test script (storing text files of various sizes, binary files
> of various sizes)
>
> cat /sys/block/zram0/mm_stat (column 3 is zs_get_total_pages() << PAGE_SHIFT)
>
> BASE
> 614477824 425627436 436678656 0 436678656 539608 0 1
> 614526976 425709397 436813824 0 436813824 539580 0 1
> 614502400 425694649 436719616 0 436719616 539585 0 1
> 614510592 425658934 436723712 0 436723712 539583 0 1
> 614477824 425685915 436740096 0 436740096 539589 0 1
>
> PATCHED
> 614543360 387655040 395124736 0 395124736 539577 0 1
> 614445056 387667599 395206656 0 395206656 539614 0 1
> 614477824 387686121 395059200 0 395059200 539589 0 1
> 614461440 387748115 395075584 0 395075584 539592 0 1
> 614486016 387670405 395022336 0 395022336 539588 0 1
>
> == Consumed around 39MB less memory.
>
> P.S. on x86_64, the minimum LZO compressed buffer size seems to be around 44
> bytes. zsmalloc adds ZS_HANDLE_SIZE (sizeof(unsigned long)) to the object's
> size in zs_malloc(). Thus, the 32-byte and 48-byte classes are unreachable by
> LZO on x86_64 PAGE_SHIFT 12 platforms. LZ4, however, seems to have a minimum
> compressed buffer size of around 26 bytes. So, once again, on x86_64, the
> 32-byte class is unreachable, but we need to keep the 48-byte size class.
> In the worst case, in theory, if we ever run out of bits in OBJ_INDEX_BITS,
> we can drop the 32-byte and (well, with some consideration) the 48-byte
> classes, IOW, do ZS_MIN_ALLOC_SIZE << 1.
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> ---
> mm/zsmalloc.c | 29 ++++++++++++++++++++++-------
> 1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index e7f10bd..ab9ed8f 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -73,13 +73,6 @@
> */
> #define ZS_ALIGN 8
>
> -/*
> - * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
> - * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
> - */
> -#define ZS_MAX_ZSPAGE_ORDER 2
> -#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
> -
> #define ZS_HANDLE_SIZE (sizeof(unsigned long))
>
> /*
> @@ -96,6 +89,7 @@
> #ifndef MAX_PHYSMEM_BITS
> #ifdef CONFIG_HIGHMEM64G
> #define MAX_PHYSMEM_BITS 36
> +#define ZS_MAX_ZSPAGE_ORDER 2
> #else /* !CONFIG_HIGHMEM64G */
> /*
> * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
> @@ -104,9 +98,30 @@
> #define MAX_PHYSMEM_BITS BITS_PER_LONG
> #endif
> #endif
> +
> #define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)
>
> /*
> + * We don't have enough bits in OBJ_INDEX_BITS on HIGHMEM64G and
> + * PAGE_SHIFT 16 systems to have huge ZS_MAX_ZSPAGE_ORDER there.
> + * This will significantly increase ZS_MIN_ALLOC_SIZE and drop a
> + * number of important (frequently used in general) size classes.
> + */
> +#if PAGE_SHIFT > 14
> +#define ZS_MAX_ZSPAGE_ORDER 2
> +#endif
> +
> +#ifndef ZS_MAX_ZSPAGE_ORDER
> +#define ZS_MAX_ZSPAGE_ORDER 4
> +#endif
> +
> +/*
> + * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
> + * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
> + */
> +#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
> +
> +/*
> * Memory for allocating for handle keeps object position by
> * encoding <page, obj_idx> and the encoded value has a room
> * in least bit(ie, look at obj_to_location).
> --
> 2.7.1
>