[RFC][PATCH v2 3/3] mm/zsmalloc: increase ZS_MAX_PAGES_PER_ZSPAGE

From: Sergey Senozhatsky
Date: Sun Feb 21 2016 - 13:59:30 EST


From: Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx>

The existing limit of max 4 pages per zspage sets a tight limit
on ->huge classes, which results in increased memory consumption.

On x86_64, PAGE_SHIFT 12, ->huge class_size range is 3280-4096.
The problem with ->huge classes is that in most of the cases they
waste memory, because each ->huge zspage has only one order-0 page
and can store only one object.

For instance, we store 3408-byte objects as PAGE_SIZE objects,
while in fact each of those objects has 4096 - 3408 = 688 bytes
of spare space, so we need to store 5 objects to have enough spare
space to save the 6th object without requesting a new order-0 page.
In general, turning a ->huge class into a normal one will save PAGE_SIZE
bytes every time the "PAGE_SIZE/(PAGE_SIZE - CLASS_SIZE)"-th object is
stored.
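
The break-even arithmetic above can be sketched in C. This is only an
illustration: SKETCH_PAGE_SIZE and objs_until_page_saved() are made-up
names, not zsmalloc symbols, and a 4096-byte page is assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: PAGE_SHIFT 12 */

/*
 * Hypothetical helper: how many objects of class_size bytes have to be
 * stored before their accumulated spare space (PAGE_SIZE - class_size
 * per object) is big enough to hold one more object of the same class.
 */
static unsigned long objs_until_page_saved(unsigned long class_size)
{
	unsigned long spare;

	assert(class_size > 0 && class_size < SKETCH_PAGE_SIZE);
	spare = SKETCH_PAGE_SIZE - class_size;

	/* smallest n such that n * spare >= class_size */
	return (class_size + spare - 1) / spare;
}
```

For 3408-byte objects the helper returns 5 (5 * 688 = 3440 >= 3408), so
the 6th object fits into space that would otherwise be wasted.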

The maximum number of order-0 pages in zspages is limited by
ZS_MAX_ZSPAGE_ORDER (zspage can consist of up to 1<<ZS_MAX_ZSPAGE_ORDER
pages). Increasing ZS_MAX_ZSPAGE_ORDER permits us to have fewer ->huge
classes, because some of them now can form a 'normal' zspage consisting
of several order-0 pages.
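
A simplified sketch of why a larger page limit turns some ->huge classes
into normal ones. It is loosely modelled on the "best used percentage"
heuristic of get_pages_per_zspage() in mm/zsmalloc.c, but it is an
approximation, not the kernel code; the names below are made up, a
4096-byte page is assumed, and class merging is ignored.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: PAGE_SHIFT 12 */

/*
 * Pick the number of order-0 pages (1..max_pages) that gives the highest
 * integer percentage of used space for objects of class_size bytes.
 * Ties keep the smaller page count.
 */
static unsigned long pick_pages_per_zspage(unsigned long class_size,
					   unsigned long max_pages)
{
	unsigned long i, best = 1, best_usedpc = 0;

	assert(class_size > 0 && class_size <= SKETCH_PAGE_SIZE);
	assert(max_pages > 0);

	for (i = 1; i <= max_pages; i++) {
		unsigned long zspage_size = i * SKETCH_PAGE_SIZE;
		unsigned long waste = zspage_size % class_size;
		unsigned long usedpc = (zspage_size - waste) * 100 / zspage_size;

		if (usedpc > best_usedpc) {
			best_usedpc = usedpc;
			best = i;
		}
	}
	return best;
}

/* A class acts as ->huge when its zspage is one page holding one object. */
static int sketch_is_huge(unsigned long class_size, unsigned long max_pages)
{
	unsigned long pages = pick_pages_per_zspage(class_size, max_pages);

	return pages == 1 && (pages * SKETCH_PAGE_SIZE) / class_size == 1;
}
```

Under these assumptions a 3408-byte class is ->huge with a 4-page limit,
but becomes a 5-page zspage holding 6 objects with a 16-page limit.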

We can't increase ZS_MAX_ZSPAGE_ORDER on every platform: 32-bit
PAE/LPAE and PAGE_SHIFT 16 kernels don't have enough bits left in
OBJ_INDEX_BITS. Other than that, we can increase ZS_MAX_ZSPAGE_ORDER
to 4. This will change the ->huge classes range (on PAGE_SHIFT 12
systems) from 3280-4096 to 3856-4096. This will increase density
and reduce memory wastage/usage.
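
To illustrate the OBJ_INDEX_BITS constraint, the sketch below recomputes
the minimum allocation size from the number of index bits. It assumes
the relations OBJ_INDEX_BITS = BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS
(with one tag bit) and ZS_MIN_ALLOC_SIZE = MAX(32, ZS_MAX_PAGES_PER_ZSPAGE
<< PAGE_SHIFT >> OBJ_INDEX_BITS); the function name, its parameters and
the example inputs are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: smallest size class that still lets every object
 * in a zspage of (1 << zspage_order) pages be addressed with the index
 * bits left over in an unsigned long handle.
 */
static unsigned long long sketch_min_alloc_size(unsigned int bits_per_long,
						unsigned int max_physmem_bits,
						unsigned int page_shift,
						unsigned int zspage_order)
{
	unsigned int obj_tag_bits = 1;	/* assumption: one tag bit */
	unsigned int pfn_bits, obj_index_bits;
	unsigned long long min_size;

	assert(bits_per_long <= 64);
	assert(max_physmem_bits > page_shift);
	pfn_bits = max_physmem_bits - page_shift;
	assert(bits_per_long > pfn_bits + obj_tag_bits);
	obj_index_bits = bits_per_long - pfn_bits - obj_tag_bits;

	/* zspage size in bytes divided by the number of addressable objects */
	min_size = ((1ULL << zspage_order) << page_shift) >> obj_index_bits;
	return min_size > 32 ? min_size : 32;
}
```

With these assumed inputs, a 32-bit configuration with 36 physical
address bits and PAGE_SHIFT 12 goes from a 128-byte to a 512-byte minimum
class when the order is raised from 2 to 4, while a 64-bit configuration
with 46 physical address bits stays at 32 bytes.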

TESTS (ZS_MAX_ZSPAGE_ORDER 4)
=============================

Showing only the bottom of /sys/kernel/debug/zsmalloc/zram0/classes

class  size almost_full almost_empty obj_allocated obj_used pages_used
======================================================================

1) compile glibc -j8

BASE
...
  168  2720           0           14          4500     4479       3000
  190  3072           0           15          3016     2986       2262
  202  3264           2            2            70       61         56
  254  4096           0            0         40213    40213      40213

Total                63          247        155676   153957      74955

PATCHED
...
  191  3088           1            1           130      116        100
  192  3104           1            1           119      103         91
  194  3136           1            1           260      254        200
  197  3184           0            3           522      503        406
  199  3216           2            3           350      320        275
  200  3232           0            2           114       93         90
  202  3264           2            2           210      202        168
  206  3328           1            5           464      418        377
  207  3344           1            2           121      108         99
  208  3360           0            3           153      119        126
  211  3408           2            4           360      341        300
  212  3424           1            2           133      112        112
  214  3456           0            2           182      170        154
  217  3504           0            4           217      200        186
  219  3536           0            3           135      108        117
  222  3584           0            3           144      132        126
  223  3600           1            1            51       35         45
  225  3632           1            2           108       99         96
  228  3680           0            2           140      129        126
  230  3712           0            3           110       94        100
  232  3744           1            2           132      113        121
  234  3776           1            2           143      128        132
  235  3792           0            3           112       81        104
  236  3808           0            2            75       62         70
  238  3840           0            2           112       91        105
  254  4096           0            0         36112    36112      36112

Total               127          228        158342   154050      73884

== Consumed 74955 - 73884 = 1071 fewer order-0 pages.

2) copy linux-next directory (with object files, 2.5G)

BASE
...
  190  3072           0            1          9092     9091       6819
  202  3264           0            0           240      240        192
  254  4096           0            0        360304   360304     360304

Total                34           83        687545   686443     480962

PATCHED
...
  191  3088           0            1           455      449        350
  192  3104           1            0           425      421        325
  194  3136           1            0           936      935        720
  197  3184           0            1          1539     1532       1197
  199  3216           0            1          1148     1142        902
  200  3232           0            1           570      560        450
  202  3264           1            0          1245     1244        996
  206  3328           0            1          2896     2887       2353
  207  3344           0            0           825      825        675
  208  3360           0            1           850      845        700
  211  3408           0            1          2694     2692       2245
  212  3424           0            1           931      922        784
  214  3456           1            0          1924     1923       1628
  217  3504           0            0          2968     2968       2544
  219  3536           0            1          2220     2209       1924
  222  3584           0            1          3120     3114       2730
  223  3600           0            1          1088     1081        960
  225  3632           0            1          2133     2130       1896
  228  3680           0            1          3340     3334       3006
  230  3712           0            1          2035     2025       1850
  232  3744           0            1          1980     1972       1815
  234  3776           0            1          2015     2009       1860
  235  3792           0            1          1022     1013        949
  236  3808           1            0           960      958        896
  238  3840           0            0          1968     1968       1845
  254  4096           0            0        319370   319370     319370

Total                71          137        687877   684436     471265

== Consumed 480962 - 471265 = 9697 fewer order-0 pages.

3) Run a test script (storing text files of various sizes, binary files
of various sizes)

cat /sys/block/zram0/mm_stat (column 3 is zs_get_total_pages() << PAGE_SHIFT)

BASE
614477824 425627436 436678656 0 436678656 539608 0 1
614526976 425709397 436813824 0 436813824 539580 0 1
614502400 425694649 436719616 0 436719616 539585 0 1
614510592 425658934 436723712 0 436723712 539583 0 1
614477824 425685915 436740096 0 436740096 539589 0 1

PATCHED
614543360 387655040 395124736 0 395124736 539577 0 1
614445056 387667599 395206656 0 395206656 539614 0 1
614477824 387686121 395059200 0 395059200 539589 0 1
614461440 387748115 395075584 0 395075584 539592 0 1
614486016 387670405 395022336 0 395022336 539588 0 1

== Consumed around 39MB less memory.
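
The figure above can be cross-checked with a few lines of C. The arrays
hold column 3 of the ten mm_stat samples listed in test 3; the helper
names are made up for this sketch, and integer division is used, so the
averages are truncated.

```c
#include <assert.h>
#include <stddef.h>

/* Column 3 (total pool size in bytes) copied from the runs in test 3. */
static const unsigned long long base_col3[] = {
	436678656ULL, 436813824ULL, 436719616ULL, 436723712ULL, 436740096ULL,
};
static const unsigned long long patched_col3[] = {
	395124736ULL, 395206656ULL, 395059200ULL, 395075584ULL, 395022336ULL,
};

/* Truncated arithmetic mean of n samples. */
static unsigned long long sketch_mean(const unsigned long long *v, size_t n)
{
	unsigned long long sum = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

/* Average number of bytes saved by the patched kernel over the five runs. */
static unsigned long long sketch_saved_bytes(void)
{
	return sketch_mean(base_col3, 5) - sketch_mean(patched_col3, 5);
}
```

The averaged difference is 41637478 bytes, i.e. a bit over 39 when
expressed in units of 1 << 20 bytes.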

P.S. on x86_64, minimum LZO compressed buffer size seems to be around 44
bytes. zsmalloc adds ZS_HANDLE_SIZE (sizeof(unsigned long)) to the object's
size in zs_malloc(). Thus, 32 bytes and 48 bytes classes are unreachable by
LZO on x86_64 PAGE_SHIFT 12 platforms. LZ4, however, seems to have a minimum
compressed buffer size around 26 bytes. So, once again, on x86_64, 32 bytes
class is unreachable, but we need to keep 48 bytes size class. In the worst
case, in theory, if we ever run out of bits in OBJ_INDEX_BITS we can drop 32
bytes and (well, with some consideration) 48 bytes classes, IOW, do
ZS_MIN_ALLOC_SIZE << 1.
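
A small sketch of the reachability argument above. It assumes a 32-byte
minimum class, a 16-byte step between classes (which matches the class
and size columns in the tables, e.g. 32 + 254 * 16 = 4096) and an 8-byte
handle; the SKETCH_* names and the helper are hypothetical, not the
kernel's macros.

```c
#include <assert.h>

#define SKETCH_MIN_ALLOC_SIZE	32UL	/* assumption: x86_64, PAGE_SHIFT 12 */
#define SKETCH_CLASS_DELTA	16UL	/* assumption: step between classes */
#define SKETCH_HANDLE_SIZE	8UL	/* sizeof(unsigned long) on x86_64 */

/*
 * Hypothetical helper: size of the class that ends up serving a
 * compressed buffer of compressed_len bytes once the handle is added.
 */
static unsigned long sketch_class_size_for(unsigned long compressed_len)
{
	unsigned long size = compressed_len + SKETCH_HANDLE_SIZE;
	unsigned long idx = 0;

	assert(compressed_len > 0);
	if (size > SKETCH_MIN_ALLOC_SIZE)
		idx = (size - SKETCH_MIN_ALLOC_SIZE + SKETCH_CLASS_DELTA - 1) /
		      SKETCH_CLASS_DELTA;

	return SKETCH_MIN_ALLOC_SIZE + idx * SKETCH_CLASS_DELTA;
}
```

With these assumptions a 44-byte LZO buffer lands in the 64-byte class
and a 26-byte LZ4 buffer lands in the 48-byte class, so neither reaches
the 32-byte class.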

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
---
mm/zsmalloc.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e7f10bd..ab9ed8f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -73,13 +73,6 @@
*/
#define ZS_ALIGN 8

-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
#define ZS_HANDLE_SIZE (sizeof(unsigned long))

/*
@@ -96,6 +89,7 @@
#ifndef MAX_PHYSMEM_BITS
#ifdef CONFIG_HIGHMEM64G
#define MAX_PHYSMEM_BITS 36
+#define ZS_MAX_ZSPAGE_ORDER 2
#else /* !CONFIG_HIGHMEM64G */
/*
* If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
@@ -104,9 +98,30 @@
#define MAX_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif
+
#define _PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)

/*
+ * We don't have enough bits in OBJ_INDEX_BITS on HIGHMEM64G and
+ * PAGE_SHIFT 16 systems to have huge ZS_MAX_ZSPAGE_ORDER there.
+ * This will significantly increase ZS_MIN_ALLOC_SIZE and drop a
+ * number of important (frequently used in general) size classes.
+ */
+#if PAGE_SHIFT > 14
+#define ZS_MAX_ZSPAGE_ORDER 2
+#endif
+
+#ifndef ZS_MAX_ZSPAGE_ORDER
+#define ZS_MAX_ZSPAGE_ORDER 4
+#endif
+
+/*
+ * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
+ * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
+ */
+#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
+
+/*
* Memory for allocating for handle keeps object position by
* encoding <page, obj_idx> and the encoded value has a room
* in least bit(ie, look at obj_to_location).
--
2.7.1