[PATCH 0/1] cover-letter/lz4: Implement lz4 with dynamic offset length.
From: Maninder Singh
Date: Wed Mar 21 2018 - 00:43:13 EST
(Added a cover letter to avoid putting too much text in the patch description.)
The LZ4 specification defines a 2-byte offset field, which covers a 64 KB
window. ZRAM, however, compresses data one page at a time, and on most
architectures PAGE_SIZE is 4 KB, so the offset length can instead be chosen
based on the actual offset value. To do this, 1 bit is reserved to select the
offset length (1 byte or 2 bytes): 2 bytes are required only if the offset is
greater than 127; otherwise 1 byte is enough.
Because 1 bit is used by the flag, the maximum offset with this implementation
is 32 KB, which is still far larger than a 4 KB page. Short offsets take 1 byte
instead of 2, so less memory is needed for compressed data.
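A minimal C sketch of the 1-bit offset-length flag described above. The bit layout (flag in the least-significant bit of the first offset byte, little-endian 2-byte form) and the names dyn_write_offset/dyn_read_offset are hypothetical and not taken from the patch; the actual lz4_dyn code may place the flag bit differently.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical encoding of the 1-bit offset-length flag (not taken from
 * the patch): bit 0 of the first offset byte selects the length.
 *   flag 0 -> 1-byte offset, value in bits 1..7  (1..127)
 *   flag 1 -> 2-byte little-endian offset, value in bits 1..15 (128..32767)
 */
#define DYN_MAX_SHORT_OFFSET 127u
#define DYN_MAX_OFFSET       32767u

/* Encode offset into dst; returns bytes written (1 or 2), or 0 if the
 * offset is zero or does not fit in 15 bits. */
static size_t dyn_write_offset(uint8_t *dst, uint16_t offset)
{
	uint16_t v;

	if (offset == 0 || offset > DYN_MAX_OFFSET)
		return 0;
	if (offset <= DYN_MAX_SHORT_OFFSET) {
		dst[0] = (uint8_t)(offset << 1);	/* flag bit clear */
		return 1;
	}
	v = (uint16_t)((offset << 1) | 1u);		/* flag bit set */
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)(v >> 8);
	return 2;
}

/* Decode an offset from src; stores the number of bytes consumed in *len. */
static uint16_t dyn_read_offset(const uint8_t *src, size_t *len)
{
	if ((src[0] & 1u) == 0) {
		*len = 1;
		return (uint16_t)(src[0] >> 1);
	}
	*len = 2;
	return (uint16_t)((src[0] | (src[1] << 8)) >> 1);
}
```

For example, an offset of 100 is stored in a single byte, while 128 needs two; both decode back unchanged, and anything above 32767 is rejected.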
Results with the new implementation (compressed size for the same input
source, LZ4_DYN < LZO < LZ4):
                 LZO        LZ4        LZ4_DYN
orig_data_size   78917632   78917632   78917632
compr_data_size  15894668   16310717   15520506
mem_used_total   17117184   17592320   16748544
(all values in bytes)
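To make the size comparison concrete, here is a small C sketch that computes the compression ratio and the relative saving from the zram figures above. The helper names compr_ratio_pct and saving_pct are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Compressed size as a percentage of the original size (lower is better). */
static double compr_ratio_pct(uint64_t orig_size, uint64_t compr_size)
{
	return 100.0 * (double)compr_size / (double)orig_size;
}

/* Percentage by which new_size is smaller than base_size
 * (positive means new_size is smaller). */
static double saving_pct(uint64_t base_size, uint64_t new_size)
{
	return 100.0 * ((double)base_size - (double)new_size) / (double)base_size;
}
```

Applied to the figures above, LZ4_DYN's compressed data is about 19.7% of the original size (LZ4: about 20.7%, LZO: about 20.1%), which is roughly 4.8% smaller than LZ4 and 2.4% smaller than LZO; mem_used_total is also about 4.8% lower than with LZ4.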
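Performance was checked with the tool below (in each row, the left column is
lz4 and the right column is lz4_dyn, following the argument order):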
https://github.com/sergey-senozhatsky/zram-perf-test
# ./fio-perf-o-meter.sh /tmp/test-fio-zram-lz4 /tmp/test-fio-zram-lz4_dyn
Processing /tmp/test-fio-zram-lz4
Processing /tmp/test-fio-zram-lz4_dyn
#jobs1
WRITE: 1101.7MB/s 1197.7MB/s
WRITE: 799829KB/s 900838KB/s
READ: 2670.2MB/s 2649.5MB/s
READ: 2027.8MB/s 2039.9MB/s
READ: 603703KB/s 597855KB/s
WRITE: 602943KB/s 597103KB/s
READ: 680438KB/s 707986KB/s
WRITE: 679582KB/s 707095KB/s
#jobs2
WRITE: 1993.2MB/s 2121.2MB/s
WRITE: 1654.1MB/s 1700.2MB/s
READ: 5038.2MB/s 4970.9MB/s
READ: 3930.1MB/s 3908.5MB/s
READ: 1113.2MB/s 1117.4MB/s
WRITE: 1111.8MB/s 1115.2MB/s
READ: 1255.8MB/s 1286.5MB/s
WRITE: 1254.2MB/s 1284.9MB/s
#jobs3
WRITE: 2875.6MB/s 3010.3MB/s
WRITE: 2394.4MB/s 2363.2MB/s
READ: 7384.7MB/s 7314.3MB/s
READ: 5389.5MB/s 5427.6MB/s
READ: 1570.8MB/s 1557.3MB/s
WRITE: 1568.8MB/s 1555.3MB/s
READ: 1848.5MB/s 1854.0MB/s
WRITE: 1846.2MB/s 1851.7MB/s
#jobs4
WRITE: 3720.3MB/s 3077.4MB/s
WRITE: 3027.4MB/s 3072.8MB/s
READ: 9694.7MB/s 9822.6MB/s
READ: 6606.5MB/s 6617.2MB/s
READ: 1941.6MB/s 1966.8MB/s
WRITE: 1939.1MB/s 1964.3MB/s
READ: 2405.3MB/s 2347.5MB/s
WRITE: 2402.3MB/s 2344.5MB/s
#jobs5
WRITE: 3335.6MB/s 3360.7MB/s
WRITE: 2670.2MB/s 2677.9MB/s
READ: 9455.3MB/s 8782.2MB/s
READ: 6534.8MB/s 6501.7MB/s
READ: 1848.9MB/s 1858.3MB/s
WRITE: 1846.6MB/s 1855.1MB/s
READ: 2232.4MB/s 2223.7MB/s
WRITE: 2229.6MB/s 2220.9MB/s
#jobs6
WRITE: 3896.5MB/s 3772.9MB/s
WRITE: 3171.1MB/s 3109.4MB/s
READ: 11060MB/s 11120MB/s
READ: 7375.8MB/s 7384.7MB/s
READ: 2132.5MB/s 2133.1MB/s
WRITE: 2129.8MB/s 2131.3MB/s
READ: 2608.4MB/s 2627.3MB/s
WRITE: 2605.7MB/s 2623.2MB/s
#jobs7
WRITE: 4129.4MB/s 4083.2MB/s
WRITE: 3364.5MB/s 3384.4MB/s
READ: 12088MB/s 11062MB/s
READ: 7868.3MB/s 7851.5MB/s
READ: 2277.8MB/s 2291.6MB/s
WRITE: 2274.9MB/s 2288.7MB/s
READ: 2798.5MB/s 2890.1MB/s
WRITE: 2794.1MB/s 2887.4MB/s
#jobs8
WRITE: 4623.3MB/s 4794.9MB/s
WRITE: 3749.3MB/s 3676.9MB/s
READ: 12337MB/s 14076MB/s
READ: 8320.1MB/s 8229.4MB/s
READ: 2496.9MB/s 2486.3MB/s
WRITE: 2493.8MB/s 2483.2MB/s
READ: 3340.4MB/s 3370.6MB/s
WRITE: 3336.2MB/s 3366.4MB/s
#jobs9
WRITE: 4427.6MB/s 4341.3MB/s
WRITE: 3542.6MB/s 3597.2MB/s
READ: 10094MB/s 9888.5MB/s
READ: 7863.5MB/s 8119.9MB/s
READ: 2357.1MB/s 2382.1MB/s
WRITE: 2354.1MB/s 2379.1MB/s
READ: 2828.8MB/s 2826.2MB/s
WRITE: 2825.3MB/s 2822.7MB/s
#jobs10
WRITE: 4463.9MB/s 4327.7MB/s
WRITE: 3637.7MB/s 3592.4MB/s
READ: 10020MB/s 11118MB/s
READ: 7837.8MB/s 8098.7MB/s
READ: 2459.6MB/s 2406.5MB/s
WRITE: 2456.5MB/s 2403.4MB/s
READ: 2804.2MB/s 2829.8MB/s
WRITE: 2800.7MB/s 2826.2MB/s
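A small C sketch for reading the fio rows above as relative changes. It assumes, as inferred from the argument order of the fio-perf-o-meter.sh invocation, that the left value is lz4 and the right value is lz4_dyn, and that both values in a row share the same unit (MB/s or KB/s). The struct and function names are hypothetical.

```c
#include <stddef.h>

/* One fio result row: lz4 (left column) and lz4_dyn (right column)
 * throughput, both in the same unit. */
struct fio_row {
	double lz4;
	double lz4_dyn;
};

/* Percent change of lz4_dyn relative to lz4; positive means lz4_dyn is faster. */
static double pct_change(const struct fio_row *r)
{
	return 100.0 * (r->lz4_dyn - r->lz4) / r->lz4;
}

/* Arithmetic mean of the per-row percent changes over n rows (n > 0). */
static double mean_pct_change(const struct fio_row *rows, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += pct_change(&rows[i]);
	return sum / (double)n;
}
```

For example, the first jobs1 WRITE row (1101.7 vs 1197.7 MB/s) is about +8.7%, while the first jobs4 WRITE row (3720.3 vs 3077.4 MB/s) is about -17.3%, so individual rows vary considerably between runs.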
jobs1 perfstat
stalled-cycles-frontend 20,23,52,25,317 ( 54.32%) 19,29,10,49,608 ( 54.50%)
instructions 44,62,30,88,401 ( 1.20) 42,50,67,71,907 ( 1.20)
branches 7,12,44,77,233 ( 738.975) 6,64,52,15,491 ( 725.584)
branch-misses 2,38,66,520 ( 0.33%) 2,04,33,819 ( 0.31%)
jobs2 perfstat
stalled-cycles-frontend 42,82,90,69,149 ( 56.63%) 41,58,70,01,387 ( 56.01%)
instructions 85,33,18,31,411 ( 1.13) 85,32,92,28,973 ( 1.15)
branches 13,35,34,99,713 ( 677.499) 13,34,97,00,453 ( 693.104)
branch-misses 4,50,17,075 ( 0.34%) 4,47,28,378 ( 0.34%)
jobs3 perfstat
stalled-cycles-frontend 66,01,57,23,062 ( 57.10%) 65,86,74,97,814 ( 57.30%)
instructions 1,28,18,27,80,041 ( 1.11) 1,28,04,92,91,306 ( 1.11)
branches 20,06,14,16,000 ( 651.453) 20,02,85,32,864 ( 652.536)
branch-misses 7,10,66,773 ( 0.35%) 7,12,75,728 ( 0.36%)
jobs4 perfstat
stalled-cycles-frontend 91,98,71,83,315 ( 58.09%) 93,70,91,50,920 ( 58.66%)
instructions 1,70,82,79,66,403 ( 1.08) 1,71,18,67,74,366 ( 1.07)
branches 26,73,53,03,398 ( 621.532) 26,80,89,38,054 ( 618.718)
branch-misses 9,82,07,177 ( 0.37%) 9,81,64,098 ( 0.37%)
jobs5 perfstat
stalled-cycles-frontend 1,47,29,71,29,605 ( 63.59%) 1,47,91,01,92,835 ( 63.86%)
instructions 2,18,90,41,63,988 ( 0.95) 2,18,55,73,09,594 ( 0.94)
branches 34,64,46,32,880 ( 553.209) 34,55,08,02,781 ( 551.953)
branch-misses 14,16,79,279 ( 0.41%) 13,84,85,054 ( 0.40%)
jobs6 perfstat
stalled-cycles-frontend 2,02,92,92,98,242 ( 66.70%) 2,05,33,49,39,627 ( 67.01%)
instructions 2,65,13,90,22,217 ( 0.87) 2,64,84,45,49,149 ( 0.86)
branches 42,11,54,07,400 ( 510.085) 42,03,58,57,789 ( 505.746)
branch-misses 17,71,33,628 ( 0.42%) 17,74,31,942 ( 0.42%)
jobs7 perfstat
stalled-cycles-frontend 2,79,22,74,37,283 ( 70.23%) 2,80,02,50,89,154 ( 70.48%)
instructions 3,11,90,38,02,741 ( 0.78) 3,09,20,69,87,835 ( 0.78)
branches 49,71,39,90,321 ( 460.940) 49,10,44,23,983 ( 455.686)
branch-misses 22,43,84,102 ( 0.45%) 21,96,67,440 ( 0.45%)
jobs8 perfstat
stalled-cycles-frontend 3,59,62,09,66,766 ( 73.38%) 3,58,04,85,16,351 ( 73.37%)
instructions 3,43,83,05,02,841 ( 0.70) 3,43,33,76,84,985 ( 0.70)
branches 54,02,15,25,784 ( 406.256) 53,91,13,38,774 ( 407.265)
branch-misses 25,20,35,507 ( 0.47%) 25,05,71,030 ( 0.46%)
jobs9 perfstat
stalled-cycles-frontend 4,15,33,64,48,628 ( 73.76%) 4,22,88,52,47,923 ( 74.16%)
instructions 3,90,79,09,16,552 ( 0.69) 3,91,12,92,41,516 ( 0.69)
branches 61,66,87,76,271 ( 403.896) 61,73,58,17,174 ( 399.363)
branch-misses 28,46,21,136 ( 0.46%) 28,45,74,774 ( 0.46%)
jobs10 perfstat
stalled-cycles-frontend 4,74,43,71,32,846 ( 74.30%) 4,66,34,70,59,452 ( 73.82%)
instructions 4,35,23,51,39,076 ( 0.68) 4,38,48,78,54,987 ( 0.69)
branches 68,72,17,08,212 ( 396.945) 69,48,52,50,280 ( 405.847)
branch-misses 31,73,62,053 ( 0.46%) 32,34,76,102 ( 0.47%)
seconds elapsed 11.470858891 10.862984653
seconds elapsed 11.802220972 11.348959061
seconds elapsed 11.847204652 11.850297919
seconds elapsed 12.352068602 12.853222188
seconds elapsed 16.162715423 16.355883496
seconds elapsed 16.605502317 16.855938732
seconds elapsed 18.108333660 18.108347866
seconds elapsed 18.621296174 18.354183020
seconds elapsed 22.366502860 22.357632546
seconds elapsed 24.362417439 24.363003009
Maninder Singh, Vaneet Narang (1):
lz4: Implement lz4 with dynamic offset (lz4_dyn).
crypto/lz4.c | 64 ++++++++++++++++++++++++++++++++-
drivers/block/zram/zcomp.c | 4 ++
fs/pstore/platform.c | 2 +-
include/linux/lz4.h | 15 ++++++--
lib/decompress_unlz4.c | 2 +-
lib/lz4/lz4_compress.c | 84 +++++++++++++++++++++++++++++++++++--------
lib/lz4/lz4_decompress.c | 56 ++++++++++++++++++++---------
lib/lz4/lz4defs.h | 11 ++++++
8 files changed, 197 insertions(+), 41 deletions(-)