[PATCH v2 00/16] erofs: prepare for folios, deduplication and kill PG_error

From: Gao Xiang
Date: Fri Jul 15 2022 - 11:42:29 EST


Hi folks,

I've been working on this for almost 2 months. The main goal is to
support large folios and rolling hash deduplication for compressed
data.

This patchset is the start of that work and targets the next release
(5.20). It introduces a flexible range representation for
(de)compressed buffers instead of relying too heavily on pages
themselves, so that large folio support can later build on this work.
This patchset also gets rid of all PG_error flags in the decompression
code, which is a cleanup in its own right.
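
As a rough illustration of what a range representation means here, the
userspace sketch below describes a buffer by a backing pointer plus a
byte range rather than by one fixed-size page. All names (buf_range,
buf_range_len, buf_ranges_total) are hypothetical and are not the
structures added by this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the kernel structures in this series. */
struct buf_range {
	void *base;	/* start of the backing memory (a page or something larger) */
	size_t offset;	/* first valid byte within the backing memory */
	size_t end;	/* one past the last valid byte */
};

/* Number of valid bytes described by one range. */
static size_t buf_range_len(const struct buf_range *r)
{
	assert(r->offset <= r->end);
	return r->end - r->offset;
}

/* Total bytes covered by an array of ranges, whatever each backing size is. */
static size_t buf_ranges_total(const struct buf_range *v, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += buf_range_len(&v[i]);
	return total;
}
```

Because callers only see (base, offset, end), the backing memory can
be a single page or a larger unit without changing the consumer code.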

In addition, this patchset kicks off rolling hash deduplication for
compressed data by first introducing fully-referenced multi-reference
pclusters, instead of reporting fs corruption if one pcluster is
referenced by several different extents. The full implementation is
expected to be finished in the merge window after the next. One of my
colleagues is actively working on the userspace part of this feature.
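
For readers unfamiliar with the term, a rolling hash updates the hash
of a fixed-size window in constant time as the window slides by one
byte, which makes it cheap to search for repeated data at any offset.
The sketch below is a generic polynomial rolling hash. The window
size, base and function names are assumptions made for illustration,
and it says nothing about how the userspace tool will actually pick
or match data.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative parameters only; not taken from this series. */
#define ROLL_WINDOW 16u
#define ROLL_BASE   257u

/* Hash of the ROLL_WINDOW bytes starting at p, computed from scratch. */
static uint32_t roll_init(const uint8_t *p)
{
	uint32_t h = 0;

	for (size_t i = 0; i < ROLL_WINDOW; i++)
		h = h * ROLL_BASE + p[i];
	return h;
}

/* ROLL_BASE^(ROLL_WINDOW - 1): the weight of the byte leaving the window. */
static uint32_t roll_top_weight(void)
{
	uint32_t w = 1;

	for (size_t i = 1; i < ROLL_WINDOW; i++)
		w *= ROLL_BASE;
	return w;
}

/* Slide the window by one byte: drop `out`, append `in` (mod 2^32). */
static uint32_t roll_step(uint32_t h, uint8_t out, uint8_t in, uint32_t top)
{
	return (h - out * top) * ROLL_BASE + in;
}
```

Each roll_step() result equals what roll_init() would return for the
window at the new position, so candidate duplicates can be looked up
by hash and then confirmed with a byte-wise comparison.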

However, it's still easy to verify fully-referenced multi-reference
pclusters by constructing an image by hand (see attachment):

Dataset: 300M
seq-read (data-deduplicated, read_ahead_kb 8192): 1095MiB/s
seq-read (data-deduplicated, read_ahead_kb 4096): 771MiB/s
seq-read (data-deduplicated, read_ahead_kb 512): 577MiB/s
seq-read (vanilla, read_ahead_kb 8192): 364MiB/s

Finally, this patchset survives ro-fsstress on my side.

Thanks,
Gao Xiang

Changes since v1:
- rename the remaining pagevec wording to bvpage (Yue Hu);

Gao Xiang (16):
erofs: get rid of unneeded `inode', `map' and `sb'
erofs: clean up z_erofs_collector_begin()
erofs: introduce `z_erofs_parse_out_bvecs()'
erofs: introduce bufvec to store decompressed buffers
erofs: drop the old pagevec approach
erofs: introduce `z_erofs_parse_in_bvecs'
erofs: switch compressed_pages[] to bufvec
erofs: rework online page handling
erofs: get rid of `enum z_erofs_page_type'
erofs: clean up `enum z_erofs_collectmode'
erofs: get rid of `z_pagemap_global'
erofs: introduce struct z_erofs_decompress_backend
erofs: try to leave (de)compressed_pages on stack if possible
erofs: introduce z_erofs_do_decompressed_bvec()
erofs: record the longest decompressed size in this round
erofs: introduce multi-reference pclusters (fully-referenced)

fs/erofs/compress.h | 2 +-
fs/erofs/decompressor.c | 2 +-
fs/erofs/zdata.c | 785 +++++++++++++++++++++++-----------------
fs/erofs/zdata.h | 119 +++---
fs/erofs/zpvec.h | 159 --------
5 files changed, 496 insertions(+), 571 deletions(-)
delete mode 100644 fs/erofs/zpvec.h

--
2.24.4