On 24/09/26 06:46, Gao Xiang wrote:
                          Total Size (MiB)  Average layer size (MiB)  Saved / 766.1MiB
Compressed OCI (tar.gz)   282.5             28.3                      63%
Uncompressed OCI (tar)    766.1             76.6                      0%
Uncompressed EROFS        109.5             11.0                      86%
EROFS (DEFLATE,9,32k)     46.4              4.6                       94%
EROFS (LZ4HC,12,64k)      54.2              5.4                       93%
I don't know which compression algorithm you are using (maybe Zstd?),
but the result is
EROFS (LZ4HC,12,64k) 54.2
PuzzleFS compressed 53?
EROFS (DEFLATE,9,32k) 46.4
I could rerun with EROFS + Zstd, which should be smaller. This feature
has been supported since Linux 6.1, thanks.
The average layer size is very impressive for EROFS, great work.
However, if we multiply the average layer size by 10, we get the total
size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
the average layer size is 30 MiB (for the compressed case), the unified
size is only 53 MiB. So this tells me there's blob sharing between the
different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
with EROFS (what I'm talking about is deduplication across the multiple
versions of Ubuntu Jammy and not within one single version).
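
As a rough sketch of the arithmetic above (the numbers are the ones quoted
in this thread; the function name `sharing_ratio` is made up for
illustration), the amount of cross-image sharing can be estimated by
comparing the unified size against the average size multiplied by the
number of images:

```python
def sharing_ratio(avg_mib: float, n_images: int, unified_mib: float) -> float:
    """Fraction of the summed per-image size that is absent from the unified store."""
    summed = avg_mib * n_images
    return 1.0 - unified_mib / summed

# EROFS (LZ4HC,12,64k): 5.4 MiB average, 54.2 MiB total for 10 images.
erofs = sharing_ratio(5.4, 10, 54.2)

# PuzzleFS compressed: about 30 MiB average, about 53 MiB unified for 10 images.
puzzlefs = sharing_ratio(30.0, 10, 53.0)

print(f"EROFS: {erofs:.1%}  PuzzleFS: {puzzlefs:.1%}")
```

A ratio near zero means the total is just the sum of the parts; a ratio
well above zero means chunks are shared between the images. This only
restates the arithmetic, it says nothing about whether the two "average"
columns were measured the same way.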
Don't get me wrong, but I don't think you got the point.
First, what you asked was `I'm referring specifically to this
comment: "EROFS already supports variable-sized chunks + CDC"`,
so I clearly answered with the result of compressed data global
deduplication with CDC.
Here both EROFS and Squashfs compress 10 Ubuntu images into
one image for a fair comparison to show the benefit of CDC, so
It might be a fair comparison, but that's not how container images are
distributed. You're trying to argue that I should just use EROFS and I'm
showing you that EROFS doesn't currently support the functionality
provided by PuzzleFS: the deduplication across multiple images.
I believe they are basically equal to your `Unified size`s, so
the result is
                        Your unified size
EROFS (LZ4HC,12,64k)    54.2
PuzzleFS compressed     53?
EROFS (DEFLATE,9,32k)   46.4
That is why I used your 53 unified size to show EROFS is much
smaller than PuzzleFS.
The reason why EROFS and SquashFS don't have `Total Size`s
is just that we cannot store every individual chunk in a
separate file.
Well, storing individual chunks into separate files is the entire point
of PuzzleFS.
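
To make "individual chunks in separate files" concrete, here is a minimal
sketch of a content-addressed chunk store. It is not PuzzleFS's actual
chunker or on-disk format: the rolling hash is a toy, the `MASK` and
`MIN_CHUNK` values are arbitrary, and `cdc_chunks` and `store` are
hypothetical names. It only shows why two images that share most of their
content end up sharing most of their blob files:

```python
import hashlib
import os
import tempfile

MASK = (1 << 6) - 1   # toy value: a cut candidate roughly every 64 bytes
MIN_CHUNK = 16        # toy value: never emit chunks shorter than this

def cdc_chunks(data: bytes):
    """Toy content-defined chunking: cut where the rolling value's low bits are zero."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF   # old bytes shift out of the low bits
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def store(image: bytes, blob_dir: str) -> list[str]:
    """Write each chunk to blob_dir/<sha256> and return the image's chunk list."""
    manifest = []
    for chunk in cdc_chunks(image):
        digest = hashlib.sha256(chunk).hexdigest()
        path = os.path.join(blob_dir, digest)
        if not os.path.exists(path):      # skip chunks another image already stored
            with open(path, "wb") as f:
                f.write(chunk)
        manifest.append(digest)
    return manifest

# Two "image versions": the second is the first with a few bytes inserted.
base = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(256))
image_v1 = base
image_v2 = base[:4000] + b"patched" + base[4000:]

blob_dir = tempfile.mkdtemp()
m1 = store(image_v1, blob_dir)
m2 = store(image_v2, blob_dir)
print(len(m1) + len(m2), "chunk references,",
      len(os.listdir(blob_dir)), "blobs on disk")
```

Because the cut points depend on the content rather than on fixed
offsets, the chunks after the inserted bytes line up again and are stored
only once.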
Currently, I have seen no reason to open arbitrary kernel files
(maybe hundreds at once, due to the large folio feature) in the page
fault context. If I modified the `mkfs.erofs` tool, I could give
some similar numbers, but I don't want to spend time on that now
because of `open arbitrary kernel files in the page fault context`.
As I said, if PuzzleFS finally upstreams some work to open kernel
files in page fault context, I will definitely work out the same
feature for EROFS soon, but currently I don't do that just
because it's very controversial and no in-tree kernel filesystem
does that.
The PuzzleFS kernel filesystem driver is still in an early POC stage, so
there's still a lot more work to be done.