Re: [Linux-cachefs] [PATCH v2 00/20] fscache, erofs: fscache-based demand-read semantics
From: Gao Xiang
Date: Wed Jan 19 2022 - 01:41:22 EST
Hi David,
On Tue, Jan 18, 2022 at 09:11:56PM +0800, Jeffle Xu wrote:
> changes since v1:
> - rebase to v5.17
> - erofs: In chunk based layout, since the logical file offset has the
> same remainder over PAGE_SIZE with the corresponding physical address
> inside the data blob file, the file page cache can be directly
> transferred to netfs library to contain the data from data blob file.
> (patch 15) (Gao Xiang)
> - netfs,cachefiles: manage logical/physical offset separately. (patch 2)
> (It is used by erofs_begin_cache_operation() in patch 15.)
> - cachefiles: introduce a new devnode specificaly for on-demand reading.
> (patch 6)
> - netfs,fscache,cachefiles: add new CONFIG_* for on-demand reading.
> (patch 3/5)
> - You could start a quick test by
> https://github.com/lostjeffle/demand-read-cachefilesd
> - add more background information (mainly introduction to nydus) in the
> "Background" part of this cover letter
>
> [Important Issues]
> The following issues still need further discussion. Thanks for your time
> and patience.
>
> 1. I noticed that there's refactoring of netfs library[1], and patch 1
> is not needed since [2].
>
> 2. The current implementation will severely conflict with the
> refactoring of netfs library[1][2]. The assumption of 'struct
> netfs_i_context' [2] is that, every file in the upper netfs will
> correspond to only one backing file. While in our scenario, one file in
> erofs can correspond to multiple backing files. That is, the content of
> one file can be divided into multiple chunks, and are distrubuted over
> multiple blob files, i.e. multiple backing files. Currently I have no
> good idea solving this conflic.
>
Would you mind give more hints on this? Personally, I still think fscache
is useful and clean way for image distribution on-demand load use cases
in addition to cache network fs data as a more generic in-kernel caching
framework. From the point view of current codestat, it has slight
modification of netfslib and cachefiles (except for a new daemon):
fs/netfs/Kconfig | 8 +
fs/netfs/read_helper.c | 65 ++++++--
include/linux/netfs.h | 10 ++
fs/cachefiles/Kconfig | 8 +
fs/cachefiles/daemon.c | 147 ++++++++++++++++-
fs/cachefiles/internal.h | 23 +++
fs/cachefiles/io.c | 82 +++++++++-
fs/cachefiles/main.c | 27 ++++
fs/cachefiles/namei.c | 60 ++++++-
Besides, I think that cookies can be set according to data mapping
(instead of fixed per file) will benefit the following scenario in
addition to our on-demand load use cases:
It will benefit file cache data deduplication. What I can see is that
netfslib may have some follow-on development in order to support
encryption and compression. However, I think cache data deduplication
is also potentially useful to minimize cache storage since many local
fses already support reflink. However, I'm not sure if it's a great
idea that cachefile relies on underlayfs abilities for cache deduplication.
So for cache deduplication scenarios, I'm not sure per-file cookie is
still a good idea for us (or alternatively, maintain more complicated
mapping per cookie inside fscache besides filesystem mapping, too
unnecessary IMO).
By the way, in general, I'm not sure if it's a great idea to cache in
per-file basis (especially for too many small files), that is why we
introduced data deduplicated blobs. At least, it's simpler for read-only
fses. Recently, I found another good article to summarize this:
http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
Thanks,
Gao Xiang