Re: [PATCH] fs/9p: Add new options to Documentation

From: Eric Van Hensbergen
Date: Sun Apr 02 2023 - 14:44:09 EST


On Sun, Apr 2, 2023 at 9:07 AM Christian Schoenebeck
<linux_oss@xxxxxxxxxxxxx> wrote:
>
> +CC: Jeff for experience on this caching issue with NFS ...
>
> On Tuesday, March 28, 2023 5:51:51 PM CEST Eric Van Hensbergen wrote:
> > As I work through the documentation rework and to some extent the
> > testing matrix -- I am reconsidering some choices and wanted to open
> > up the discussion here.
> >
> > TLDR; I'm thinking of reworking the cache options before the merge
> > window to keep things simple while setting up for some of the future
> > options.
>
> Yeah, revising the 9p cache options highly makes sense!
>
> So what is the plan on this now? I saw you sent a new patch with the "old"
> options today? So is this one here deferred?
>

Yes - I decided to just bite the bullet and try and have a new
approach to things that was less confusing and try and reduce the test
matrix.
Unfortunately, I was also switching over to trying to use b4 to manage
these things and didn't see a way to relate the patch to the original
one.
I figured the increase in scope justified a new thread.

> > While we have a bunch of new options, in practice I expect users to
> > probably consolidate around three models: no caching, tight caches,
> > and expiring caches with fscache being an orthogonal add-on to the
> > last two.
>
> Actually as of today I don't know why somebody would want to use fscache
> instead of loose. Does it actually make sense to keep fscache and if yes why?
>

Persistent caches make sense in certain scenarios, it basically gives
a way to lazy populate a local cache of the remote server. I wouldn't
use it in qemu
or a VM, but I'd definitely use it in cluster scenarios. Its also
potentially useful for unstable connections -- but this is more
challenging to deal with in 9p although solutions have been proposed
in the past.

> > The ultimate goal is to simplify the options based on expected use models:
> >
> > - cache=[ none, file, all ] (none is currently default)
>
> dir?
>

That was what I was originally going to do, but 'all' makes the
inclusivity of file explicit. The more I thought about it, the less
I saw a use case for caching meta data and not file data. Of course,
in the new patch series I have a bit for meta-data (which is a bit
more accurate than just saying dir) at least in the current
implementation. However, if I did a shortcut, it probably
would be all and not dir (there isn't one yet in the new patch as I'm
holding off until I fix consistency for meta-data -- which is in
progress and I think imperfect but closer than loose). And actually
once we are convinced this is working, the shortcut for all may be
better represented as 'default' (hopefully).

> > - write_policy = [ *writethrough, writeback ] (writethrough would be default)
> > - cache_validate = [ never, *open, x (seconds) ] (cache_validate
> > would default to open)
>
> Jeff came up with the point that NFS uses a slicing time window for NFS. So
> the question is, would it make sense to add an option x seconds that might be
> dropped soon anyway?
>

I had thought about that, and in fact in the new patch series I
haven't implemented either yet. In the bitmask version I could used
one of the reserved bits to specify adaptive timeouts like Jeff said
NFS does and then the high bits to specify the base timeout or 0 if
you never want to validate (current loose). Loose mode itself
specifies look at the base timeout for how much temporal consistency
to apply.

Looking ahead at implementing this there is going to be ripple effects
through the code, probably changing the way we use the cache_validity
field in inode. But seems straightforward enough.

> > - fscache
> >
> > So, mapping of existing (deprecated) legacy modes:
> > - none (obvious) write_policy=writethrough
> > - *readahead -> cache=file cache_validate_open write_policy=writethrough
> > - mmap -> cache=file cache_validate=open write_policy=writeback
>
> Mmm, why is that "file"? To me "file" sounds like any access to files is
> cached, whereas cache=mmap just uses the cache if mmap() was called, not for
> any other file access.
>

Well, not technically true -- because if the file is mmaped and it is
accessed another way other than the mmap then we need to basically
treat it as cached or we'll get inconsistencies between the mmap
version and the uncached read/write version. So the code ends up
using open-to-close file caching for everything. In any case, mmap
was there for compatibility reasons before we had a non-loose caching
option. With tight caches, its better to just incorporate mmap with
it (given the above concerns on behavior). Otherwise we'd have to
enforce a non-posix semantic of the only way to access an mmaped file
is with mmap and force invalidations of others that might already have
a file open when someonebody else mmaps it...would be messy.

> > - loose -> cache=all cache_validate=never write_policy=writeback
> > - fscache -> cache=all cache_validate=never write_policy=writeback &
> > fscache enabled
> >
> > Some things I'm less certain of: cache_validation is probably an
> > imperfect term as is using 'open' as one of the options, in this case
> > I'm envisioning 'open' to mean open-to-close coherency for file
> > caching (cache is only validated on open) and validation on lookup for
> > dir-cache coherency (using qid.version). Specifying a number here
> > expires existing caches and requires validation after a certain number
> > of seconds (is that the right granularity)?
>
> Personally I would then really call it open-to-close or opentoclose and waste
> some more characters in favour of clarity.
>

I had open2close in the new version and then reverted it to the
inverse of loose because it made checking the bit pattern much easier.
But the plan is for loose to cover no-consistency (when
cache-timeout==0, and temporal consistency when cache-timeout > 0, and
then augment that with Jeff's suggestion of an adaptive bit).
Perhaps the new loose shortcut will cover NFS-like semantics with a
base-timeout of 5s and adaptivity bit set.

-eric