Re: [PATCH] ceph: don't return -ESTALE if there's still an open file
From: Amir Goldstein
Date: Tue May 19 2020 - 00:00:45 EST
On Tue, May 19, 2020 at 1:30 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> Maybe we resolved this conversation; I can't quite tell...
I think the v2 patch wraps it up...
[...]
> > >
> > > Questions:
> > > 1. Does sync() result in fully purging inodes on MDS?
> >
> > I don't think so, but again, that code is not trivial to follow. I do
> > know that the MDS keeps around a "strays directory" which contains
> > unlinked inodes that are lazily cleaned up. My suspicion is that it's
> > satisfying lookups out of this cache as well.
> >
> > Which may be fine...the MDS is not required to be POSIX compliant after
> > all. Only the fs drivers are.
>
> I don't think this is quite that simple. Yes, the MDS is certainly
> giving back stray inodes in response to a lookup-by-ino request. But
> that's for a specific purpose: we need to be able to give back caps on
> unlinked-but-open files. For NFS specifically, I don't know what the
> rules are on NFS file handles and unlinked files, but the Ceph MDS
> won't know when files are closed everywhere, and it translates from
> NFS fh to Ceph inode using that lookup-by-ino functionality.
>
There is no protocol rule that an NFS server MUST return ESTALE
for the file handle of a deleted file, but there is a rule that it MAY
return ESTALE for a deleted file. For example, after a server restart on
a traditional block filesystem, there is not much choice.
So returning ESTALE when a file is deleted but still open on another ceph
client is definitely allowed by the protocol standard. The question is
whether changing the behavior will break any existing workloads...
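
To make the trade-off concrete, here is a toy user-space model of the
policy under discussion. It is only a sketch, not the ceph code: the
names Inode, open_refs and decode_handle are hypothetical, and the model
assumes a single client that only knows about its own opens. A file
unlinked but still open on *another* client therefore looks like
"unlinked and not open" here and gets ESTALE, which the protocol allows.

```python
import errno
from dataclasses import dataclass


@dataclass
class Inode:
    """Toy stand-in for a client-side inode; not a kernel structure."""
    ino: int
    nlink: int      # link count as last seen by this client
    open_refs: int  # opens on this client that still hold the inode


def decode_handle(cache, ino):
    """Map a file handle (here just an inode number) to an inode.

    Raises OSError(ESTALE) when the inode is unknown, or when it is
    unlinked and nothing on this client still has it open.
    """
    inode = cache.get(ino)
    if inode is None:
        raise OSError(errno.ESTALE, "no such inode")
    if inode.nlink == 0 and inode.open_refs == 0:
        raise OSError(errno.ESTALE, "unlinked and not open locally")
    return inode
```

With this rule a handle to an unlinked file keeps resolving for as long
as a local open exists, and turns stale once the last local open goes
away.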
> >
> > > 2. Is i_nlink synchronized among nodes on deferred delete?
> > > IOW, can an inode come back from the dead on a client if another node
> > > has linked it before i_nlink 0 was observed?
> >
> > No, that shouldn't happen. The caps mechanism should ensure that it
> > can't be observed by other clients until after the change.
> >
> > That said, Luis' current patch doesn't ensure we have the correct caps
> > to check the i_nlink. We may need to add that in before we can roll with
> > this.
> >
> > > 3. Can an NFS client be "migrated" from one ceph node to another
> > > with an open but unlinked file?
> > >
> >
> > No. Open files in ceph are generally per-client. You can't pass around a
> > fd (or equivalent).
>
> But the NFS file handles I think do work across clients, right?
>
Maybe they can, but that would be like an NFS server restart, so
all bets are off w.r.t. open-but-deleted files.
Thanks,
Amir.