Re: [PATCH] v9fs: handle async processing of F_SETLK with FL_SLEEP flag

From: Vasily Averin
Date: Fri Dec 24 2021 - 02:09:19 EST


On 24.12.2021 02:14, Dominique Martinet wrote:
> Vasily Averin wrote on Thu, Dec 23, 2021 at 09:21:23PM +0300:
>> kernel export thread (nfsd/lockd/ksmbd) uses F_SETLK cmd with the FL_SLEEP
>> flag set to request asynchronous processing of blocking locks.
>>
>> Currently v9fs does not support such requests and calls blocking
>> locks_lock_file_wait() function.
>
> There's two stages to 9p locks: the client first tries to get the lock
> locally on the client, then if it was obtained locally also tries to get
> it on the server.
> I believe the servers should just ignores flags like FL_SLEEP they don't
> know about, so we need to translate it as well if required.
>
>> To work around the problem let's detect such request, drop FL_SLEEP
>> before execution of potentially blocking functions.
>
> I'm not up to date with lock mechanisms, could you confirm I understand
> the flags right?
> - F_SETLK: tries to lock, on conflict return immediately with error
> - F_SETLKW|FL_SLEEP: tries to lock, on conflict wait for lock to become available
> - F_SETLKW: not possible through flock/fcntl setlk, can happen otherwise?
> but for 9p purpose same as above.
> - F_SETLK|FL_SLEEP: tries to lock, on conflict ????? you'd want it to
> return immediately but setup some callback to be woken up? how could
> that work without passing some wake up struct? or just behave as plain
> F_SETLK? but then FL_SLEEP has no purpose, I don't get it.

I apologize in advance for the long answer, but I tried to state all the details
of the detected problem.

Below is description copy-pasted from comment above vfs_lock_file()
"
* To avoid blocking kernel daemons, such as lockd, that need to acquire POSIX
* locks, the ->lock() interface may return asynchronously, before the lock has
* been granted or denied by the underlying filesystem, if (and only if)
* lm_grant is set. Callers expecting ->lock() to return asynchronously
* will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if)
* the request is for a blocking lock. When ->lock() does return asynchronously,
* it must return FILE_LOCK_DEFERRED, and call ->lm_grant() when the lock
* request completes.
* If the request is for non-blocking lock the file system should return
* FILE_LOCK_DEFERRED then try to get the lock and call the callback routine
* with the result. If the request timed out the callback routine will return a
* nonzero return code and the file system should release the lock. The file
* system is also responsible to keep a corresponding posix lock when it
* grants a lock so the VFS can find out which locks are locally held and do
* the correct lock cleanup when required.
"

lockd used by nfs v2 and v3 was first kernel daemon processed locking requests.
For nfs v4 locking requests are handled by nfsd thread. Recently ksmbd was added
into kernel, it also handles locking requests.

They all are servers, and they can receive blocking lock requests from own clients.
They all cannot process such requests synchronously because it causes server deadlock.
In simplest form, if single threaded nfsd is blocked on processing such request,
whole nfs server cannot process any other commands.

To avoid this deadlock, kernel threads are forced to use F_SETLK with FL_SLEEP,
to ask callers to process such request asynchronously or if it is impossible to
do not block such requests.

Most part of file systems do not have own ->lock functions, in this case vfs_lock_file()
calls posix_lock_file(), which correctly handles such requests.

However some other file systems, i.e. ones defined own ->lock function,
incorrectly handles such requests and can block on its processing.
For example v9fs calls blocking locks_lock_file_wait() function.

It is not "pure theoretic" issue.
One of our customers tried to export fuse/glusters via nfsd and reported about
memory corruption in nfsd4_lock() function. We found few related bugs in nfsd,
however finally we noticed that fuse blocks on processing such requests.
During investigation we have found that fuse just ignored F_SETLK command,
handled FL_SLEEP flag and submitted blocking FUSE_SETLKW command.
Similar troubles was found in some other file systems defines own ->lock() function.

To work around the problem nfsd maintainer decided to drop FL_SLEEP flag if
exported file system defines own ->lock() function.

Then I've found the problem affects recently added ksmbd and patched it by the same way.

To resolve this problem completely I've created https://bugzilla.kernel.org/show_bug.cgi?id=215383
where I'm trying to handle the problem in all affected file systems.
When this will be done kernel export threads can revert own work around with dropped FL_SLEEP flag.

Answering on you question: it's ok to ignore of FL_SLEEP flag for F_SETLK command,
It would be even better to use posix_lock_file() instead of locks_lock_file_wait(),
but I cannot do it without your assistance.

Thank you,
Vasily Averin