Re: [PATCH 1/2] Allow a kthread to declare that it calls task_work_run()

From: NeilBrown
Date: Tue Dec 05 2023 - 16:28:31 EST


On Tue, 05 Dec 2023, Christian Brauner wrote:
> On Mon, Dec 04, 2023 at 03:09:44PM -0700, Jens Axboe wrote:
> > On 12/4/23 2:02 PM, NeilBrown wrote:
> > > It isn't clear to me why _GPL is appropriate, but maybe the rules
> > > changed since last I looked... are there rules?
> > >
> > > My reasoning was that the call is effectively part of the user-space
> > > ABI. A user-space process can call this trivially by invoking any
> > > system call. The user-space ABI is explicitly a boundary which the GPL
> > > does not cross. So it doesn't seem appropriate to prevent non-GPL
> > > kernel code from doing something that non-GPL user-space code can
> > > trivially do.
> >
> > By that reasoning, basically everything in the kernel should be non-GPL
> > marked. And while task_work can get used by the application, it happens
> > only indirectly or implicitly. So I don't think this reasoning is sound
> > at all, it's not an exported ABI or API by itself.
> >
> > For me, the more core of an export it is, the stronger the reason it
> > should be GPL. FWIW, I don't think exporting task_work functionality is
> > a good idea in the first place, but if there's a strong reason to do so,
>
> Yeah, I'm not too fond of that part either. I don't think we want to
> give modules the ability to mess with task work. This is just asking for
> trouble.
>

Ok, maybe we need to reframe the problem then.

Currently fput(), and hence filp_close(), take control away from kernel
threads: when the call returns, the thread cannot be sure that the
"close" has actually completed.

This is already a problem for nfsd. When renaming a file, nfsd needs to
ensure any cached "open" that it has on the file is closed (else when
re-exporting an NFS filesystem it can result in a silly-rename).

nfsd currently handles this case by calling flush_delayed_fput(). I
suspect you are no more happy about exporting that than you are about
exporting task_work_run(), but this solution isn't actually 100%
reliable. If some other thread calls flush_delayed_fput() between nfsd
calling filp_close() and that same nfsd calling flush_delayed_fput(),
then the second flush can return before the first flush (in the other
thread) completes all the work it took on.
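
To make that interleaving concrete, here is a minimal single-threaded
userspace model of it. This is only a sketch: every model_* name is
hypothetical and none of this is kernel code. It assumes a flush works
in two steps, first taking the whole pending list and then closing each
entry, and it plays the two threads' steps in the problematic order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_file {
	bool closed;              /* set once the final close has run */
	struct model_file *next;  /* link on the pending list */
};

/* Stands in for the list of deferred final closes. */
static struct model_file *pending;

/* Model of a deferred final fput(): queue the file, do not close it. */
static void model_fput(struct model_file *f)
{
	f->closed = false;
	f->next = pending;
	pending = f;
}

/* First half of a flush: take everything queued so far in one step. */
static struct model_file *model_flush_grab(void)
{
	struct model_file *batch = pending;

	pending = NULL;
	return batch;
}

/* Second half of a flush: actually close each file in the batch. */
static void model_flush_finish(struct model_file *batch)
{
	while (batch) {
		struct model_file *next = batch->next;

		batch->closed = true;
		batch = next;
	}
}

/*
 * Replay the interleaving described above.  Returns true if the file
 * was still open at the point where nfsd's own flush had returned.
 */
static bool model_race(void)
{
	struct model_file f;
	struct model_file *other_batch;
	bool still_open;

	model_fput(&f);                         /* nfsd: close is queued   */
	other_batch = model_flush_grab();       /* other thread: takes it  */
	model_flush_finish(model_flush_grab()); /* nfsd: sees empty list   */
	still_open = !f.closed;                 /* nfsd believes it's done */
	model_flush_finish(other_batch);        /* other thread: closes it */

	return still_open;
}
```

In this model model_race() returns true: nfsd's flush found nothing to
do and returned while the other thread still held the unfinished close.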

What we really need - both for handling renames and for avoiding
possible memory exhaustion - is for nfsd to be able to reliably wait for
any fput() that it initiated to complete.

How would you like the VFS to provide that service?

Thanks,
NeilBrown