Re: [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING
From: Eric W. Biederman
Date: Fri Jul 29 2022 - 12:18:31 EST

Tycho Andersen <tycho@tycho.pizza> writes:

> On Fri, Jul 29, 2022 at 12:04:17AM -0500, Eric W. Biederman wrote:
>> Tycho Andersen <tycho@tycho.pizza> writes:
>>
>> > On Thu, Jul 28, 2022 at 11:12:20AM +0200, Oleg Nesterov wrote:
>
>> >> Finally. if fuse_flush() wants __fatal_signal_pending() == T when the
>> >> caller exits, perhaps it can do it itself? Something like
>> >>
>> >> 	if (current->flags & PF_EXITING) {
>> >> 		spin_lock_irq(siglock);
>> >> 		set_thread_flag(TIF_SIGPENDING);
>> >> 		sigaddset(&current->pending.signal, SIGKILL);
>> >> 		spin_unlock_irq(siglock);
>> >> 	}
>> >>
>> >> Sure, this is ugly as hell. But perhaps this can serve as a workaround?
>> >
>> > or even just
>> >
>> > 	if (current->flags & PF_EXITING)
>> > 		return;
>> >
>> > since we don't have anyone to send the result of the flush to anyway.
>> > If we don't end up converging on a fix here, I'll just send that
>> > patch. Thanks for the suggestion.
>>
>> If that was limited to the case you care about that would be reasonable.
>>
>> That will have an effect on any time a process that opens files on a
>> fuse filesystem exits and depends upon the exit path to close its file
>> descriptors to the fuse filesystem.
>>
>>
>> I do see a plausible solution along those lines.
>>
>> In fuse_flush instead of using fuse_simple_request call an equivalent
>> function that when PF_EXITING is true skips calling request_wait_answer.
>> Or perhaps when PF_EXITING is set uses schedule_work to call the
>> request_wait_answer.
>
> I don't see why this is any different than what I proposed. It changes
> the semantics to flush happening out-of-order with task exit, instead
> of strictly before, which you point out might be a problem. What am I
> missing?

What you proposed skips the flush operation entirely, which means
that a fuse server that tracks opens and closes of a file descriptor
will see more opens than closes and will have a reference counting
problem (probably resulting in things not being freed).
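
To make the imbalance concrete, here is a toy userspace model, not
kernel or libfuse code. Every name in it is made up for illustration,
and it assumes a server that counts one reference per open and drops
it when it sees the matching flush, which is a simplification of what
a real server does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file state kept by a toy fuse server. */
struct toy_server_file {
	int refs;	/* opens seen minus flushes seen */
};

static void toy_server_open(struct toy_server_file *f)
{
	f->refs++;
}

static void toy_server_flush(struct toy_server_file *f)
{
	assert(f->refs > 0);	/* a flush must match an earlier open */
	f->refs--;
}

/*
 * Model of the client closing its descriptor.  `exiting` stands in
 * for PF_EXITING; `skip_flush_on_exit` selects the
 * "if (PF_EXITING) return;" variant.
 */
static void toy_client_close(struct toy_server_file *f, bool exiting,
			     bool skip_flush_on_exit)
{
	if (exiting && skip_flush_on_exit)
		return;		/* the server never hears about the close */
	toy_server_flush(f);
}

/* References the server is left holding after one open and one close. */
static int toy_leaked_refs(bool exiting, bool skip_flush_on_exit)
{
	struct toy_server_file f = { .refs = 0 };

	toy_server_open(&f);
	toy_client_close(&f, exiting, skip_flush_on_exit);
	return f.refs;
}
```

In this model toy_leaked_refs(true, true) leaves one reference behind,
while every other combination ends balanced at zero.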

Simply skipping the wait for the result from the fuse server means
the fuse server sees what it has always seen. The kernel simply won't
block until that result has been returned, which means the other file
descriptors can be closed.
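
A rough userspace sketch of that shape, with hypothetical names
throughout: toy_queue_request() and toy_wait_answer() only stand in
for "send the request" and "block for the reply", and are not the
real fuse functions. It assumes nothing more than a pair of counters,
so it shows the control flow and nothing about the actual fuse request
machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical connection state: what was sent, and how often we blocked. */
struct toy_conn {
	int sent;	/* requests queued to the server */
	int waited;	/* times the caller blocked for a reply */
};

static void toy_queue_request(struct toy_conn *c)
{
	c->sent++;
}

static void toy_wait_answer(struct toy_conn *c)
{
	c->waited++;
}

/*
 * Always queue the flush so the server sees the same traffic;
 * only block for the answer when the caller is not exiting.
 * `exiting` stands in for PF_EXITING.
 */
static void toy_flush_request(struct toy_conn *c, bool exiting)
{
	toy_queue_request(c);
	if (!exiting)
		toy_wait_answer(c);
}

/* Run one flush on a fresh connection and return the counters. */
static struct toy_conn toy_run_flush(bool exiting)
{
	struct toy_conn c = { .sent = 0, .waited = 0 };

	toy_flush_request(&c, exiting);
	assert(c.sent == 1);	/* the request goes out in both cases */
	return c;
}
```

The only thing that differs between the two cases is the waited
count; the sent count is 1 either way.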

For the specific case you are looking at, with the server being killed
and the server's file descriptors not yet being closed, the difference
does not matter. In the ordinary case of a process exit closing file
descriptors to a fuse filesystem where the server continues to live
and function, not waiting for the response from the server simply
winds up being an optimization in exit. The key part is that the fuse
server continues to see the same traffic. In particular the open
requests and the flush requests continue to balance, so reference
counting in the fuse server is not broken.

Eric