Re: [PATCH 1/1] mm: introduce process_reap system call
From: Suren Baghdasaryan
Date: Mon Jul 12 2021 - 14:39:26 EST
On Mon, Jul 12, 2021 at 5:51 AM Jan Engelhardt <jengelh@xxxxxxx> wrote:
>
>
> On Thursday 2021-07-08 08:05, Suren Baghdasaryan wrote:
> >>
> >> That explains very clearly the requirement, but it raises the question
> >> why this isn't an si_code flag for rt_sigqueueinfo, reusing the existing
> >> system call.
> >
> >I think you are suggesting to use sigqueue() to deliver the signal and
> >perform the reaping when a special value accompanies it. This would be
> >somewhat similar to my early suggestion to use a flag in
> >pidfd_send_signal() (see:
> >https://lore.kernel.org/patchwork/patch/1060407) to implement memory
> >reaping which has another advantage of operation on PIDFDs instead of
> >PIDs which can be recycled.
> >kill()/pidfd_send_signal()/sigqueue() are supposed to deliver the
> >signal and return without blocking. Changing that behavior was
> >considered unacceptable in these discussions.
>
> The way I understood the request is that a userspace program (or perhaps two,
> if so desired) should issue _two_ calls, one to deliver the signal,
> one to perform the reap portion:
>
> uinfo.si_code = SI_QUEUE;
> sigqueue(pid, SIGKILL, &uinfo);
> uinfo.si_code = SI_REAP;
> sigqueue(pid, SIGKILL, &uinfo);
This approach would still lead to the same discussion: by design,
sigqueue/kill/pidfd_send_signal deliver the signal but do not wait for
the signal to be processed by the recipient. Changing that would be a
behavior change. Therefore we would have to follow this pattern and
implement memory reaping in an asynchronous manner using a
kthread/workqueue and it won't be done in the context of the calling
process. This is undesirable because we lose the ability to control
priority and cpu affinity for this operation and work won't be charged
to the caller.
That's why the proposed syscall performs memory reaping in the
caller's context and blocks until the operation is done. In this
proposal, your sequence looks like this:
pidfd_send_signal(pidfd, SIGKILL, NULL, 0);
process_reap(pidfd, 0);
except we decided to rename process_reap() to process_mrelease() in
the next revision.