Re: [RFC]: userspace memory reaping

From: Minchan Kim
Date: Thu Nov 05 2020 - 12:07:56 EST


On Thu, Nov 05, 2020 at 08:50:58AM -0800, Suren Baghdasaryan wrote:
> On Thu, Nov 5, 2020 at 4:20 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Wed 04-11-20 12:40:51, Minchan Kim wrote:
> > > On Wed, Nov 04, 2020 at 07:58:44AM +0100, Michal Hocko wrote:
> > > > On Tue 03-11-20 13:32:28, Minchan Kim wrote:
> > > > > On Tue, Nov 03, 2020 at 10:35:50AM +0100, Michal Hocko wrote:
> > > > > > On Mon 02-11-20 12:29:24, Suren Baghdasaryan wrote:
> > > > > > [...]
> > > > > > > To follow up on this. Should I post an RFC implementing SIGKILL_SYNC
> > > > > > > which in addition to sending a kill signal would also reap the
> > > > > > > victim's mm in the context of the caller? Maybe having some code will
> > > > > > > get the discussion moving forward?
> > > > > >
> > > > > > > Yeah, having some code, even preliminary, might help here. This
> > > > > > > definitely needs a go-ahead from the process management people, as
> > > > > > > that is a land full of surprises...
> > > > >
> > > > > > Just a reminder of an idea I suggested, reusing an existing concept:
> > > > >
> > > > > fd = pidfd_open(victim process)
> > > > > fdatasync(fd);
> > > > > close(fd);
> > > >
> > > > I must have missed this proposal. Anyway, are you suggesting fdatasync
> > > > to act as a destructive operation?
> > >
> > > > write(fd) && fdatasync(fd) are already destructive operations if the
> > > > file is shared.
> >
> > I am likely missing something because fdatasync will not destroy any
> > underlying data. It will sync
> >
> > > > You don't need to think of reaping as a destructive operation. Rather,
> > > > think of it as committing an asynchronous state: "write file into page
> > > > cache and commit with fsync" and "kill process and commit with fsync".
> >
> > I am sorry but I do not follow. The result of the memory reaping is
> > data loss. Any private mapping will simply lose its content. The caller
> > will get EFAULT when trying to access it but there is no way to
> > reconstruct the data. This does not resemble anything I see
> > f{data}sync used for.
>
> I think Minchan considers f{data}sync as a "commit" operation. So
> write+f{data}sync would mean we write and commit written data,
> kill+f{data}sync would mean we kill and commit that kill (reclaim the
> resources).

If people don't like f{data}sync, how about ftruncate? My point is, let's
reuse an existing API since we have pidfd.
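
To make the idea concrete, here is a rough userspace sketch of what the
caller side could look like. This is only an illustration under
assumptions: kill_and_commit() is a made-up helper name, and fdatasync()
on a pidfd has no reaping semantics today (as far as I know it simply
fails, since pidfds have no fsync operation). The raw syscall numbers are
fallbacks for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers; these numbers are shared across arches. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/*
 * Hypothetical helper: kill the victim, then "commit" the kill.
 * Returns 0 if SIGKILL was delivered, -1 on error (errno is set).
 */
int kill_and_commit(pid_t victim)
{
	int fd = (int)syscall(SYS_pidfd_open, victim, 0);

	if (fd < 0)
		return -1;

	if (syscall(SYS_pidfd_send_signal, fd, SIGKILL, NULL, 0) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}

	/*
	 * PROPOSED semantics only: under this idea the call would reap the
	 * victim's mm in the caller's context. Current kernels are expected
	 * to return an error here, so it is reported but not treated as fatal.
	 */
	if (fdatasync(fd) < 0)
		fprintf(stderr, "fdatasync(pidfd) failed: errno=%d\n", errno);

	close(fd);
	return 0;
}
```

A userspace killer would call kill_and_commit(pid) on the chosen victim;
the only new kernel behaviour needed is in the fdatasync() step.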

What I don't like about SIGKILL_SYNC is that it might lead to several
more SIGXXX_SYNC variants being introduced later.