Re: [PATCH 2/4] mm: introduce external memory hinting API
From: Minchan Kim
Date: Tue Jan 14 2020 - 14:12:46 EST
On Tue, Jan 14, 2020 at 11:39:28AM +0300, Kirill Tkhai wrote:
> On 13.01.2020 22:18, Daniel Colascione wrote:
> > On Mon, Jan 13, 2020, 12:47 AM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
> >>> +SYSCALL_DEFINE5(process_madvise, int, pidfd, unsigned long, start,
> >>> + size_t, len_in, int, behavior, unsigned long, flags)
> >>
> >> I don't like the interface. The fact that we have pidfd does not mean
> >> we always have to use it for new syscalls. A user may want to set
> >> madvise for a specific pid from the console and pass the pid as an argument.
> >> pidfd would be overkill in this case.
> >> We usually call "kill -9 pid" from the console. Why shouldn't process_madvise()
> >> allow this?
> >
> > All new APIs should use pidfds: they're better than numeric PIDs
>
> Yes
>
> > in every way.
>
> No
>
> > If a program wants to allow users to specify processes by
> > numeric PID, it can parse that numeric PID, open the corresponding
> > pidfd, and then use that pidfd with whatever system call it wants.
> > It's not necessary to support numeric PIDs at the system call level to
> > allow a console program to identify a process by numeric PID.
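The userspace conversion described above could look roughly like the following C sketch. It assumes Linux 5.3 or later, where pidfd_open() exists; it is invoked through syscall(2) because older C libraries ship no wrapper, and the fallback number 434 is the one used on x86-64 and most other architectures. parse_pid() and pidfd_from_string() are hypothetical helper names, not part of any existing API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* fallback for old headers; x86-64 and most arches */
#endif

/* Hypothetical helper: parse a decimal PID typed on the command line.
 * Returns -1 for empty, non-numeric, trailing-garbage or non-positive input. */
static pid_t parse_pid(const char *s)
{
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return -1;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
		return -1;
	return (pid_t)v;
}

/* Hypothetical helper: turn a user-supplied PID string into a pidfd.
 * Returns the new descriptor, or -1 with errno set. */
static int pidfd_from_string(const char *s)
{
	pid_t pid = parse_pid(s);

	if (pid < 0) {
		errno = EINVAL;
		return -1;
	}
	return (int)syscall(SYS_pidfd_open, pid, 0);
}
```

A console tool would call pidfd_from_string(argv[1]), hand the descriptor to whatever pidfd-based call it needs, and close() it afterwards; the numeric PID never reaches that system call.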
>
> No. It is overkill. Ordinary pid interfaces should also be available.
> There are a lot of cases where they are more convenient. Say, calling
> process_madvise() from a tracer while the tracee is stopped. At that moment
> the tracer knows everything about the tracee's state, and bracketing the
> actual action with pidfd_open() and close() looks just stupid and
> wastes CPU time.
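For comparison, here is a hedged sketch of the pidfd_open()/close() bracketing objected to above, as a tracer or parent that already knows the target PID would have to write it under a pidfd-only interface. with_pidfd() and pidfd_action_fn are hypothetical names; the callback stands in for the pidfd-based call, since the process_madvise() signature is still under review in this thread. The sketch only shows where the two extra syscalls go; it does not measure their cost.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* fallback for old headers; x86-64 and most arches */
#endif

/* Hypothetical callback standing in for the pidfd-based syscall. */
typedef int (*pidfd_action_fn)(int pidfd, void *arg);

/* Bracket one action with pidfd_open()/close(). Returns the action's
 * result, or -1 if the pidfd could not be opened (action not called). */
static int with_pidfd(pid_t pid, pidfd_action_fn action, void *arg)
{
	int pidfd, ret, saved_errno;

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0); /* extra syscall #1 */
	if (pidfd < 0)
		return -1;

	ret = action(pidfd, arg);

	saved_errno = errno;  /* keep the action's errno across close() */
	close(pidfd);         /* extra syscall #2 */
	errno = saved_errno;
	return ret;
}
```

The design question in the thread is whether those two extra calls per operation matter enough to justify a second, pid-based entry point.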
>
> Another example is a parent task which manages the parameters of its children.
> It knows everything about them, including whether they are alive or not. A pidfd
> interface would just consume additional CPU time here.
>
> So, no. Both interfaces should be available.

It sounds like you want to support both options for every upcoming API
that deals with pids. I'm not sure how critical that is for the
process_madvise API in this case. In general, we sacrifice some performance
for the nicer interface, and later, once it's reported as a hurdle for some
workload, we could fix it by introducing a new flag. What I don't like at
this moment is making the syscall complicated for potential scenarios
without a real workload.