Re: [RFC v2] prctl: prctl(PR_SET_IDLE, PR_IDLE_MODE_KILLME), for stateless idle loops

From: Shawn Landden
Date: Mon Nov 20 2017 - 23:48:20 EST


On Mon, Nov 20, 2017 at 12:35 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Fri 17-11-17 20:45:03, Shawn Landden wrote:
>> On Fri, Nov 3, 2017 at 2:09 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>>
>> > On Thu 02-11-17 23:35:44, Shawn Landden wrote:
>> > > It is common for services to be stateless around their main event loop.
>> > > If a process sets PR_SET_IDLE to PR_IDLE_MODE_KILLME then it
>> > > signals to the kernel that epoll_wait() and friends may not complete,
>> > > and the kernel may send SIGKILL if resources get tight.
>> > >
>> > > See my systemd patch: https://github.com/shawnl/systemd/tree/prctl
>> > >
>> > > Android uses this memory model for all programs, and having it in the
>> > > kernel will enable integration with the page cache (not in this
>> > > series).
>> > >
>> > > 16 bytes per process is kinda spendy, but I want to keep
>> > > lru behavior, which mem_score_adj does not allow. When a supervisor,
>> > > like Android's user input is keeping track this can be done in
>> > user-space.
>> > > It could be pulled out of task_struct if an cross-indexing additional
>> > > red-black tree is added to support pid-based lookup.
>> >
>> > This is still an abuse and the patch is wrong. We really do have an API
>> > to use I fail to see why you do not use it.
>> >
>> When I looked at wait_queue_head_t it was 20 byes.
>
> I do not understand. What I meant to say is that we do have a proper
> user api to hint OOM killer decisions.
This is a FIFO queue, rather than a heuristic, which is all you get
with the current API.
> --
> Michal Hocko
> SUSE Labs