Re: [PATCH v1] ringbuffer: Don't choose the process with adj equal OOM_SCORE_ADJ_MIN

From: Joel Fernandes
Date: Tue Apr 10 2018 - 14:39:38 EST


Hi Steve,

On Tue, Apr 10, 2018 at 11:00 AM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Tue, 10 Apr 2018 09:45:54 -0700
> Joel Fernandes <joelaf@xxxxxxxxxx> wrote:
>
>> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
>> > index a0233edc0718..807e2bcb21b3 100644
>> > --- a/include/linux/ring_buffer.h
>> > +++ b/include/linux/ring_buffer.h
>> > @@ -106,7 +106,8 @@ __poll_t ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
>> >
>> > void ring_buffer_free(struct ring_buffer *buffer);
>> >
>> > -int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size, int cpu);
>> > +int ring_buffer_resize(struct ring_buffer *buffer, unsigned long size,
>> > + int cpu, int rbflags);
>> >
>> > void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);
>> >
>> > @@ -201,6 +202,7 @@ int ring_buffer_print_page_header(struct trace_seq *s);
>> >
>> > enum ring_buffer_flags {
>> > RB_FL_OVERWRITE = 1 << 0,
>> > + RB_FL_NO_RECLAIM = 1 << 1,
>>
>> But the thing is, set_oom_origin doesn't seem to be doing the
>> desirable thing every time anyway as per my tests last week [1] and
>> the si_mem_available check alone seems to be working fine for me (and
>> also Zhaoyang as he mentioned).
>
> But did you try it with just plain GFP_KERNEL, and not RETRY_MAYFAIL.

Yes I tried it with just GFP_KERNEL as well. What I did based on your
suggestion for testing the OOM hint is:
1. Comment the si_mem_available check
2. Do only GFP_KERNEL

The system gets destabilized with this combination even with the OOM
hint. These threads are here:
https://lkml.org/lkml/2018/4/5/720

> My tests would always trigger the allocating task without the
> RETRY_MAYFAIL, but with RETRY_MAYFAIL it would sometimes take out other
> tasks.
>
>>
>> Since the problem Zhaoyang is now referring to is caused because of
>> calling set_oom_origin in the first place, can we not just drop that
>> patch and avoid adding more complexity?
>
> Actually, I'm thinking of dropping the MAYFAIL part. It really should
> be the one targeted if you are extending the ring buffer.

This then sounds like it should be fixed in -mm code? If we're giving
the hint and its not getting killed there then that's an -mm issue.

> I could add two loops. One that does NORETRY without the oom origin,
> and if it succeeds, its fine. But if it requires reclaim, it will then
> set oom_origin and go harder (where it should be the one targeted).
>
> But that may be pointless, because if NORETRY succeeds, there's not
> really any likelihood of oom triggering in the first place.

Yes.

>
>>
>> IMHO I feel like for things like RB memory allocation, we shouldn't
>> add a knob if we don't need to.
>
> It was just a suggestion.

Cool, I understand.

>>
>> Also I think Zhaoyang is developing for Android too since he mentioned
>> he ran CTS tests so we both have the same "usecase" but he can feel
>> free to correct me if that's not the case ;)
>
> I think if you are really worried with the task being killed by oom,
> then I agree with Michal and just fork a process to do the allocation
> for you.

Yes I agree. So lets just do that and no other patches additional
patches are needed then. Let me know if there's anything else I
missed?

Also I got a bit confused, I reread all the threads. Zhaoyang's
current issue is that the OOM hint *IS* working which is what
triggered your patch to toggle the behavior through an option. Where
was in this message we are discussing that the OOM hint doesn't always
work which is not Zhaoyang's current issue. Let me know if I missed
something? Sorry if I did.

thanks,

- Joel