Re: [PATCH 00/10] OOM Debug print selection and additional information

From: Edward Chron
Date: Wed Aug 28 2019 - 23:32:01 EST


On Wed, Aug 28, 2019 at 1:04 PM Edward Chron <echron@xxxxxxxxxx> wrote:
>
> On Wed, Aug 28, 2019 at 3:12 AM Tetsuo Handa
> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On 2019/08/28 16:08, Michal Hocko wrote:
> > > On Tue 27-08-19 19:47:22, Edward Chron wrote:
> > >> For production systems installing and updating eBPF scripts may someday
> > >> be very common, but I wonder how data center managers feel about it now?
> > >> Developers are very excited about it and it is a very powerful tool but can I
> > >> get permission to add or replace an existing eBPF script on production systems?
> > >
> > > I am not sure I understand. There must be somebody trusted to take care
> > > of systems, right?
> > >
> >
> > Speaking of my cases, those who take care of their systems are not developers.
> > And they are afraid of changing code that runs in kernel mode. They are unlikely
> > to give permission to install SystemTap/eBPF scripts. As a result, in many cases,
> > the root cause cannot be identified.
>
> +1. Exactly. The only thing we could think of, Tetsuo, is that if Linux OOM
> reporting uses an eBPF script, then systems have to load it to get any kind of
> meaningful report. Frankly, if using eBPF is the route to go, then essentially
> the whole OOM reporting should go there. We can adjust as we need and
> have precedent for wanting to load the script. That's the best we could come
> up with.
>
> >
> > Moreover, we are talking about OOM situations, where we can't expect userspace
> > processes to work properly. We need to dump information we want, without
> > counting on userspace processes, before sending SIGKILL.
>
> +1. We've tried, and as you point out, for best results the kernel has to
> provide the state.
>
> Again a full system dump would be wonderful, but taking a full dump for
> every OOM event on production systems? I am not nearly a good enough salesman
> to sell that one. So we need an alternate mechanism.
>
> If we can't agree on some sort of extensible, configurable approach then put
> the standard OOM Report in eBPF and make it mandatory to load it so we can
> justify having to do that. Linux should load it automatically.
> We'll just make a few changes and additions as needed.
>
> Sounds like a plan that we could live with.
> Would be interested if this works for others as well.

One further comment. In talking with my colleagues here who know eBPF much
better than I do, it may not be possible to implement something this
complicated with eBPF.

If that is in fact the case, then we'd have to try to hook the OOM reporting
code with tracepoints, similar to kprobes, only we want to do more than add
counters: we want to change the flow to skip small output entries that aren't
worth printing. If this isn't feasible with eBPF, then some derivative of our
approach, or enhancing the OOM output code directly, seem like the best
options. We will have to investigate this further.