Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation

From: Steven Rostedt
Date: Wed Nov 18 2020 - 07:46:15 EST


On Tue, 17 Nov 2020 20:54:24 -0800
Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:

> > extern int
> > @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > do { \
> > it_func = (it_func_ptr)->func; \
> > __data = (it_func_ptr)->data; \
> > - ((void(*)(void *, proto))(it_func))(__data, args); \
> > + /* \
> > + * Removed functions that couldn't be allocated \
> > + * are replaced with TRACEPOINT_STUB. \
> > + */ \
> > + if (likely(it_func != TRACEPOINT_STUB)) \
> > + ((void(*)(void *, proto))(it_func))(__data, args); \
>
> I think you're overreacting to the problem.

I will disagree with that.

> Adding run-time check to extremely unlikely problem seems wasteful.

Show me a real benchmark that you can notice a problem here. I bet that
check is even within the noise of calling an indirect function. Especially
on a machine with retpolines.

> 99.9% of the time allocate_probes() will do kmalloc from slab of small
> objects.
> If that slab is out of memory it means it cannot allocate a single page.
> In such case so many things will be failing to alloc that system
> is unlikely operational. oom should have triggered long ago.
> Imo Matt's approach to add __GFP_NOFAIL to allocate_probes()

Looking at the GFP_NOFAIL comment:

* %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
* cannot handle allocation failures. The allocation could block
* indefinitely but will never return with failure. Testing for
* failure is pointless.
* New users should be evaluated carefully (and the flag should be
* used only when there is no reasonable failure policy) but it is
* definitely preferable to use the flag rather than opencode endless
* loop around allocator.

I realized I made a mistake in my patch for using it, as my patch is a
failure policy. It looks like something we want to avoid in general.

Thanks, I'll send a v3 that removes it.

> when it's called from func_remove() is much better.
> The error was reported by syzbot that was using
> memory fault injections. ENOMEM in allocate_probes() was
> never seen in real life and highly unlikely will ever be seen.

And the biggest thing you are missing here, is that if you are running on a
machine that has static calls, this code is never hit unless you have more
than one callback on a single tracepoint. That's because when there's only
one callback, it gets called directly, and this loop is not involved.

-- Steve