Re: [PATCH 2/2] trace/kprobe: Remove limit on kretprobe maxactive

From: Naveen N. Rao
Date: Fri Jun 18 2021 - 09:19:51 EST

Masami Hiramatsu wrote:

To address this, as a first step, we should probably consider parsing kprobe_profile and printing a warning with 'perf' if we detect a non-zero miss count for a probe -- both a regular probe, as well as a retprobe.

Yeah, it is doable. Note that perf-probe only set up the event and
perf-trace or other commands will use it.

If we do this, the nice thing with kprobe_profile is that the probe miss count is available, and can serve as a good way to decide what a more reasonable maxactive value should be. This should help prevent users from trying with arbitrary maxactive values.

Such feedback loop is an interesting idea.
Note that nmissed count is an accumulate value, not the max number of
the instance which will be needed.

Yes, we will have to factor-in the duration during which the event was active. This will still be an approximation, but serves as a good starting point. It may need a few tries to get this right, but more
importantly, the user knows instantly that there are missed probes.

For perf_event_open(), perhaps we can introduce an ioctl to query the probe miss count.

Or, maybe we can expand the maxactive in runtime. e.g. add a shortage
counter on the kretprobe, and run a monitor kernel thread (or kworker).
If the shortage counter is incremented, the monitor allocates instances
(2x counter) and give it to the kretprobe. And it resets the shortage
counter. This adaptive maxactive may cause mis-hit in the beginning,
but finally find the optimal maxactive value automatically.

I like this idea and I have been thinking along these lines too. If we start with a better default (rather than just num_possible_cpus() used today), I suspect we may be able to get this to work well enough to not have to miss any probes. Specifying 'maxactive' can still serve as a workaround to allocate a larger initial set of kretprobe_instances in case this doesn't work.

> To avoid such trouble, I had set the 4096 limitation for the maxactive
> parameter. Of course 4096 may not enough for some use-cases. I'm > welcome
> to expand it (e.g. 32k, isn't it enough?), but removing the limitation
> may cause OOM trouble easily.

Do you have suggestions for how we can determine a better limit? As you point out in the other email, there could very well be 64k or more processes on a large machine. Since the primary concern is memory usage, we probably need to decide this based on total memory. But, memory usage will vary depending on system load...

This is very good question. IMHO, it might better to calculate the total
maxactive from the system memory size. For example, 1% of system memory
can be used for the kretprobes, 16GB system will allow using 160MB for
kretprobes, which means about "30M" is the max number of maxactive, or
multiple kretprobes can share it. Doesn't it sound enough? Of course
this will need to show the current usage of the kretprobe instance objects
via tracefs or debugfs. But this total cap seems reasonable for me to
avoid OOM trouble.

Perhaps we can start by making maxactive limit be a tunable with a default value of 4096, with the understanding that users will be careful when bumping up this value. Hopefully, scripts won't simply start writing into this file ;)

Yeah, that's what I suggested at first, because the best maxactive will
depend on the max number of the *processes* and the probed function.

If the probed function will NOT be preempted or slept, maxactive will be
the number of *processor cores*. Or, if it can be preempted or slept, it
will be the max number of *processes*. If the probed function can
recursively called (Note: this is rare case), the maxactive has to
be multiplied.

It is hard to estimate the max number of processes, since it depends
on the system. Small embedded systems don't run thousands of processes,
but big servers will run more than ten thousands of processes.
Thus make it tunable will be a good idea.