Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable preemption for long

From: Doug Anderson
Date: Fri Dec 01 2023 - 10:17:45 EST


Hi,

On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <judyhsiao@xxxxxxxxxxxx> wrote:
> >
> > We are seeing cases where neigh_cleanup_and_release() is called by
> > neigh_forced_gc() many times in a row with preemption turned off.
> > When running on a low powered CPU at a low CPU frequency, this has
> > been measured to keep preemption off for ~10 ms. That's not great on a
> > system with HZ=1000 which expects tasks to be able to schedule in
> > with ~1ms latency.
>
> This will not work in general, because this code runs with BH blocked.
>
> jiffies will stay untouched for many more ms on systems with only one CPU.
>
> I would rather not rely on jiffies here but ktime_get_ns() [1]
>
> Also if we break the loop based on time, we might be unable to purge
> the last elements in gc_list.
> We might need to use a second list to make sure to cycle over all
> elements eventually.
>
>
> [1]
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> {
> int max_clean = atomic_read(&tbl->gc_entries) -
> READ_ONCE(tbl->gc_thresh2);
> + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;

It might be nice to size the above timeout in terms of the tick length
(HZ) rather than hard-coding 1 ms. On a HZ=100 system it's probably OK
to keep preemption disabled for 10 ms, but on a HZ=1000 system you'd
want 1 ms. ...so maybe you'd want to use jiffies_to_nsecs(1) in place
of NSEC_PER_MSEC?
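
To make the arithmetic concrete, here is a rough userspace sketch of
what I mean. It is not kernel code and is not meant as the actual
patch: model_jiffies_to_nsecs(), gc_deadline_ns() and
gc_budget_expired() are made-up names for illustration, and hz is
passed as a parameter only because HZ is a build-time constant in the
kernel. It also assumes hz divides NSEC_PER_SEC evenly.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Userspace stand-in for the kernel's jiffies_to_nsecs(). "hz" is an
 * explicit parameter here (assumption for illustration); in the kernel
 * the conversion uses the compile-time HZ.
 */
static uint64_t model_jiffies_to_nsecs(unsigned long j, unsigned int hz)
{
	assert(hz > 0);
	return (uint64_t)j * (NSEC_PER_SEC / hz);
}

/*
 * Hypothetical helper: the deadline is one tick after "now_ns", so the
 * budget is 10 ms at HZ=100 and 1 ms at HZ=1000.
 */
static uint64_t gc_deadline_ns(uint64_t now_ns, unsigned int hz)
{
	return now_ns + model_jiffies_to_nsecs(1, hz);
}

/* Returns 1 once the budget is used up and the loop should stop. */
static int gc_budget_expired(uint64_t now_ns, uint64_t tmax)
{
	return now_ns >= tmax;
}
```

In the real function, now_ns would come from ktime_get_ns() as in your
diff, so the check never depends on jiffies advancing while BH is
blocked; only the size of the budget follows HZ.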

One worry might be that we disabled preemption _right before_ we were
supposed to be scheduled out. In that case we'll end up blocking some
other task for another full timeslice, but maybe there's not a lot we
can do there?

-Doug