Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable preemption for long

From: Doug Anderson
Date: Fri Dec 01 2023 - 13:41:11 EST

Next message: Minchan Kim: "Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store"
Previous message: Jann Horn: "mutex/spinlock semantics [was: Re: io_uring: incorrect assumption about mutex behavior on unlock?]"
In reply to: Eric Dumazet: "Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable preemption for long"
Next in thread: Julian Anastasov: "Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable preemption for long"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

On Fri, Dec 1, 2023 at 9:35 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> > > > > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > > {
> > > > > int max_clean = atomic_read(&tbl->gc_entries) -
> > > > > READ_ONCE(tbl->gc_thresh2);
> > > > > + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> > > >
> > > > It might be nice to make the above timeout based on jiffies. On a
> > > > HZ=100 system it's probably OK to keep preemption disabled for 10 ms
> > > > but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
> > > > jiffies_to_nsecs(1)?
> > >
> > > I do not think so. 10ms would be awfully long.
> > >
> > > We have nsec based time service, why downgrading to jiffies resolution ???
> >
> > Well, the whole issue is that we're disabling preemption, right?
> > Unless I'm mistaken, on a 1000 HZ system then a task's timeslice is
> > 1ms and on a 100 HZ system then a task's timeslice is 10ms. When we
> > disable preemption then the problem is that we can keep going past the
> > end of our timeslice. This is bad for whatever task the system is
> > trying to schedule instead of us since it will be blocked waiting for
> > us to re-enable preemption.
> >
> > So essentially the problem here is really tied to jiffies resolution,
> > right? Specifically, if jiffies is 100 Hz then it's actually
> > inefficient to timeout every 1 ms--I think it would be better to use
> > up our whole timeslice.
>
> It is not because a kernel is built with HZ=100 that each thread has
> to consume cpu times in 10ms slices.
>
> Process scheduler does not use jiffies at all, but high resolution time service.
>
> Keep in mind this code is run from soft-interrupt, not a dedicated processus.

Fair enough. I guess my mental model is wrong here. Please disregard
my suggestion about using something based on how long "jiffies" is
then. Using a fixed 1 ms timeout sounds great, then.

> > > Can you tell us in which scenario this gc_list can be so big, other
> > > than fuzzers ?
> >
> > The place we hit this wasn't actually with fuzzers but with normal
> > usage in our labs. The only case where it was a really big problem was
> > when neigh_forced_gc() was scheduled on a "little" CPU (in a
> > big.LITTLE system) and that little CPU happened to be running at the
> > lowest CPU frequency. Specifically Judy was testing on sc7180-trogdor
> > and the lowest CPU Frequency of the "little" CPUs was 300 MHz. Since
> > the littles are less powerful than the bigs, this is roughly the
> > equivalent processing power of a big core running at 120 MHz.
> >
> > FWIW, we are apparently no longer seeing the bad latency after
> > <https://crrev.com/c/4914309>, which does this:
> >
> > # Increase kernel neighbor table size.
> > echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> > echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
> > echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
> > echo 1024 > /proc/sys/net/ipv6/neigh/default/gc_thresh1
> > echo 4096 > /proc/sys/net/ipv6/neigh/default/gc_thresh2
> > echo 8192 > /proc/sys/net/ipv6/neigh/default/gc_thresh3
> >
> > However, I still believe that we should land something like Judy's
> > patch because, no matter what kernel tunings we have, the kernel
> > shouldn't be disabling preemption for so long.
>
> Sure, and I suggested a refinement, because as I said jiffies can
> stick to a value.
>
> Not sure why a refinement given by a network maintainer is not an option ?
>
> I must be missing something.

Your refinement was good. I was merely trying to answer the question
you asked about how we got into it as completely as possible. Sorry if
I caused confusion.

Next message: Minchan Kim: "Re: [PATCH] zram: Using GFP_ATOMIC instead of GFP_KERNEL to allocate bitmap memory in backing_dev_store"
Previous message: Jann Horn: "mutex/spinlock semantics [was: Re: io_uring: incorrect assumption about mutex behavior on unlock?]"
In reply to: Eric Dumazet: "Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable preemption for long"
Next in thread: Julian Anastasov: "Re: [PATCH v1] neighbour: Don't let neigh_forced_gc() disable preemption for long"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]