Re: [tip:timers/core] [posix] 1535cb8028: stress-ng.epoll.ops_per_sec 36.2% regression

From: Eric Dumazet
Date: Thu Mar 27 2025 - 04:26:28 EST


On Thu, Mar 27, 2025 at 9:10 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 27 2025 at 07:21, Eric Dumazet wrote:
> > On Wed, Mar 26, 2025 at 10:11 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> >> On Wed, Mar 26, 2025 at 09:07:51AM +0100, Thomas Gleixner wrote:
> >> > Unfortunately I can't reproduce any of it. I checked the epoll test
> >> > source and it uses a posix timer, but that commit makes the hash less
> >> > contended so there is zero explanation.
> >> >
> >>
> >> The short summary is:
> >> 1. your change is fine
> >
> > Let me rephrase this.
> >
> > Absolutely wonderful series, thanks a lot Thomas for doing it.
>
> Thank you!
>
> > Next bottlenecks are now these ones, but showing up in synthetic
> > benchmarks only.
>
> Right. I saw them too when working on this.
>
> > 33.36% timer_storm [kernel.kallsyms] [k] inc_rlimit_get_ucounts
> >
> > 32.85% timer_storm [kernel.kallsyms] [k] dec_rlimit_put_ucounts
>
> These two are not really posix-timer specific. They are also the
> standouts for any signal micro benchmark.
>
> I stared at the implementation a bit, but there is not much we can do
> about that I fear.

We could place all these atomic fields in separate cache lines,
to keep read-only fields shared as much as possible.

diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 7183e5aca282..6ddf667022d9 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -118,7 +118,10 @@ struct ucounts {
 	struct hlist_node node;
 	struct user_namespace *ns;
 	kuid_t uid;
-	atomic_t count;
+	atomic_t count ____cacheline_aligned_in_smp;
+	/* Note : should probably put all the following atomic_long_t
+	 * in separate cache lines (one atomic_long_t per cache line).
+	 */
 	atomic_long_t ucount[UCOUNT_COUNTS];
 	atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS];
 };
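
To illustrate what the note in the patch is hinting at, here is a rough
userspace sketch (C11, not kernel code) of giving each hot counter its own
cache line while the read-mostly fields stay together on the first line.
Everything in it is hypothetical: the struct and function names, the
64-byte line size, and the number of counters are assumptions made for the
example, and alignas() only stands in for ____cacheline_aligned_in_smp.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; the kernel takes this from the arch. */
#define SKETCH_CACHE_LINE 64
/* Hypothetical counter count, standing in for UCOUNT_RLIMIT_COUNTS. */
#define SKETCH_NR_RLIMITS 4

/* One atomic counter padded out to a full cache line. */
struct padded_counter {
	alignas(SKETCH_CACHE_LINE) atomic_long v;
};

/* Hypothetical, simplified stand-in for struct ucounts. */
struct ucounts_sketch {
	/* Read-mostly fields share the first cache line. */
	void *ns;
	unsigned int uid;
	/* Frequently written refcount starts a new cache line. */
	alignas(SKETCH_CACHE_LINE) atomic_int count;
	/* Each counter lives on its own line, so writers do not collide. */
	struct padded_counter rlimit[SKETCH_NR_RLIMITS];
};

/* Compile-time checks that the layout is what the comments claim. */
static_assert(sizeof(struct padded_counter) == SKETCH_CACHE_LINE,
	      "each counter should occupy exactly one cache line");
static_assert(offsetof(struct ucounts_sketch, count) % SKETCH_CACHE_LINE == 0,
	      "count should start on a cache line boundary");
static_assert(offsetof(struct ucounts_sketch, rlimit) -
	      offsetof(struct ucounts_sketch, count) == SKETCH_CACHE_LINE,
	      "rlimit[] should start on the line after count");

/* Add n to one counter and return the new value. */
static inline long sketch_rlimit_add(struct ucounts_sketch *u, int type, long n)
{
	return atomic_fetch_add(&u->rlimit[type].v, n) + n;
}
```

The cost is memory: with these assumptions the struct grows to one cache
line per counter, which is the trade-off to weigh against the reduced
cache line bouncing in the inc/dec paths.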