Re: [PATCH v3 00/12] AMD broadcast TLB invalidation

From: Rik van Riel
Date: Sat Jan 11 2025 - 21:50:52 EST


On Mon, 2025-01-06 at 11:03 -0800, Dave Hansen wrote:
>
> So can we call them "global", "shared" or "system" ASIDs, please?
>
I have renamed them to global ASIDs.

> Second, the TLB_NR_DYN_ASIDS was picked because it's roughly the number
> of distinct PCIDs that the CPU can keep in the TLB at once (at least on
> Intel). Let's say a CPU has 6 mm's in the per-cpu ASID space and another
> 6 in the shared/broadcast space. At that point, PCIDs might not be doing
> much good because the TLB can't store entries for 12 PCIDs.
>
If the CPU has 12 runnable processes, we may have
various other performance issues, too, like the
system simply not having enough CPU power to run
all the runnable tasks.

Most of the systems I have looked at seem to average
between 0.2 and 2 runnable tasks per CPU, depending on
whether the workload is CPU bound or memory/IO bound.

> Is there any comprehension in this series? Should we be indexing
> cpu_tlbstate.ctxs[] by a *context* number rather than by the ASID that
> it's running as?
>
We only need the cpu_tlbstate.ctxs[] for the per-CPU
ASID space, in order to look up which process is
assigned to which slot.

We do not need it for global ASID numbers, which are
always the same everywhere.
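
To illustrate the distinction, here is a rough userspace
sketch. It is not the code from the series: the struct
and function names are made up, and it assumes that
global ASIDs are numbered at or above TLB_NR_DYN_ASIDS,
that a ctx_id of 0 marks an empty slot, and that slots
are recycled round-robin. The real code also tracks TLB
generations, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_NR_DYN_ASIDS 6	/* size of the per-CPU ASID space */

/* Hypothetical, simplified stand-in for the per-mm context. */
struct mm_sketch {
	uint64_t ctx_id;	/* unique per-mm id; 0 is never used */
	uint16_t global_asid;	/* 0 means no global ASID assigned */
};

/* Hypothetical, simplified stand-in for cpu_tlbstate. */
struct tlb_state_sketch {
	uint64_t ctxs[TLB_NR_DYN_ASIDS];	/* ctx_id per slot; 0 = empty */
	uint16_t next_asid;			/* next slot to recycle */
};

/*
 * Pick the ASID to run 'mm' with on this CPU. Sets *need_flush
 * when a per-CPU slot had to be (re)assigned to this mm.
 */
static uint16_t choose_asid(struct tlb_state_sketch *cpu,
			    const struct mm_sketch *mm, bool *need_flush)
{
	uint16_t asid;

	/* A global ASID is the same on every CPU: no table lookup. */
	if (mm->global_asid) {
		/* Assumption: global ASIDs live above the dynamic range. */
		assert(mm->global_asid >= TLB_NR_DYN_ASIDS);
		*need_flush = false;
		return mm->global_asid;
	}

	/* Per-CPU ASID: search this CPU's slots for the mm. */
	for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
		if (cpu->ctxs[asid] == mm->ctx_id) {
			*need_flush = false;
			return asid;
		}
	}

	/* Not found: recycle the next slot and flush stale entries. */
	asid = cpu->next_asid;
	cpu->next_asid = (uint16_t)((asid + 1) % TLB_NR_DYN_ASIDS);
	cpu->ctxs[asid] = mm->ctx_id;
	*need_flush = true;
	return asid;
}
```

In this sketch only the per-CPU path ever reads or
writes ctxs[]; the global path returns before touching it.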

> Last, I'm not 100% convinced we want to do this whole thing. The
> will-it-scale numbers are nice. But given the complexity of this, I
> think we need some actual, real end users to stand up and say exactly
> how this is important in *PRODUCTION* to them.
>
Do any of these count? :)

https://www.phoronix.com/review/amd-invlpgb-linux

I am hoping to gather some real-world numbers as well,
and will work with some workload owners to get them.


--
All Rights Reversed.