Re: [PATCH v2] RISC-V: Probe misaligned access speed in parallel
From: Evan Green
Date: Thu Sep 21 2023 - 16:21:16 EST
On Thu, Sep 21, 2023 at 3:22 AM David Laight <David.Laight@xxxxxxxxxx> wrote:
>
> ...
> > > For probing alignment speed, you just care about running it on that
> > > cpu. Correct ?
> >
> > For this we care both about not migrating to other CPUs, and also
> > secondarily minimizing disturbances while the test is being run.
> > Usually I equate pre-emption with migration, but in this case I think
> > the worker threads are bound to that CPU. So I'll keep the
> > preempt_disable/enable where it is, since it's harmless for CPUs other
> > than 0, but useful for 0. I also like it for readability as it
> > highlights the critical section (as a reader, "is preemption disabled"
> > would be one of my first questions when studying this).
>
> You need to disable pre-emption to get any kind of meaningful answer.
>
> But why do you need to run the test on more than the boot cpu?
> If you've a heterogenous mix of cpu any code that looks at the answer
> is going to behave incorrectly unless it has also disabled pre-emption
> or is bound to a cpu.
I don't think it's safe to assume misaligned access speed is the same
across all cores. In a big.little combination I can easily imagine the
big cores having fast misaligned access and the slow cores not having
it (though hopefully the slow cores don't kick it to firmware). Since
this info is presented to usermode per-cpu, I'd like it to be correct.
>
> One obvious use of the result is to setup some static branches.
> But that assumes all cpu are the same.
Right, this could be used to set up static branches, or in an ifunc
selector. This is why we provide pre-computed answers for "all CPUs"
in hwprobe. If the situation I describe above did happen, code asking
on behalf of all CPUs would come back "unknown", which is true (though
would probably default to the byte-copy version). More specific
environments might choose to curate their scheduling a bit more, and
this info per-core would be useful there.
-Evan