Re: [PATCH] arm64: Set SSBS for user threads while creation

From: Will Deacon
Date: Wed Jan 29 2020 - 11:13:48 EST


On Wed, Jan 29, 2020 at 05:18:53PM +0530, Srinivas Ramana wrote:
> On 1/2/2020 11:31 PM, Catalin Marinas wrote:
> > On Mon, Dec 23, 2019 at 06:32:26PM +0530, Srinivas Ramana wrote:
> > > The current SSBS implementation takes care of setting the
> > > SSBS bit in start_thread() for user threads. While this works
> > > for tasks launched with fork/clone followed by execve, for cases
> > > where userspace would just call fork (e.g., Java applications) this
> > > leaves the SSBS bit unset. This results in a performance
> > > regression for such tasks.
> > >
> > > It is understood that commit cbdf8a189a66 ("arm64: Force SSBS
> > > on context switch") masks this issue, but that was done for a
> > > different reason where heterogeneous CPUs (both with and without
> > > SSBS support) are present. It is appropriate to take care
> > > of the SSBS bit for all threads at creation time.
> > >
> > > Fixes: 8f04e8e6e29c ("arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3")
> > > Signed-off-by: Srinivas Ramana <sramana@xxxxxxxxxxxxxx>
> >
> > I suppose the parent process cleared SSBS explicitly. Isn't the child
>
> Actually we observe that the parent (in the case of Android, the zygote that
> launches the app) does have the SSBS bit set. However, the child doesn't have
> the bit set.

On which SoC? Your commit message talks about heterogeneous systems (wrt
SSBS) as though they don't apply in your case. Could you provide us with
a reproducer?

> > after fork() supposed to be nearly identical to the parent? If we did as
> > you suggest, someone else might complain that SSBS has been set in the
> > child after fork().
>
> I am also wondering why a userspace process would clear the SSBS bit, losing
> the performance benefit.

I guess it could happen during sigreturn if the signal handler wasn't
careful about preserving bits in pstate, although it doesn't feel like
something you'd regularly run into.
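
As a rough illustration of that sigreturn case (a standalone model, not
kernel or libc code), the difference between a handler that rebuilds the
saved pstate and one that does a read-modify-write looks like this. The
two helper names are hypothetical, and the assumption is that the handler
only wants to change the NZCV condition flags:

```c
#include <stdint.h>

/* PSTATE.SSBS lives in bit 12 of the saved AArch64 program status. */
#define PSR_SSBS_BIT	0x00001000ULL
/* The N, Z, C and V condition flags occupy bits 31..28. */
#define PSR_NZCV_MASK	0xf0000000ULL

/*
 * Hypothetical careless handler: builds a fresh pstate from the flags
 * alone, so every other bit (SSBS included) comes back as zero.
 */
static uint64_t careless_set_flags(uint64_t pstate, uint64_t nzcv)
{
	(void)pstate;	/* the old value is ignored, which is the bug */
	return nzcv & PSR_NZCV_MASK;
}

/*
 * Hypothetical careful handler: replaces only the flag bits and keeps
 * everything else, so a set SSBS bit survives the sigreturn.
 */
static uint64_t careful_set_flags(uint64_t pstate, uint64_t nzcv)
{
	return (pstate & ~PSR_NZCV_MASK) | (nzcv & PSR_NZCV_MASK);
}
```

With SSBS set on entry, the first helper returns a value with bit 12
clear, while the second preserves it.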

But hang on a sec -- it looks like the context switch logic in
cbdf8a189a66 actually does the wrong thing for systems where all of the
CPUs implement SSBS. I don't think it explains the behaviour you're seeing,
but I do think it could end up in situations where SSBS is unexpectedly
*set*.

Diff below.

Will

--->8

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index bbb0f0c145f6..e38284c9fb7b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -466,6 +466,13 @@ static void ssbs_thread_switch(struct task_struct *next)
 	if (unlikely(next->flags & PF_KTHREAD))
 		return;
 
+	/*
+	 * If all CPUs implement the SSBS instructions, then we just
+	 * need to context-switch the PSTATE field.
+	 */
+	if (cpu_have_feature(cpu_feature(SSBS)))
+		return;
+
 	/* If the mitigation is enabled, then we leave SSBS clear. */
 	if ((arm64_get_ssbd_state() == ARM64_SSBD_FORCE_ENABLE) ||
 	    test_tsk_thread_flag(next, TIF_SSBD))
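
To make the "unexpectedly set" case concrete, here is a small standalone
model of the decision made in ssbs_thread_switch(). It is a sketch, not
the kernel function: the structs and the with_fix parameter are invented
for illustration, and it assumes that the part of the function not shown
in the hunk ends by setting the SSBS bit in the saved PSTATE of a user
task.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_SSBS_BIT	0x00001000ULL

/* Hypothetical stand-in for the per-task state the real code consults. */
struct task_model {
	bool kthread;		/* models next->flags & PF_KTHREAD */
	bool tif_ssbd;		/* models test_tsk_thread_flag(next, TIF_SSBD) */
	uint64_t pstate;	/* models the saved user PSTATE */
};

/* Hypothetical stand-in for the system-wide state. */
struct system_model {
	bool all_cpus_have_ssbs;	/* models cpu_have_feature(cpu_feature(SSBS)) */
	bool ssbd_force_enable;		/* models ARM64_SSBD_FORCE_ENABLE */
};

static void ssbs_thread_switch_model(const struct system_model *sys,
				     struct task_model *next, bool with_fix)
{
	/* Nothing to do for kernel threads. */
	if (next->kthread)
		return;

	/* The early return added by the diff, enabled by with_fix. */
	if (with_fix && sys->all_cpus_have_ssbs)
		return;

	/* If the mitigation is enabled, SSBS is left clear. */
	if (sys->ssbd_force_enable || next->tif_ssbd)
		return;

	/* Assumed tail of the real function: force SSBS on for user tasks. */
	next->pstate |= PSR_SSBS_BIT;
}
```

In this model, a task that cleared SSBS itself (so tif_ssbd is false) on
a system where every CPU has SSBS gets the bit set again on the next
switch when with_fix is false, and keeps it clear when with_fix is true.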