Re: [PATCH 4/6] scs: Move scs_overflow_check() out of architecture code

From: Mark Rutland
Date: Mon May 18 2020 - 09:32:39 EST


On Mon, May 18, 2020 at 02:23:47PM +0100, Will Deacon wrote:
> On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> > On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > > There is nothing architecture-specific about scs_overflow_check() as
> > > it's just a trivial wrapper around scs_corrupted().
> > >
> > > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > > task_scs_end_corrupted() and call it from schedule_debug() when
> > > CONFIG_SCHED_STACK_END_CHECK_is enabled. Finally, remove the unused
> > > scs_overflow_check() function entirely.
> > >
> > > This has absolutely no impact on architectures that do not support SCS
> > > (currently arm64 only).
> > >
> > > Signed-off-by: Will Deacon <will@xxxxxxxxxx>
> >
> > Pulling this out of arch code seems sane to me, and the arch-specific
> > chanes look sound. However, I have a concern with the changes within the
> > scheduler context-switch.
> >
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index a35d3318492c..56be4cbf771f 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -52,7 +52,6 @@
> > > #include <asm/mmu_context.h>
> > > #include <asm/processor.h>
> > > #include <asm/pointer_auth.h>
> > > -#include <asm/scs.h>
> > > #include <asm/stacktrace.h>
> > >
> > > #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> > > entry_task_switch(next);
> > > uao_thread_switch(next);
> > > ssbs_thread_switch(next);
> > > - scs_overflow_check(next);
> >
> > Prior to this patch, we'd never switch to a task whose SCS had already
> > been corrupted.
> >
> > With this patch, we only check that when switching away from a task, and
> > only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> > glance seems to weaken that.
>
> Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
> the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.
>
> > Arguably:
> >
> > * If the next task's SCS was corrupted by that task while it was
> > running, we had already lost at that point.
>
> With this change, we'll at least catch this one sooner, and that might be
> useful if a bug has caused us to overflow the SCS but not the main stack.

Sure, but only if CONFIG_SCHED_STACK_END_CHECK is selected.

> > * If the next task's SCS was corrupted by another task, then that could
> > also happen immediately after the check (though timing to avoid the
> > check but affect the process could be harder).
>
> We're only checking the magic end value, so the cross-task case is basically
> if you overrun your own SCS as above, but then continue to overrun entire
> SCSs for other tasks as well. It's probably not very useful in that case.
>
> > ... and a VMAP'd SCS would be much nicer in this regard.
> >
> > Do we think this is weakening the check, or do we think it wasn't all
> > that helpful to begin with?
>
> I see it as a debug check to catch SCS overflow, rather than a hardening
> feature, and I agree that using something like vmap stack for the SCS would
> be better because we could have a guard page instead.

Fair enough. Could we put something into the commit message that more
explicitly calls out debug-not-hardening? I agree that under that model
this patch looks fine, and with something to that effect:

Reviewed-by: Mark Rutland <mark.rutland@xxxxxxx>

Mark.

> This is something I would like to revisit, but we need more
> information from Sami about why Android rejected the larger allocation
> size, since I don't think there's an awful lot of point merging this
> series if Android doesn't pick it up.

Indeed. I'd certainly prefer the robustness of a VMAP'd SCS if we can do
that.

Mark.