Re: [PATCH 2/2] arm64: Clear the stack

From: Alexander Popov
Date: Mon May 14 2018 - 05:35:42 EST


On 14.05.2018 08:15, Mark Rutland wrote:
> On Sun, May 13, 2018 at 11:40:07AM +0300, Alexander Popov wrote:
>> It seems that previously I was very "lucky" to accidentally have those MIN_STACK_LEFT,
>> call trace depth and oops=panic together to experience a hang on stack overflow
>> during BUG().
>>
>>
>> When I run my test in a loop _without_ VMAP_STACK, I manage to corrupt the neighbour
>> processes with BUG() handling overstepping the stack boundary. It's a pity, but
>> I have an idea.
>
> I think that in the absence of VMAP_STACK, there will always be cases where we
> *could* corrupt a neighbouring stack, but I agree that trying to minimize that
> possibility would be good.

Ok!

>> In kernel/sched/core.c we already have:
>>
>> #ifdef CONFIG_SCHED_STACK_END_CHECK
>> if (task_stack_end_corrupted(prev))
>> panic("corrupted stack end detected inside scheduler\n");
>> #endif
>>
>> So what would you think if I do the following in check_alloca():
>>
>> if (size >= stack_left) {
>> #if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
>> panic("alloca over the kernel stack boundary\n");
>> #else
>> BUG();
>> #endif
>
> Given this is already out-of-line, how about we always use panic(), regardless
> of VMAP_STACK and SCHED_STACK_END_CHECK? i.e. just
>
> if (unlikely(size >= stack_left))
> panic("alloca over the kernel stack boundary");
>
> If we have VMAP_STACK selected, and overflow during the panic, it's the same as
> if we overflowed during the BUG(). It's likely that panic() will use less stack
> space than BUG(), and the compiler can put the call in a slow path that
> shouldn't affect most calls, so in all cases it's likely preferable.

I'm sure that maintainers and Linus will strongly dislike my patch if I always
use panic() here. panic() kills the whole kernel and we shouldn't use it when we
can safely continue to work.

Let me describe my logic. So let's have size >= stack_left on a thread stack.

1. If CONFIG_VMAP_STACK is enabled, we can safely use BUG(). Even if BUG()
handling overflows the thread stack into the guard page, handle_stack_overflow()
is called and the neighbour memory is not corrupted. The kernel can proceed to live.

2. If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt the neighbour
kernel memory and cause the undefined behaviour of the whole kernel. I see it on
my lkdtm test. That is a cogent reason for panic().

2.a. If CONFIG_SCHED_STACK_END_CHECK is enabled, the kernel already does panic()
when STACK_END_MAGIC is corrupted. So we will _not_ break the safety policy if
we do panic() in a similar situation in check_alloca().

2.b. If CONFIG_SCHED_STACK_END_CHECK is disabled, the user has some real reasons
not to do panic() when the kernel stack is corrupted. So we should not do it in
check_alloca() as well, just use BUG() and hope for the best.

That logic can be expressed this way:

if (size >= stack_left) {
#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
panic("alloca over the kernel stack boundary\n");
#else
BUG();
#endif

I think I should add a proper comment to describe it.

Thank you.

Best regards,
Alexander