Re: [PATCH] x86/shstk: Free the thread shadow stack before disassociating from the mm

From: Edgecombe, Rick P
Date: Wed Sep 11 2024 - 14:01:19 EST


On Tue, 2024-09-10 at 23:56 +0100, Mark Brown wrote:
> When using shadow stacks the kernel will transparently allocate a shadow
> stack for each thread. The intention is that this will be freed when the
> thread exits but currently this doesn't actually happen. The deallocation
> is done by shstk_free() which is called from exit_thread() and has a guard
> to check for !tsk->mm due to the use of vm_unmap(). This doesn't actually
> do anything since in do_exit() we call exit_mm() prior to exit_thread() and
> exit_mm() disassociates the task from the mm and clears tsk->mm. The result
> is that no shadow stacks will be freed until the process exits, leaking
> memory for any process which creates and destroys threads.
>
> Fix this by moving the shstk_free() to a new exit_thread_early() call which
> runs immediately prior to exit_mm(). We don't do this right at the start of
> do_exit() due to the handling for killing init. This could trigger some
> incompatibility if there is code that looks at the shadow stack of a thread
> which has exited but this seems much less likely than the leaking of shadow
> stacks due to thread deletion.
>
> Fixes: 18e66b695e78 ("x86/shstk: Add Kconfig option for shadow stack")
> Signed-off-by: Mark Brown <broonie@xxxxxxxxxx>
> ---
> It is entirely possible I am missing something here, I don't have a
> system that allows me to test shadow stack support directly and have
> only checked this by inspection and tested with my arm64 GCS series.
> If this makes sense it'll need to become a dependency for GCS.

The common cleanup case is via deactivate_mm()->shstk_free(), which happens when
the MM is still attached. But there is also an exit_thread() caller in the fork
failure path (see copy_process()).

So by my inspection, the exit_thread_early() is not needed because of the
deactivate_mm() path that happens earlier in do_exit() via exit_mm(). But since
this patch also removes the shstk_free() from the copy_process() error path, I
think we would need clarity that it is unneeded there.
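
To make the ordering argument above concrete, here is a toy user-space C model
of the exit sequence. It is only a sketch: every toy_* name is hypothetical, the
real functions do far more, and the arch_frees_in_deactivate_mm flag is just a
stand-in for whether an architecture's deactivate_mm() calls shstk_free() (as
described above for x86) or does nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types or functions are kernel code. */
struct toy_mm { int unused; };

struct toy_task {
	struct toy_mm *mm;	/* cleared by toy_exit_mm() */
	bool shstk_allocated;	/* stands in for the thread's shadow stack */
};

/* Models the guard in shstk_free(): nothing is freed once tsk->mm is gone. */
static void toy_shstk_free(struct toy_task *tsk)
{
	if (!tsk->mm)
		return;
	tsk->shstk_allocated = false;
}

/* Models deactivate_mm(); whether it frees is an assumption passed in. */
static void toy_deactivate_mm(struct toy_task *tsk, bool arch_frees_here)
{
	if (arch_frees_here)
		toy_shstk_free(tsk);
}

/* Models exit_mm(): deactivate while the mm is attached, then detach it. */
static void toy_exit_mm(struct toy_task *tsk, bool arch_frees_here)
{
	toy_deactivate_mm(tsk, arch_frees_here);
	tsk->mm = NULL;
}

/* Models exit_thread(): runs after exit_mm(), so the guard always trips. */
static void toy_exit_thread(struct toy_task *tsk)
{
	toy_shstk_free(tsk);
}

/* Runs the modelled exit sequence; returns true if the stack was leaked. */
static bool toy_do_exit_leaks(bool arch_frees_in_deactivate_mm)
{
	struct toy_mm mm = { 0 };
	struct toy_task tsk = { .mm = &mm, .shstk_allocated = true };

	toy_exit_mm(&tsk, arch_frees_in_deactivate_mm);
	assert(tsk.mm == NULL);	/* mm is detached before exit_thread() */
	toy_exit_thread(&tsk);

	return tsk.shstk_allocated;
}
```

With the flag set, toy_do_exit_leaks() returns false because the free happens
while the mm is still attached. With it clear, the only free attempt is the one
in toy_exit_thread(), which the guard turns into a no-op, and it returns true.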

From a quick search through the arm series, I don't see a deactivate_mm()
implementation, but rather a separate cleanup solution. Could that be the
reason why you saw the leak on arm? Considering the trickiness of the
auto-allocated shadow stack lifecycle, I think it would be great if all the
implementations had common logic, if possible at least.

BTW, two more notes on this whole area:
1. I'm 99% sure glibc has some tests that catch leaks like the one hypothesized
here, by watching for memory growth after repeated thread exits. IIRC I
introduced a shadow stack leak at some point during development that failed the
test.
2. Weijiang (CCed) is working on a fix for a case in the opposite direction: an
error path that attempts to free the shadow stack twice and triggers a warning.