Re: BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

From: Topi Miettinen
Date: Sat Oct 24 2020 - 07:34:45 EST


On 23.10.2020 20.52, Salvatore Mesoraca wrote:
> Hi,
>
> On Thu, 22 Oct 2020 at 23:24, Topi Miettinen <toiwoton@xxxxxxxxx> wrote:
>> SARA looks interesting. What is missing is a prctl() to enable all W^X
>> protections irrevocably for the current process, then systemd could
>> enable it for services with MemoryDenyWriteExecute=yes.
>
> SARA actually has a procattr[0] interface to do just that.
> There is also a library[1] to help using it.

That means that /proc has to be available and writable at that point, so setting up procattrs has to be done before mount namespaces are set up. In general, it would be nice for the kernel's sandboxing facilities if there were a way to start enforcing restrictions only at the next execve(), like setexeccon() for SELinux and aa_change_onexec() for AppArmor. Otherwise the exact order of setting up the various sandboxing options can be very tricky to arrange correctly, since each option may have a subtle effect on the sandboxing features enabled later.

In the case of SARA, the operations done between shuffling the mount namespace and execve() shouldn't be affected, so the ordering isn't important. Even if they were (say, a future sandboxing feature needed trampolines or JIT code generation), the procattr file could perhaps be opened early and written closer to execve().

-Topi