Re: [PATCH v10 2/5] mseal: add mseal syscall
From: Theo de Raadt
Date: Tue Apr 16 2024 - 12:49:29 EST
Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> No per-vma change is checked prior to entering a per-vma modification
> loop today. This means that mseal() differs in behaviour in "up-front
> failure" vs "partial change failure" that exists in every other
> function.
I discussed this with Liam and Jeff a while ago (separate conversations).
A bunch of Linux m*() syscalls have weaker atomicity guarantees than
the other systems I looked into.
Linux is an outlier here. Other systems do two passes over the "entries
in the range", before committing to success or failure. When success is
returned, it means the whole range has been changed. When an error is
identified in the first pass, then no changes are applied, and an error
is returned. I found no partial results in my limited reading of various
VM systems.
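
As a rough illustration of the difference described above, here is a
minimal sketch in C. It is a toy model, not kernel code from any system:
struct entry, its sealed flag, and both function names are hypothetical,
standing in for "entries in the range" and for whatever condition makes
an entry refuse the change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry in the affected range. */
struct entry {
    int  prot;    /* current protection bits */
    bool sealed;  /* true if this entry must not be modified */
};

/*
 * One pass: check and modify in the same loop. If entry i fails,
 * entries 0..i-1 have already been changed when the error is returned.
 */
int set_prot_one_pass(struct entry *e, size_t n, int prot)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].sealed)
            return -1;      /* partial change is left behind */
        e[i].prot = prot;
    }
    return 0;
}

/*
 * Two passes: validate every entry first, then commit. An error found
 * in the first pass means nothing at all has been changed.
 */
int set_prot_two_pass(struct entry *e, size_t n, int prot)
{
    for (size_t i = 0; i < n; i++)
        if (e[i].sealed)
            return -1;      /* nothing modified yet */

    for (size_t i = 0; i < n; i++)
        e[i].prot = prot;   /* cannot fail in this toy model */
    return 0;
}
```

With a range whose second entry is sealed, both functions return -1,
but only the one-pass version leaves the first entry modified.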
Actually the guarantee of having done nothing upon error is very common
system call behaviour. POSIX and de facto standards don't seem to
specify it in explicit wording as far as I can see, but the majority of
systems seem to do so because it matches expectations.
Considering all the system calls, I can't think of any examples that
leave partial results behind. There are a few specific ioctls which
were designed wrong.
I suspect, for performance reasons, there will be little appetite to
repair the m*() syscalls in Linux. (I would appreciate it if they were
brought up to standard, so I guess that starts the 20 year counter :)
> I think we can all agree that having some up-front and some later
> without any reason will lead to a higher probability of things getting
> missed.
Also as attack surface. I spent some time thinking about circumstances
where this might help an attack.
The risk is that the mprotect() return value is very rarely checked, yet
parts of objects will change. mprotect() is probably the least checked
system call, since people assume it will always succeed entirely; that
is not the case on Linux. It is even less the case once immutable memory
ranges come into play; failure is an even more likely condition now.
I didn't find a particular piece of software (or an old attack) where
the sloppy permission handling would help an attacker, but I only
thought about it for a couple of days... there are people with more time
on their hands.