Re: [RFC PATCH for 4.18 1/2] rseq: validate rseq_cs fields are < TASK_SIZE

From: Andy Lutomirski
Date: Thu Jun 28 2018 - 19:30:26 EST


On Thu, Jun 28, 2018 at 2:22 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Jun 28, 2018 at 1:23 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>> This is okay with me for a fix outside the merge window. Can you do a
>> followup for the next merge window that fixes it better, though? In
>> particular, TASK_SIZE is generally garbage. I think a better fix
>> would be something like adding a new arch-overridable helper like:
>>
>> static inline unsigned long current_max_user_addr(void) { return TASK_SIZE; }
>
> We already have that. It's called "user_addr_max()".

Nah, that one is more or less equivalent to TASK_SIZE_MAX, except that
it's different if set_fs() is used.

>
> It's the limit we use for user accesses.
>
> That said, I don't see why we should even check the IP. It's not like
> that's done by signal handling either.

The idea is that, if someone screws up and sticks a number like
0xbaadf00d00045678 into their rseq abort_ip in a 32-bit x86 program
(when they actually mean 0x00045678), we want to do something consistent.
On a 32-bit kernel, presumably it gets cast to u32 somewhere and it
works. On a 64-bit kernel, we end up shoving 0xbaadf00d00045678 into
regs->ip, and then the entry code will do, um, something. If I had to
guess, I would guess that at least IRET is likely to truncate if we're
returning to a 32-bit CS. But I really don't want to start promising
that we won't segfault if a different path gets invoked on some future
kernel on some future CPU, or if we're on an AMD CPU using their
utterly braindead SYSRETL microcode, etc.

So I think we're much better off if we either promise that rseq
truncates the address for 32-bit users or that it segfaults if high
bits are set for 32-bit users.
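
To make the two options concrete, here is a rough userspace sketch. The
names abort_ip_truncate() and abort_ip_ok_strict() and the task_is_32bit
flag are made up for illustration; nothing like them exists in the
kernel today, and a real version would have to get the task's mode from
arch code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only. "task_is_32bit" stands in
 * for whatever the arch would use to tell that the task runs 32-bit code.
 */

/* Option 1: promise truncation. High bits are dropped for 32-bit users. */
static inline uint64_t abort_ip_truncate(uint64_t abort_ip, bool task_is_32bit)
{
	return task_is_32bit ? (uint64_t)(uint32_t)abort_ip : abort_ip;
}

/*
 * Option 2: promise a segfault. Returns false if a 32-bit user set any of
 * the high 32 bits; the caller would then deliver SIGSEGV.
 */
static inline bool abort_ip_ok_strict(uint64_t abort_ip, bool task_is_32bit)
{
	return !task_is_32bit || (abort_ip >> 32) == 0;
}
```

With option 1, 0xbaadf00d00045678 becomes 0x00045678 for a 32-bit task.
With option 2, the same value is rejected. Either is fine as long as we
pick one and document it.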

TASK_SIZE is a super shitty way to do this. The correct thing is
either to add some check to the exit-to-usermode slowpath that rseq can
trigger, or to add some reasonable way for rseq to ask "is this
address a legitimate addressable virtual address for the current
task's user space operating mode?" We don't have such a thing right
now.
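
For the second option, the shape I have in mind is roughly the
following. This is only a sketch: enum user_mode and
user_addr_valid_for_mode() are hypothetical, and the limits are
illustrative assumptions (4 GiB for 32-bit code, the lower 47-bit half
for 64-bit code on x86-64 with 4-level paging), not values taken from
any arch header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the mode the task will be running in when it returns. */
enum user_mode {
	USER_MODE_32,	/* returning to a 32-bit code segment */
	USER_MODE_64,	/* returning to a 64-bit code segment */
};

/* Assumed limits, for illustration only. */
#define USER_LIMIT_32	(1ULL << 32)
#define USER_LIMIT_64	(1ULL << 47)

/*
 * Hypothetical helper: is addr something user code in this mode could
 * actually address? Unlike a TASK_SIZE comparison, the answer depends
 * on the operating mode and nothing else.
 */
static inline bool user_addr_valid_for_mode(uint64_t addr, enum user_mode mode)
{
	switch (mode) {
	case USER_MODE_32:
		return addr < USER_LIMIT_32;
	case USER_MODE_64:
		return addr < USER_LIMIT_64;
	}
	return false;
}
```

rseq would call something like this on abort_ip before writing it to
regs->ip and segfault the task if it returns false.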