Re: Regression: commit da029c11e6b1 broke toybox xargs.

From: Kees Cook
Date: Fri Nov 03 2017 - 21:40:10 EST


On Fri, Nov 3, 2017 at 6:22 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Nov 3, 2017 at 5:42 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>>
>> If we didn't do the "but no more than 75% of _STK_LIM", and moved to
>> something like "check stack utilization after loading the binary", we
>> end up in the position where the kernel is past the point of no return
>> (so instead of E2BIG, the execve()ing process just SEGVs), which is
>> much harder to debug or recover from (i.e. there's no process left for
>> the execve() to return to).
>
> Yeah, we've had that problem in the past, and it's the worst of all worlds.
>
> You can still trigger it (set RLIMIT_DATA to something much too small,
> for example, and then generate more than that by just repeating the
> same argument multiple times so that the execve() user doesn't trigger
> the limit, but the newly executed process does).
>
> But it should really be something that you need to be truly insane to trigger.
>
> I think we still don't know whether we're going to be suid at the time
> we copy the arguments, do we?

We don't. (In fact, arg copying happens before we've even figured out
which binfmt is involved.) I lifted the suid determination to just
before the point of no return, but moving it ahead of arg copying looks
very hard (which contributed to why we went with the implementation we
did).
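
For reference, a rough userspace model of the cap being discussed. This
is only a sketch, not the kernel code: it assumes _STK_LIM is 8 MiB and
that the check is "the smaller of 3/4 of _STK_LIM and 1/4 of
RLIMIT_STACK". The name arg_limit() is made up for illustration.

```python
import resource

# Assumption: _STK_LIM is 8 MiB (the kernel's default stack limit).
STK_LIM = 8 * 1024 * 1024


def arg_limit(stack_rlimit):
    """Model of the argv+envp size cap applied while copying arguments.

    Assumed rule: min(3/4 of _STK_LIM, 1/4 of RLIMIT_STACK).
    Only sizes and the rlimit are inputs; nothing about the target
    binary (suid or not) is known at this point.
    """
    limit = STK_LIM // 4 * 3
    # An unlimited stack rlimit leaves only the _STK_LIM-based cap.
    if stack_rlimit != resource.RLIM_INFINITY:
        limit = min(limit, stack_rlimit // 4)
    return limit


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("modelled arg limit in bytes:", arg_limit(soft))
```

Note the function takes nothing but the rlimit, which is the point: at
arg copying time there is no suid information to feed into it.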

> So it's pretty painful to make the limits different for suid and
> non-suid binaries.

I would agree.

-Kees

--
Kees Cook
Pixel Security