Re: Regression: commit da029c11e6b1 broke toybox xargs.

From: Rob Landley
Date: Fri Nov 03 2017 - 19:58:45 EST


On 11/02/2017 10:40 AM, Linus Torvalds wrote:
> On Wed, Nov 1, 2017 at 9:28 PM, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Behavior changed. Things that test particular limits will get different
>> results. That's not breakage.
>>
>> Did an actual user application or script break?

Only due to getting the limit wrong. The actual failure's in the android
internal bugzilla I've never been able to read:

http://lists.landley.net/pipermail/toybox-landley.net/2017-September/009167.html

But it boils down to "got the limit wrong, the exec failed after the
fork(), dynamic recovery from which is awkward so I'm trying to figure
out the right limit".
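
For illustration, here is a minimal sketch of what that dynamic recovery could look like: run the batch, and if exec reports E2BIG, split the batch in half and retry. This is not toybox code. The name `run_batches` and the injectable `runner` parameter are hypothetical, and the sketch assumes the exec failure surfaces in the parent as an `OSError` with `errno.E2BIG`, which is how Python's `subprocess` reports it.

```python
import errno
import subprocess


def run_batches(command, args, runner=subprocess.run):
    """Run command over args, halving a batch whenever exec reports E2BIG.

    Returns the batches that were actually executed, in input order.
    `runner` is injectable so the logic can be exercised without a real exec.
    """
    done = []
    pending = [list(args)]
    while pending:
        batch = pending.pop(0)
        try:
            runner(command + batch, check=True)
        except OSError as err:
            # Anything other than "argument list too long" is not ours to fix,
            # and a single oversized argument cannot be split any further.
            if err.errno != errno.E2BIG or len(batch) < 2:
                raise
            mid = len(batch) // 2
            # Retry both halves before moving on, keeping the original order.
            pending[:0] = [batch[:mid], batch[mid:]]
            continue
        done.append(batch)
    return done
```

The awkward part is visible even here: every wrong guess costs a fork plus a failed exec, and one argument that is too long on its own still cannot be recovered.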

> Ahh. I should have read that email more carefully. If xargs broke,
> that _will_ break actual scripts, yes. Do you actually set the stack
> limit to insane values? Anybody using toybox really shouldn't be doing
> 32MB stacks.

Toybox is the default command line of android since M, which went 64 bit
in L, and the Pixel 2 phone has 4 gigs of ram. My goal with toybox is to
turn android into a self-hosting development environment no longer
cross-compiled from a PC (http://landley.net/talks/celf-2013.txt) so I'm
trying to implement a command line that can run the entire AOSP build.

I.E. I have no idea what people will do with it, and try not to get in
their way.

My problem here is that it's hard to figure out what the exec size limit
actually _is_. There's a sysconf(_SC_ARG_MAX), which bionic and glibc
currently return as stack_limit/4, which is now too big, so exec()
errors out after the fork. Musl is returning the 131072 limit from
2011-ish, meaning "/bin/echo $(printf '%0*d' 131071)" works but
"printf '%0*d' 131071 | xargs" fails, an inconsistency I was trying to
avoid. Maybe I don't have that luxury...
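
A quick way to see the disagreement on a given system is to print the candidate values side by side. The sketch below is only a diagnostic, not a recommendation of which number to trust. The function name `arg_max_candidates` and the dictionary keys are made up for illustration, and the stack/4 figure is simply the formula described above, computed from the soft RLIMIT_STACK.

```python
import os
import resource


def arg_max_candidates():
    """Collect the competing answers to "how big can an exec be?"."""
    soft_stack, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    # An unlimited stack has no meaningful quarter, so report None instead.
    if soft_stack == resource.RLIM_INFINITY:
        quarter_stack = None
    else:
        quarter_stack = soft_stack // 4
    return {
        "sysconf": os.sysconf("SC_ARG_MAX"),  # whatever this libc reports
        "stack_quarter": quarter_stack,       # the stack_limit/4 formula
        "legacy": 131072,                     # the old fixed limit
    }


if __name__ == "__main__":
    for name, value in arg_max_candidates().items():
        print(f"{name:14} {value}")
```

None of these is guaranteed to match what the running kernel enforces, which is the problem.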

Each argument has its own limit separate from the argv+envp total limit,
but there's only one "size" you can query through sysconf, so the
querying API is insufficient at the design level.
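
To make the two-limit problem concrete, here is a rough sketch of an xargs-style splitter that tracks both numbers. It is an illustration under stated assumptions, not the toybox implementation. `split_args` and its parameters are hypothetical names. The per-argument cap of 131072 bytes (including the trailing NUL) is an assumption based on 32 pages of 4096 bytes. The accounting (string bytes plus NUL plus one pointer per argument) is a guess at what counts against the total, and the environment is ignored entirely.

```python
import struct

# Assumption: per-string cap of 32 pages of 4096 bytes, NUL included.
MAX_ARG_STRLEN = 131072
# Size of one argv pointer on the machine running this script.
POINTER_SIZE = struct.calcsize("P")


def split_args(args, total_limit, per_arg_limit=MAX_ARG_STRLEN,
               pointer_size=POINTER_SIZE):
    """Group args into batches that respect both limits.

    Raises ValueError for an argument that can never be passed to exec,
    since no amount of splitting will make it fit.
    """
    batches, current, used = [], [], 0
    for arg in args:
        size = len(arg.encode()) + 1  # string bytes plus trailing NUL
        if size > per_arg_limit:
            raise ValueError(f"argument of {size} bytes exceeds per-argument limit")
        cost = size + pointer_size
        # Start a new batch when this argument would push the total over.
        if current and used + cost > total_limit:
            batches.append(current)
            current, used = [], 0
        current.append(arg)
        used += cost
    if current:
        batches.append(current)
    return batches
```

The caller still has to supply `total_limit` from somewhere, and sysconf only offers the one number.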

Meanwhile under bash you can allocate and dirty 256 megabytes from the
command line with:

echo $(printf '%0*d' $((1<<28)))

Because it's a shell builtin so there's no actual exec. (And if
https://sourceware.org/bugzilla/show_bug.cgi?id=17829 ever gets fixed
it'll go back to allowing INT_MAX.)

Posix is its usual helpful self: read conservatively,
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html
says to break the line at 2048 bytes.

> So I still do wonder if this actually breaks anything real, or just a
> test-suite or something?

I've cc'd Elliott, who would know. (He's the Android base os userspace
maintainer, he knows everything. Or can at least decode
http://b/65818597 .)

But this just broke my _fix_, not the earlier deployed stuff. I removed
the size measuring code when the 131072 limit went away. The bug was
that there's a new limit I need to not hit. I tried to figure out what
the limit is now and confirmed that the various libc implementations
don't agree, then the actual kernel limit changed again while I was
looking at it.

> Linus

Should I just go back to hardwiring in 131072? It's no _less_ arbitrary
than 10 megs, and it sounds like getting it _right_ is unachievable.

Thanks,

Rob