Re: syscall_get_error() && TS_ checks

From: Linus Torvalds
Date: Thu Mar 30 2017 - 15:11:28 EST


On Thu, Mar 30, 2017 at 11:59 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> And then actually run such a kernel on a 32-bit distro, and verifying
>> that things like gdb and strace really work. But it needs real
>> testing, not some kind of handwaving. It's a *big* change.
>
> I'll offer the following handwave: if there are problems, I'd expect
> to see them in mixed-bitness uses, not 32-bit distros. But the 32-bit
> case is worth testing, too.

I wouldn't worry too much about the mixed case, simply because you
clearly cannot use a 32-bit gdb on a 64-bit process.

So the mixed case already needs to use a 64-bit gdb, which presumably
would never use the 32-bit ptrace paths in the first place, so this
code never triggers.

Of course, the mroe testing the better, but the thing I'd really want
to check is that there isn't some 32-bit distro that might have a
library that is optimized and notices when it's running on a 64-bit
capable CPU and uses REX prefixes to use special optimized versions.

That would probably already break with a 32-bit gdb, but I can at
least in theory imagine code that simply depends on zero extension.

>> And in the signal handling path, the sign-extension and test of %eax
>> is reivially ok. Not because it's ok in general, but because we've
>> verified using %orig_eax that we're in a system call return path. We
>> could happily delete the whole TS_I386_REGS_POKED thing, I think.
>
> Then how do we know whether to sign extend? TS_COMPAT does *not* work
> because it isn't set on signal delivery.

Ok, so we'd still need that nasty TS_I386_REGS_POKED. I still think
that's "ok" within the particular confines of just signal delivery.

Of course, I don't actually know why we don't set that flag
unconditionally when we poke things using that 32-bit ptrace
interface, but that's just more of the "yeah, this code is crazy
hacky"

Cleaning things up is good.

I just don't want people to dismiss that the current code seems to
work well enough in practice. That's *probably* because nobody
actually uses it, but I do know that people used to run 64-bit kernels
with 32-bit user land exactly because they needed the kernel for large
memory machines (PAE really doesn't work for shit) but were carrying
32-bit userland along for whatever lazy legacy reasons.

So it has gotten *some* use. It's probably (hopefully) fading.

.. there's Wine and dosemu that do crazy things too. Things like "we
only set TS_I386_REGS_POKED when modifying orig_eax" might be
something that those kinds of programs have actually noticed and are
actively working around etc. Because while *normal* programs hopefully
don't do insane things on purpose, both wine and dosemu definitely
have been known to very much intentionally do some truly crazy stuff.

Linus