Re: Compat 32-bit syscall entry from 64-bit task!?

From: Jamie Lokier
Date: Thu Jan 26 2012 - 06:48:18 EST


Indan Zupancic wrote:
> On Thu, January 26, 2012 11:31, Jamie Lokier wrote:
> > Indan Zupancic wrote:
> >> Yes, that's the only reason I'm interested in BPF, really.
> >> Most system calls are either always allowed, or always denied.
> >> Of the ones that need checking, most of them have file paths.
> >> For those I'm not interested in the post-syscall event.
> >
> > Same here, though for tracing file paths rather than blocking anything.
>
> The jailer I wrote works pretty well as a simplistic strace replacement.
> It can only print out the arguments we're checking, but that's usually
> the more interesting info.

In theory such a thing should be easy to write, but as we both found,
ptrace() on Linux has a huge number of difficult quirks to deal with
to trace reliably. At least it's getting better with later kernels.

> >> Those issues are not equivalent. ARM only has that OABI thing which
> >> is hopefully not used in practice.
> >
> > I am still using OABI on some currently-sold and still-developed
> > devices with userspace libraries that I can't replace or rebuild.
> > Maybe I'm the only one, but the issue is still there. It should be
> > supported in ptrace() as long as it's supported in the kernel at all.
>
> It's not a 32 versus 64-bit issue though, so it will be something on
> its own anyway. Can as well add an extra ARM specific ptrace command
> to get that info, or hack it in some other way. For instance, ip is
> (ab)used to tell if it is syscall entry or exit, so doing these tricks
> isn't anything new in ARM either.

In theory, aren't we supposed to know whether it's entry/exit anyway?
Why does strace care? Have there been kernel bugs in the past? Maybe
it was just to deal with SIGTRAP-after-exit in the past, which could
be delivered at an unpredictable time if blocked and then unblocked by
sigreturn().
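
For illustration, the bookkeeping a tracer needs in order to "know"
entry from exit is just a per-thread toggle. The sketch below is
Python rather than real ptrace code, and the class and method names
are hypothetical. It assumes the tracer sees every syscall-stop and can
tell syscall-stops apart from signal stops (for example with
PTRACE_O_TRACESYSGOOD); it does not handle attaching in the middle of
a syscall or any of the other quirks.

```python
class SyscallStopTracker:
    """Track, per traced thread, whether a syscall-stop is an entry or an exit."""

    def __init__(self):
        # tid -> True while the thread is between syscall entry and exit
        self._in_syscall = {}

    def on_syscall_stop(self, tid):
        # Each syscall-stop flips the state: the first stop is an entry,
        # the next one for the same thread is the matching exit.
        entering = not self._in_syscall.get(tid, False)
        self._in_syscall[tid] = entering
        return "entry" if entering else "exit"

    def on_thread_exit(self, tid):
        # Forget state for threads that are gone so tids can be reused.
        self._in_syscall.pop(tid, None)
```

If that toggle ever gets out of step with the kernel, every later
stop is misclassified, which may be one reason a tracer would want an
independent indication from the registers.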

> You can't avoid the arch-specific knowledge, because depending on the
> answer, you have to do something arch specific. In ARM's OABI case, it's
> reading program memory to find out the system call number, of all things.
> (I hope I read the code wrong). So ARM's solution would need to get all
> info it needs to handle the system call securely without reading any text
> memory, otherwise it's racy.

A few archs read program memory to get the syscall number even now, in
the current strace source. Look for PEEKTEXT: S390, ARM, SPARC use it
on every syscall entry, and X86_64 has it commented out.

As we know, all of them are buggy if the memory is modified while
reading it, and it's silly because the kernel knows the syscall
number.
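
To make the ARM case concrete, here is a rough sketch (in Python, for
brevity) of the decode step that needs the instruction word. The
constants follow the ARM-mode SWI encoding as it is commonly
described: OABI puts 0x900000 + nr in the instruction's immediate
field, while EABI uses "swi 0" and passes the number in r7. The
function name and return shape are made up for illustration, and
Thumb mode is ignored.

```python
EABI_SWI = 0xEF000000        # "swi 0": syscall number is in r7
OABI_SWI_MASK = 0x0FF00000   # opcode bits plus the top of the immediate
OABI_SWI_BASE = 0x0F900000   # "swi 0x9xxxxx": number is in the immediate
OABI_NR_MASK = 0x000FFFFF    # low 20 bits carry the syscall number


def arm_syscall_number(insn, r7):
    """Return (abi, nr) given the trapping instruction word and r7.

    insn is the 32-bit word the tracer fetched from the tracee's text
    (the PEEKTEXT step); r7 is the tracee's r7 at syscall entry.
    """
    if insn == EABI_SWI:
        return ("eabi", r7)
    if (insn & OABI_SWI_MASK) != OABI_SWI_BASE:
        raise ValueError("not a recognised syscall trap: %#010x" % insn)
    return ("oabi", insn & OABI_NR_MASK)
```

The weak point is the insn argument: it comes from memory that
another thread can rewrite between the trap and the read, so the
number decoded here is not necessarily the one the kernel acted on.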

> And then there's the whole confusion what that flag says, some might think
> it says in what mode the tracee is instead of what mode the system call is.
> That those two can be different is not obvious at all and seems very x86_64
> specific.

My rough read of PARISC entry code suggests it has two entry methods,
similar to ARM and x86_64, but I'm not really familiar with PARISC and
I don't have a machine handy to try it out :-)

> I'm not sure what you're doing, but perhaps we should share code and write
> a kind of Linux ptrace library. The code I wrote was university stuff and
> we want to release it, but it will take a while to get things sorted out.
> Hopefully it's released in April, maybe before.

I've been thinking along similar lines. The idea came up when I was
hacking on strace last year and it so wanted to be cleaned up (but now
strace is in good hands, my work on it is obsolete); now I'm doing
ptracing for other purposes. Denys' ptrace API document, currently in
strace git, is extremely useful.

Denys, would you be interested in further refactoring strace to use a
"libsystrace" sort of thing which abstracts the detail of archs,
tracing (and maybe syscall argument layout) away from the printing and
user-interface, for strace's use and other users? I would be happy to
help with that and keep strace's non-Linux support as well (if there's
any way to test the latter...) I seem to be going in the direction of
a library like that anyway for another project.

The seccomp-BPF stuff could also benefit from a part dealing with
syscall argument layout, as it too needs that arch-specific
knowledge. I have a script in progress which extracts all the
per-arch and per-ABI syscall numbers, syscall argument layouts and
kernel function names to keep track of arch-specific fixups, from a
Linux source tree. It currently works on all archs, except that it
breaks on x86, which insists on being different ;-)
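
As a flavour of the simplest part of such an extraction (this is not
the script mentioned above, just a hypothetical sketch), the following
pulls syscall numbers out of header text that uses plain
"#define __NR_name number" lines. It deliberately skips anything more
involved, such as numbers written as expressions relative to a base
or lines with trailing comments, which a real tool would have to
handle per arch.

```python
import re

# Matches lines such as "#define __NR_read 3"; nothing fancier.
_NR_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_(\w+)\s+(\d+)\s*$")


def parse_syscall_numbers(header_text):
    """Return {syscall_name: number} for plain numeric __NR_ defines."""
    table = {}
    for line in header_text.splitlines():
        match = _NR_DEFINE.match(line)
        if match:
            table[match.group(1)] = int(match.group(2))
    return table
```

Running this once per arch and per ABI header would give the raw
tables; argument layouts and fixups need more than a regex.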

-- Jamie