Re: arm64: Register modification during syscall entry/exit stop

From: Keno Fischer
Date: Sat May 23 2020 - 01:36:56 EST


I got bitten by this again, so I decided to write up a simple example
that shows the problem:

https://gist.github.com/Keno/cde691b26e32373307fb7449ad305739

This runs the same child twice: first unmodified, where it prints
"Hello World", and then under a textbook ptrace example that makes it
print only "World". The problem here is that by the time the ptracer gets
around to restoring the registers, the tracee is no longer in a syscall
stop, so the write to x7 is not ignored and clobbers the correct value of x7.
I copied the syscall definition from musl, so the compiler thinks x7 is
live, and we see an assertion failure.

Output on my machine (will depend on compiler version, etc.):
```
$ gcc -g3 -O3 ptrace_lies.c
$ ./a.out
Hello World
World
a.out: ptrace_lies.c:49: do_child: Assertion `v3 == values[2]' failed.
a.out: ptrace_lies.c:134: main: Assertion `WIFEXITED(status) && WEXITSTATUS(status) == 0' failed.
Aborted (core dumped)
```

However, I don't think that whether or not the compiler thinks that x7 is
live is the problem here. The problem is entirely that this mechanism
prevents the ptracer from precisely controlling the register state. While
basic ptracers (strace) don't need this feature, more advanced ptracers
(think criu, etc.) absolutely do want to precisely control what the
register state is.
The ptracer I'm working on (https://rr-project.org/)
happens to be an extreme case of this, where it wants *bitwise* equivalent
register states such that it can run the same code many times and get
the exact same results.
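
As a rough sketch of the save/restore step such a ptracer relies on, the
following reads the general-purpose registers of a stopped tracee, writes
them straight back, and compares the result bit for bit. It is shown at a
signal-delivery stop, where the registers should read back exactly as
written. It does not reproduce the x7 problem, which needs a syscall stop
on arm64. The function names are made up for illustration and are not
taken from the gist or from rr.

```c
#define _GNU_SOURCE
#include <elf.h>         /* NT_PRSTATUS */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>     /* struct iovec */
#include <sys/user.h>    /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read the general-purpose register set. */
static int get_regs(pid_t pid, struct user_regs_struct *regs)
{
    struct iovec iov = { .iov_base = regs, .iov_len = sizeof(*regs) };
    long rc = ptrace(PTRACE_GETREGSET, pid,
                     (void *)(uintptr_t)NT_PRSTATUS, &iov);
    return rc == -1 ? -1 : 0;
}

/* Hypothetical helper: write the general-purpose register set. */
static int set_regs(pid_t pid, struct user_regs_struct *regs)
{
    struct iovec iov = { .iov_base = regs, .iov_len = sizeof(*regs) };
    long rc = ptrace(PTRACE_SETREGSET, pid,
                     (void *)(uintptr_t)NT_PRSTATUS, &iov);
    return rc == -1 ? -1 : 0;
}

/*
 * Fork a tracee, let it stop itself with SIGSTOP, then read, write back,
 * and re-read its registers.
 * Returns 1 if both reads are bitwise identical, 0 if they differ,
 * and -1 if tracing could not be set up or a ptrace call failed.
 */
static int regs_roundtrip_at_signal_stop(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);          /* signal-delivery stop seen by the parent */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSTOPPED(status))     /* child exited early, already reaped */
        return -1;

    int result = -1;
    struct user_regs_struct before, after;
    memset(&before, 0, sizeof(before));
    memset(&after, 0, sizeof(after));
    if (get_regs(pid, &before) == 0 &&
        set_regs(pid, &before) == 0 &&
        get_regs(pid, &after) == 0)
        result = memcmp(&before, &after, sizeof(before)) == 0;

    kill(pid, SIGKILL);          /* the tracee is only needed for the check */
    waitpid(pid, &status, 0);
    return result;
}
```

A caller would treat a return value of 0 as "the register state did not
survive the round trip"; -1 only means the check could not be run (for
example, ptrace is not permitted in the current environment).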

Also, if the issue was just that the kernel clobbered x7, that would be fine;
we could deal with that no problem. However, it's much worse than that,
because the behavior of the kernel with respect to x7 depends on what
kind of ptrace stop we're in and even worse, in some kinds of stop,
there's absolutely no way to get at the actual value of x7.
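
To make "what kind of ptrace stop we're in" concrete, here is a minimal
sketch of how a tracer might tell the stop kinds apart from the waitpid()
status alone. It assumes the tracer has set PTRACE_O_TRACESYSGOOD, so that
syscall stops are reported as SIGTRAP | 0x80. The enum and function names
are hypothetical, and the classification is simplified (it does not
separate group-stops or PTRACE_EVENT_STOP from signal-delivery stops).

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical names, for illustration only. */
enum stop_kind {
    STOP_NONE,      /* not a stop at all (exited, killed, ...) */
    STOP_SYSCALL,   /* syscall-entry or syscall-exit stop */
    STOP_EVENT,     /* PTRACE_EVENT_* stop */
    STOP_SIGNAL     /* anything else, e.g. a signal-delivery stop */
};

/* Assumes PTRACE_O_TRACESYSGOOD was set on the tracee. */
static enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NONE;
    /* With TRACESYSGOOD, syscall stops carry bit 7 in the stop signal. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    /* Event stops put the PTRACE_EVENT_* code above the low 16 bits. */
    if (WSTOPSIG(status) == SIGTRAP && (status >> 16) != 0)
        return STOP_EVENT;
    return STOP_SIGNAL;
}
```

A tracer that has to special-case x7 would need a check along these lines
before every register read or write, which is exactly the kind of
bookkeeping that is easy to get wrong.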

> Hmm, does that actually result in the SVC instruction getting inlined? I
> think that's quite dangerous, since we document that we can trash the SVE
> register state on a system call, for example. I'm also surprised that
> the register variables are honoured by compilers if that inlining can occur.

I haven't gotten to trying SVE yet, so I appreciate the warning :). That said,
deterministic clobbering of registers is fine. Even changing the registers to
random junk is fine. We're happy to read those registers through ptrace.
The problem here is that the kernel lies about what the contents of the x7
register are and discards any writes to it.

I really hope we can come up with a solution here. I'm already dreading
the next time I unexpectedly run into this and have to add yet
another special case :(.

Keno