Re: [PATCH] kconfig: Add kernel config option for fuzz testing.
From: Dmitry Vyukov
Date: Thu Dec 19 2019 - 12:43:53 EST
On Tue, Dec 17, 2019 at 4:52 PM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
>
> On Tue, Dec 17, 2019 at 09:36:43AM +0100, Dmitry Vyukov wrote:
> > Yes, what Tetsuo says. Only syscall numbers and top-level arguments to
> > syscalls are easy to filter out. When indirect memory is passed to
> > kernel or (fd,ioctl) pairs are involved it boils down to solving the
> > halting problem.
>
> I disagree that it's equivalent to solving the halting problem.
> Otherwise, we couldn't filter in the kernel. Let's think about ways
> we can solve this. One is to simply do what valgrind does; this
> handles even self-modifying code, since you're essentially running an
> x86-to-x86 emulator, and then you find an attempted trap to the
> kernel, you can transfer control to a program which vets the arguments
> to the system call.
I don't know where to start :)
1. We don't run/have an x86-to-x86 emulator, and syzkaller is currently
supported on 6 architectures.
2. Complexity of this is very high (as compared to an if in kernel).
3. Valgrind-like solutions are a source of constant maintenance work
(we maintained valgrind for years at Google).
4. Adding new architectures will be much harder.
5. All of this will need to be part of all C reproducers as well
(thousands of lines of intricate code, I think it was you who
complained about the complexity of even current C reproducers).
6. To avoid making this part of all reproducers we would need to run the
checking ahead of time (but this requires building a complete and
precise kernel model and won't handle non-determinism; this is why I
referred to the "halting problem", assuming you don't want this in
reproducers).
7. This won't handle raciness between our checks and what the kernel
really observes (e.g. we infer a different fd type for an ioctl, or we
read different data from memory).
8. This won't solve the problem of trust. If you receive this very
complex piece of code, will you be 100% sure that it's not syzkaller's
fault but a kernel bug that is worth your time looking at?
So as far as I see this is both very complex and won't really work.
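
To illustrate point 7, here is a minimal sketch of the race, written as a
deterministic toy model rather than real threads. All names (FORBIDDEN,
userspace_check, kernel_handler, run) are hypothetical and exist only for
this illustration; they are not syzkaller or kernel APIs.

```python
# Toy model of a check-then-use race on indirect memory.
# Every name here is hypothetical and used for illustration only.

FORBIDDEN = 0xDEAD  # stand-in for a command value the filter must block


def userspace_check(mem):
    """Pre-flight check that inspects indirect memory before the syscall."""
    return mem[0] != FORBIDDEN


def kernel_handler(mem):
    """Stand-in for the kernel, which reads the same memory again later."""
    return "destructive" if mem[0] == FORBIDDEN else "harmless"


def run(mem, racer=None):
    """Check the memory, optionally let a racing writer run, then 'enter the kernel'."""
    if not userspace_check(mem):
        return "rejected"
    if racer is not None:
        racer(mem)  # models another thread of the test program writing here
    return kernel_handler(mem)


def overwrite(mem):
    """Racing writer: flips the checked value after the check has passed."""
    mem[0] = FORBIDDEN


result_quiet = run([0x1])
result_raced = run([0x1], racer=overwrite)
result_direct = run([FORBIDDEN])
```

In this model the check passes in both of the first two runs, but in the
raced run the handler still sees the forbidden value. Only a check made at
the point where the kernel actually consumes the data avoids this.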
> Another approach might be to do this filtering in a BPF hook
> installed at syscall entry. Technically this is being done in the
> kernel, but the advantage of this approach is that the BPF program can
> be distributed alongside Syzkaller, and it can be Syzkaller-specific.
> That way when we need to add a new blacklist entry, it can be done
> without needing to wait for a kernel patch.
This is subject to most of the same problems.
E.g. these BPF programs will need to be part of all reproducers, so
you will need to compile them for your kernel and install them before
running reproducers. It is also racy wrt what we observe vs. what the
kernel acts on. Again there are some trust problems (it is still
complex). Building and using syzkaller will be harder.
We need to keep in mind that what we are comparing this with is a simple
if in kernel code.
> And note that there may *always* be some ioctls which we will need to
> suppress. For example, an attempt to send a SANITIZE ERASE to a
> storage device; or an attempt to freeze the root file system, etc.
> And I'm not sure all of these are ones that we can prevent by using
> the lockdown setting. There may very well be some commands that a
> legitimate system administrator might want to execute that will, when
> executed in the wrong circumstances, cause the system to crash. But
> so long as it doesn't violate the trusted boot semantics which are the
> whole point of lockdown, we would need to allow them.
>
> So I suspect that some kind of filtering which is Syzkaller specific
> is going to be inevitably needed, if you want to throw random binary
> code and see what causes problem, and you insist on allowing these
> random binary bits to be run as root. Trying to prevent root from
> being able to kill or self-DOS a machine goes way beyond any of our
> current security mechanisms, and is something which is only really
> needed by Fuzzers. Personally, I suspect some kind of BPF filtering
> is probably your best bet, since it will be a bit more architecturally
> portable than using some kind of Valgrind-like approach. (Although
> Valgrind *does* support most of the architectures that I suspect we're going
> to care about.)
We can easily filter out syscall numbers and top-level syscall
argument values (executing random binary code aside, as we gave up on
this for now). That's what we use to filter out reboot syscalls and
the FIFREEZE ioctl (fortunately the value does not collide with any
other ioctl we have _for now_). This is done by scanning the test case
and fixing it if necessary (all the necessary data is already there).
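
A minimal sketch of that kind of scan-and-fix pass, under stated
assumptions: the Call type and sanitize function are hypothetical and do
not reflect syzkaller's actual data structures; the ioctl constants assume
the generic Linux ioctl encoding (as on x86-64) and may differ on other
architectures; dropping reboot and swapping FIFREEZE for FITHAW is just one
possible way to "fix" a test case.

```python
from dataclasses import dataclass, field

# Assumed values for the generic Linux ioctl encoding (e.g. x86-64).
FIFREEZE = 0xC0045877  # _IOWR('X', 119, int)
FITHAW = 0xC0045878    # _IOWR('X', 120, int)


@dataclass
class Call:
    """Hypothetical test-case entry: syscall name plus top-level arguments."""
    name: str
    args: list = field(default_factory=list)


def sanitize(prog):
    """Return a copy of the test case with dangerous calls removed or rewritten.

    Only the syscall name and top-level argument values are inspected;
    nothing behind a pointer and no fd type is examined.
    """
    fixed = []
    for call in prog:
        if call.name == "reboot":
            continue  # drop the call entirely
        if call.name == "ioctl" and len(call.args) >= 2 and call.args[1] == FIFREEZE:
            # Matches on the command value alone, so any other ioctl that
            # shared this value would be rewritten too.
            call = Call(call.name, [call.args[0], FITHAW, *call.args[2:]])
        fixed.append(call)
    return fixed


prog = [
    Call("open", ["/dev/x"]),
    Call("reboot", [0xFEE1DEAD]),
    Call("ioctl", [3, FIFREEZE, 0]),
]
out = sanitize(prog)
```

Because the pass sees only the command value and not what the fd refers
to, it works only while that value stays unique among the ioctls in use,
which is the "_for now_" caveat above.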