Re: dangers of bots on the mailing lists was Re: divide error in ___bpf_prog_run

From: Daniel Borkmann
Date: Thu Jan 18 2018 - 09:46:34 EST


On 01/18/2018 02:10 PM, Dmitry Vyukov wrote:
> On Wed, Jan 17, 2018 at 12:09 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> On Wed, Jan 17, 2018 at 10:49 AM, Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>>> Don't know if there's such a possibility, but it would be nice if we could
>>> target fuzzing for specific subsystems in related subtrees directly (e.g.
>>> for bpf in bpf and bpf-next trees as one example). Dmitry?
>>
>> Hi Daniel,
>>
>> It's doable.
>> Let's start with one bpf tree. Will it be bpf or bpf-next? Which one
>> contains more ongoing work? What's the exact git repo address/branch,
>> so that I don't second guess?

I'm actually thinking that the bpf tree [1] would be my preferred choice.
While most of the development happens in bpf-next, after the merge
window it will all end up in bpf eventually anyway, and we'd still have
~8 weeks for targeted fuzzing on that before a release goes out. The
other advantage I see in the bpf tree itself is that we'd uncover
issues earlier that come from fixes going into the bpf tree, like the
recent max_entries overflow reports, where syzkaller fired multiple
times only after the commit causing it had already gone into Linus'
tree. We'd miss out on that if we chose bpf-next only, therefore my
preferred choice would be bpf.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

>> Also what syscalls it makes sense to enable there to target it at bpf
>> specifically? As far as I understand effects of bpf are far beyond the
>> bpf call and proper testing requires some sockets and other stuff. For

Yes, correct. For example, the ones in ...

* syzbot+93c4904c5c70348a6890@xxxxxxxxxxxxxxxxxxxxxxxxx
* syzbot+48340bb518e88849e2e3@xxxxxxxxxxxxxxxxxxxxxxxxx

... are a great find (!), and they all require runtime testing, so
interactions with sockets are definitely needed as well (e.g.
SO_ATTACH_BPF and writes to trigger traffic going through). Another
option is to have a basic code template that attaches to a loopback
device, e.g. in a netns, with a tc clsact qdisc and a cls_bpf filter
attached, so the fd would be passed to the cls_bpf setup and traffic
going over loopback would then trigger the prog run. The same could be
done for generic XDP as another example. Unlike socket filters this is
root only, but it would have more functionality available to fuzz, and
I see robustness here as critically important. There's also a good
bunch of use cases in the BPF kernel selftests under
tools/testing/selftests/bpf/ to get a rough picture for fuzzing,
though they don't cover all prog types, maps, etc. Overall, I think
it's fine to start out small first and see how it goes.
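
As a rough illustration of the socket-based path described above (attach
a program to a socket, then write to trigger a run), here is a minimal
sketch. It makes several assumptions that are not from this thread: it
uses a classic BPF program via SO_ATTACH_FILTER rather than an eBPF fd
via SO_ATTACH_BPF, so that no bpf() syscall or privilege is needed; it
uses an AF_UNIX datagram socketpair instead of IP sockets to avoid any
network setup; the helper names attach_ret_filter and run_traffic are
made up for this example; and SO_ATTACH_FILTER is hardcoded to 26, the
asm-generic value, which differs on a few architectures. Linux only.

```python
import ctypes
import socket
import struct

# Assumption: asm-generic value from the Linux uapi headers. Some
# architectures use a different number.
SO_ATTACH_FILTER = 26

# Classic BPF opcode BPF_RET | BPF_K: return the constant k.
BPF_RET_K = 0x06


def attach_ret_filter(sock, k):
    """Attach the one-instruction classic BPF program 'ret k' to sock.

    k == 0 drops every packet, k == 0xFFFFFFFF accepts it unmodified.
    """
    # struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }
    insns = ctypes.create_string_buffer(
        struct.pack("HBBI", BPF_RET_K, 0, 0, k))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", 1, ctypes.addressof(insns))
    # The kernel copies the program during this call, so insns only has
    # to stay alive until setsockopt() returns.
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


def run_traffic(k, payload=b"hello bpf"):
    """Send payload through a filtered socketpair and return what arrives.

    Returns None if nothing arrives within the timeout.
    """
    rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        attach_ret_filter(rx, k)
        tx.send(payload)  # this write is what triggers the program run
        rx.settimeout(0.2)
        try:
            return rx.recv(4096)
        except socket.timeout:
            return None  # the filter dropped the datagram
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    print(run_traffic(0xFFFFFFFF))
    print(run_traffic(0))
```

With an accept-all program the payload should arrive unchanged, and with
'ret 0' nothing should arrive, since a zero return value drops the
datagram. A fuzzer would put generated programs in place of the single
'ret k' instruction and vary the traffic written on the other end.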

>> sockets, will it be enough to enable ip/ipv6? Because if we enable all
>> of sctp/dccp/tipc/pptp/etc, it sure will be finding lots of bugs
>> there as well. Does bpf affect incoming network packets?

Yes, see also the comment above. For socket filters this definitely
makes sense as well, and there were some interactions in the past in
the proto handlers that were buggy. E.g. for odd historic reasons
(dating back to classic BPF times) socket filters are allowed to
truncate skbs, and that requires a reload of some of the previously
referenced headers, since the underlying data could have changed in
the meantime (aka use after free). Some handlers got that wrong, so it
probably makes sense to include some of the protos, too, to cover
changes there.

>> Also are there any sysctl's, command line arguments, etc that need to
>> be tuned. I know there are net.core.bpf_jit_enable/harden, but I don't
>> know what's the most relevant combination. Ideally, we test all of
>> them, but let's start with one of them because it requires separate
>> instances (since the setting is global and test programs can't just
>> flip it randomly).

Right, I think the current one you set in syzkaller is fine for now.
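
For later reference, the "one instance per setting" idea from the quoted
question could be enumerated roughly as below. This is only a sketch
under assumptions: the function names instance_matrix and read_current
are invented here, the value ranges are taken from the kernel's sysctl
documentation (bpf_jit_enable also accepts 2, a debug mode that is left
out on purpose), and nothing here reflects how syzkaller actually
configures its instances.

```python
import itertools
from pathlib import Path

# Real sysctl location; the value ranges below are an assumption based
# on the kernel's net sysctl documentation.
SYSCTL_ROOT = Path("/proc/sys/net/core")
KNOBS = {
    "bpf_jit_enable": (0, 1),     # 0 = interpreter, 1 = JIT
    "bpf_jit_harden": (0, 1, 2),  # 0 = off, 1 = unprivileged, 2 = all
}


def instance_matrix(knobs=KNOBS):
    """Return one settings dict per fuzzing instance.

    The sysctls are global, so each combination needs its own instance
    instead of being flipped by test programs at runtime.
    """
    names = sorted(knobs)
    combos = itertools.product(*(knobs[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def read_current(root=SYSCTL_ROOT, names=KNOBS):
    """Read the current values; None for knobs that cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    print("current:", read_current())
    for settings in instance_matrix():
        print("instance:", settings)
```

Starting with a single entry of that matrix, as suggested, keeps the
instance count at one; the remaining combinations can be added later
without changing the test programs themselves.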

>> Also do you want testing from root or not from root? We generally
>> don't test under root, because syzkaller comes up with legal ways to
>> shut everything down even if we try to contain it (e.g. kill init
>> somehow or shut down network using netlink). But if we limit syscall
>> surface, then root may work and allow testing staging bpf features.

If you have a chance to test under both root and non-root, that
would be best. Non-root has a restricted set of features available,
so coverage would be increased under root, but I see both as equally
important (to mention one, coming back to the max_entries overflow
example from earlier, this only got triggered for non-root).

Btw, I recently checked out the bpf API model in syzkaller and it
was all in line with latest upstream, very nice to see that!

One more thought on future work: it could be worth experimenting with
having syzkaller additionally generate BPF progs in C that it would
then try to load and pass traffic through. That may be worth trying
in addition to the insn-level fuzzing.

> So, Daniel, Alexei,
>
> I understand that I asked lots of questions, but they are relatively
> simple. I need that info to set up proper testing.

Thanks a lot,
Daniel