kasan for bpf

From: Alexei Starovoitov
Date: Fri Dec 22 2017 - 23:32:12 EST


Hi All,

the recent bugs make people question the safety of bpf and not a surprise
that some folks propose to set kernel.unprivileged_bpf_disabled = 1 by default.

I think it will be wrong long term decision, since it will steer
bpf into "root only" mentality. The verifier checks will become sloppier
and developers will worry less about missing safety checks.
I'd like bpf to go the other way. The root-only programs should be
treated with the same respect as unprivileged.

Today out of 15 program types only one BPF_PROG_TYPE_SOCKET_FILTER is
allowed for unpriv while in many cases like XDP, CLS, ACT, SKB, SOCK progs
could be relaxed to cap_net_admin and sometimes to unpriv.
I think we need to disallow leaking kernel address in XDP, CLS and
all other networking programs. They should be operating on packets
and should never leak addresses. The existing approach:
env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
if (env->allow_ptr_leaks) ...;
if (env->allow_ptr_leaks) ...;
is just a big hammer.
Realistically only tracing programs need to deal with kernel data and
they probably should be restricted further with additional sysctl or CAPs.

So instead of existing two classes of root/unpriv we need several:
- unpriv prog type
- all prog types for unpriv but execute via bpf_prog_run only (for testing)
- net_admin without leaks (that can run in containers)
- root without leaks
- root with leaks for tracing only

As far as unpriv I think it's clear that the verifier only
is not enough and we need to optionally add run-time checks.
I think 'kasan for bpf' by default would be good first step.
The basic idea is that all accessible memory (stack, ctx, maps)
will have shadow bits and every load/store will be automatically
instrumented with shadow memory checks.
'kasan for bpf' will be on by default for unpriv and extra
sysctl to turn it off in setups that cannot tolerate
slightly degraded performance.
For root and other classes of programs we will be able to
turn it on for testing as well.
I think that will stop many exploits except side channel attacks.

Thoughts?