Re: [PATCH v2] panic: Taint kernel if fault injection has been used

From: Alexei Starovoitov
Date: Wed Dec 07 2022 - 23:36:39 EST


On Wed, Dec 07, 2022 at 07:48:06AM -0500, Steven Rostedt wrote:
> On Tue, 6 Dec 2022 20:45:17 -0800
> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> > > "G - Proprietary module" - "O - out of tree module"
> > >
> > > Can you reproduce this without those taints?
> >
> > Lol. That question is exactly the reason why my Nack stands.
>
> First, that's a BS reason for a NACK.
>
> But in all seriousness, what I would actually ask (and what I'll ask now)
> is, what module did you use that is out of tree, and was it relevant to
> this test?
>
> That's a reasonable question to ask, and one that only gets asked with a
> taint.

When we receive bug reports in bpf@vger the only question we ask is:
"How to reproduce this bug?"
We ignoring taint completely.
tbh I didn't even know that our BPF CI causes taint until that email.

In the previous email I said that the bug report comes from bpf selftest on bpf-next.
Clearly there is no ambiguity that this is as upstream as it can get.
Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
And that happens all the time on lkml. Somebody reports a bug and kernel devs
jump on the poor person:
"Can you repro without taint?",
"Can you repro with upstream kernel?"
This is discouraging.
The 'taint' concept makes it easier for kernel devs to ignore bug reports
and push back on the reporter. Do it few times and people stop reporting bugs.
Say, this particular bug in rethook was found by one of our BPF CI developers.
They're not very familiar with the kernel, but they can see plenty of 'rethook'
references in the stack trace, lookup MAINTAINER file and ping Massami,
but to the question "can you repro without taint?" they can only say NO,
because this is how our CI works. So they will keep silence and the bug will be lost.
That's not the only reason why I'm against generalizing 'taint'.
Tainting because HW is misbehaving makes sense, but tainting because
of OoO module or because of live-patching does not.
It becomes an excuse that people abuse.

Right now syzbot is finding all sorts of bugs. Most of the time syzbot
turns error injection on to find those allocation issues.
If syzbot reports will start coming as tainted there will be even less
attention to them. That will not be good.

> If there's a BPF injection taint, one can ask that same question, as the
> bug may happen sometime after the injection but be caused by that injection,
> and not be in the backtrace. Not all kernel developers have the access to
> debugging utilities that backend production servers have. A lot of bugs that
> kernel developers debug are from someone's laptop.

I would have to disagree.
We see few human reported bugs and prioritize them higher,
but majority are coming from the production fleet, test tiers,
syzbot, CIs, and automated things.

> Where all they have is
> that backtrace.

Even laptops typically leave vmcore after crash. distro support packages have
clear rules what scripts to run to collect all the necessary data in case
of the crash from vmcore.
These tools support extracting everything bpf related too.
For example see:
Documentation/bpf/drgn.rst
It works on vmcore and on the running kernel.

> If a tool or root kit, added function error injection, it
> would be extremely useful information to debug what happened.
>
> I don't understand the push back here.

All these years we've been working on improving bpf introspection and
debuggability. Today crash dumps look like this:
bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run include/linux/filter.h:607 [inline]

The 2nd from the top is a bpf prog. The rest are kernel functions.
bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
^^ is a prog tag ^^ name of bpf prog

If you do 'bpftool prog show' you can see both tag and name.
'bpftool prog dump jited'
dumps x86 code mixed with source line text.
Often enough +0x2b offset will have some C code right next to it.

One can monitor all prog load/unload via perf or via audit.
'perf record' collects all bpf progs that were loaded before
the start and even thouse that were loaded and unloaded quickly
while 'perf record' was running.
So 'perf report' has all the data including annotation and source code.

Currently we're working on adding 'file.c:lineno' to dump_stack()
for bpf progs.

If you have ideas how we can improve debuggability we are all ears.