Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
From: Alexei Starovoitov
Date: Fri Jan 29 2021 - 16:06:47 EST
On Fri, Jan 29, 2021 at 02:01:03PM -0500, Steven Rostedt wrote:
> On Fri, 29 Jan 2021 18:59:43 +0100
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Fri, Jan 29, 2021 at 09:45:48AM -0800, Alexei Starovoitov wrote:
> > > Same things apply to bpf side. We can statically prove safety for
> > > ftrace and kprobe attaching whereas to deal with NMI situation we
> > > have to use run-time checks for recursion prevention, etc.
> >
> > I have no idea what you're saying. You can attach to functions that are
> > called with random locks held, you can create kprobes in some very
> > sensitive places.
> >
> > What can you staticlly prove about that?
>
> I think the main difference is, if you attach a kprobe or ftrace function,
> you can theoretically analyze the location before you do the attachment.
Exactly.
When we're writing bpf helpers we need to think carefully about reentrancy and NMI.
If the helper looks like:
int nokprobe notrace bpf_something(...)
{
	/* access variables A and B */
}
The implementation can rely on the fact that even if the helper is reentrant,
the state of A and B will stay consistent: either both got touched or neither.
The only condition we have to worry about is NMI, because there A could be
modified without touching B.
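To make that concrete, here is a minimal sketch of such a helper (the name and the
per-cpu variables are made up for illustration; "nokprobe" is spelled via NOKPROBE_SYMBOL()):

#include <linux/kprobes.h>
#include <linux/percpu.h>
#include <linux/types.h>

/* hypothetical pair of per-cpu variables that must stay in sync */
static DEFINE_PER_CPU(u64, var_a);
static DEFINE_PER_CPU(u64, var_b);

/*
 * notrace + NOKPROBE_SYMBOL: no fentry callback on entry and no int3 can land
 * inside, so once in_nmi() is filtered out in the calling path the two updates
 * below are never observed half-done by another invocation of this helper.
 */
static u64 notrace bpf_something(u64 delta)
{
	__this_cpu_add(var_a, delta);
	/* without the in_nmi() filter, an NMI here could see A updated but not B */
	__this_cpu_add(var_b, delta);
	return __this_cpu_read(var_a);
}
NOKPROBE_SYMBOL(bpf_something);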
If we write it as
int nokprobe bpf_something(...) { ... }
that's a different case.
Here we need to consider that a bpf prog can be attached to it via the fentry nop,
but there is no need to worry about instructions being split in the middle by a kprobe via int3.
Since we have the big "if (in_nmi()) goto unsupported;" check at the beginning, we
only need to worry about the combinations of a kprobe at the start of the func,
a kprobe anywhere inside the func via int3, and ftrace at the start.
Not having to think about NMI helps a ton.
My earlier this_cpu vs __this_cpu comment is an example of that.
If in_nmi is filtered early it's one implementation. If NMI has to be handled
it's a completely different algorithm.
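For reference, the recursion guard in trace_call_bpf() is roughly this shape
(paraphrased and trimmed; bpf_prog_active is really defined in another file):

static DEFINE_PER_CPU(int, bpf_prog_active);

unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
{
	unsigned int ret;

	if (in_nmi()) /* not supported yet */
		return 1;

	/*
	 * Because NMI was rejected above, the cheaper non-irq-safe __this_cpu
	 * ops are enough for the recursion counter.  If NMI had to be handled,
	 * this read-modify-write would need an NMI-safe scheme instead, which
	 * is the "completely different algorithm" mentioned above.
	 */
	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
		/* another bpf program is already running on this cpu */
		ret = 0;
		goto out;
	}

	ret = BPF_PROG_RUN_ARRAY_CHECK(call->prog_array, ctx, BPF_PROG_RUN);

 out:
	__this_cpu_dec(bpf_prog_active);
	return ret;
}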
Now you've broken all this logic by making int3 be marked as 'in_nmi', and
bpf attached to a kprobe in the middle of a func is now broken.
Do people use that? Yeah, they do.
We have to fix it.
What were your reasons to make int3 in_nmi?
I've read the commit log, but I don't see the actual motivation for int3 in it
other than "it looks like NMI to me. Let's make it so".
The commit logs talk about cpu exceptions. I agree that #DB and #MC do behave like NMI,
but #BP really doesn't. My understanding is that it's used by kprobes and text_poke_bp only.
If the motivation was to close some issue with text_poke_bp then, sure,
let's treat the handling of text_poke_bp as NMI.
But kprobes are not that.
I'm thinking either of the following solutions would be generic:
- introduce another state to the preempt flags, like a "kernel exception" state (rough sketch below)
- remove kprobe's int3 from in_nmi
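For the first option, the rough idea would be something like the following against
include/linux/preempt.h (bit layout and names are completely made up here, just to
illustrate; this is not a patch):

/*
 * Hypothetical: a separate preempt_count section for synchronous kernel
 * exceptions such as int3, stacked next to the existing NMI/HARDIRQ/SOFTIRQ
 * sections, so kprobe's int3 no longer has to pretend to be in_nmi() for
 * code to tell that it is running in an exception handler.
 */
#define EXCEPTION_BITS		4
#define EXCEPTION_SHIFT		(NMI_SHIFT + NMI_BITS)
#define EXCEPTION_OFFSET	(1UL << EXCEPTION_SHIFT)
#define EXCEPTION_MASK		(__IRQ_MASK(EXCEPTION_BITS) << EXCEPTION_SHIFT)

#define in_kernel_exception()	(preempt_count() & EXCEPTION_MASK)

#define kernel_exception_enter()			\
	do {						\
		__preempt_count_add(EXCEPTION_OFFSET);	\
		barrier();				\
	} while (0)

#define kernel_exception_exit()				\
	do {						\
		barrier();				\
		__preempt_count_sub(EXCEPTION_OFFSET);	\
	} while (0)

Then the int3/kprobe entry path would bump this instead of the NMI count, and checks
like the one in trace_call_bpf() could tell a real NMI apart from a kprobe int3.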
As a bpf-specific alternative we can do:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6c0018abe68a..37cc549ad52e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -96,7 +96,7 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
 	unsigned int ret;
 
-	if (in_nmi()) /* not supported yet */
+	if (in_nmi() && !kprobe_running()) /* not supported yet */
 		return 1;
but imo it's less clean than the first two options.