Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt

From: Alexander Potapenko

Date: Fri May 22 2026 - 02:27:30 EST

On Tue, May 12, 2026 at 7:46 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Hi,
>
> Thanks for this; the explanation of the KMSAN ABI is *very* helpful.
>
> I have a few questions and comments inline below.
>
> On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote:
> > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > For context, can you explain how this is expected to work across
> > > compilation units when the caller and callee *are* instrumented, when an
> > > argument is passed in a register?
> >
> > Per KMSAN ABI, the metadata for the function arguments is passed via
> > buffers in the so-called context state (see
> > include/linux/kmsan_types.h)
> > In the userspace, these buffers are thread-local variables referenced
> > by inline loads and stores.
> > In the kernel space, the compiler inserts a call
> > __msan_get_context_state() at the beginning of every function, and
> > then the instrumentation code uses whatever that function returned.
> >
> > Assume we have a function:
> >
> > int sum(int a, int b) {
> > result = a + b;
> > return result;
> > }
> >
> > Its instrumented version looks roughly as follows (we'll omit origin
> > tracking for simplicity):
> >
> > int sum(int a, int b) {
> > struct kmsan_context_state *kcs = __msan_get_context_state();
> > int s_a = ((int)kcs->param_tls)[0]; // shadow of a
> > int s_b = ((int)kcs->param_tls)[1]; // shadow of b
> > result = a + b;
> > s_result = s_a | s_b;
> > ((int)kcs->retval_tls)[0] = s_result; // returned shadow
> > return result;
> > }
> >
> > Most certainly `a` and `b` will be passed using registers, but that
> > doesn't matter: their metadata is safe as long as the caller does:
> >
> > ((int)kcs->param_tls)[0] = s_a;
> > ((int)kcs->param_tls)[1] = s_b;
> > result = sum(a, b);
> > s_result = ((int)kcs->retval_tls)[0];
>
> Ok. I see how that works when both caller and callee are instrumented.
>
> I assume that's tracked for all arguments regardless of type? e.g. that
> includes pointers, structs, etc?

That's tracked for all parameters passed to a function, whether on the
stack or in registers.
This includes cases when a struct can be passed by value, e.g.:

struct arg {
int a, b;
};

int sum(struct arg arg) {
return arg.a + arg.b;
}

For anything passed to the function by pointer, we only track the
initializedness of the pointer itself at the call site and the
function prologue.
Why is that? Because we already track the state of the pointed-to
memory elsewhere.
For every heap allocation, there is shadow memory backing it, which is
initially poisoned (unless it is a GFP_ZERO allocation).
Similarly, shadow memory backs every stack allocation. The state of
stack allocations is initially set by __msan_poison_alloca().
When we write an initialized value to a memory buffer, the
instrumentation code updates that buffer's shadow memory.

So the fact that a buffer is passed to a function by pointer does not
change that buffer's state; that's why we don't need additional
tracking of the pointed-to memory around the calls.
(this is all assuming both the caller and the callee are instrumented).

> > 3. There are cases in which KMSAN must be disabled for the whole
> > function (`__no_sanitize_memory`).
> > We disable instrumentation for `noinstr` functions. Additionally, on
> > x86 there is exactly one case where this is done to avoid infinite
> > recursion when storing the origins.
>
> In addition to the above cases, out-of-line assembly functions won't
> manipulate the shadow (so they're effectively the same as noinstr).

This is correct.

> > In the former case, KMSAN will not see the callee's side effects
> > (return values or memory stores), in the latter case, the callee may
> > receive incorrect shadow values for the function parameters or memory
> > stores in the caller. Both will
>
> Incomplete sentence here?

Yeah, sorry, disregard this.

>
> > We have few tools to deal with such cases. It often helps to move the
> > border between the instrumented and non-instrumented code by applying
> > __no_kmsan_checks or __no_sanitize_memory until we get to a point
> > where there are no incorrect shadow arguments.
> > Another way to deal with uninitialized data coming from
> > non-instrumented code is kmsan_unpoison_memory(address, size).
> > Unfortunately, as Thomas pointed out, we can't use it for locals in
> > non-instrumented code.
>
> It's clearly not sufficient to use kmsan_unpoison_memory() for locals,
> and I think it's non-sensical because conceptually they aren't memory.
>
> I don't think kmsan_unpoison_entry_regs() alone is sufficient for the
> regs. Surely the pointer argument itself needs to be unpoisoned before
> being passed to a callee?
>
> Otherwise, when pointer to regs is passed as an argument to a callee,
> the callee will consume stale shadow, which might indicate that the
> pointer to regs is uninitialized?
>
> That seems to be exactly what we're hitting with the irq entry state, so
> I'm surprised we're not hitting it for the regs. Is that sheer luck, or
> is something else protecting us?

Struct regs is created by non-instrumented code on the stack, that's
why its shadow contains whatever garbage that stack slot had when
leaving instrumented code.
For the data pointed to by individual registers, there are usually two
possibilities:
1. It is allocated by instrumented code and already has some state,
and struct regs is being used to transport it between instrumented
regions.
2. It is allocated by non-instrumented code, belongs to the same shim
that created it, and instrumented code is not supposed to touch it.

So I think this is not "sheer luck", but rather the fact that
non-instrumented code doesn't allocate memory.

> > In this case by "hints" I meant any kind of instrumentation that the
> > compiler could have inserted automatically to mark the data in the
> > instrumented functions as initialized.
> > There are no such "hints" right now (apart from letting KMSAN
> > instrument the function).
> > As for the manual annotations, we only have the mentioned
> > `__no_kmsan_checks`, `__no_sanitize_memory`, and
> > kmsan_unpoison_memory().
>
> Ok.
>
> For noinstr code calling instrumented code, I see that we could
> explicitly modify the argument shadow in the noinstr caller, but that's
> *really* grotty, and I don't think we want to do that.

Agreed.

>
> > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > > > making all its inputs initialized. This is the easiest solution, it may
> > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > > > doing so is better than dealing with more false positives in the interrupt code.
> > > >
> > > > Another option for the callee would be applying `__always_inline`, so that
> > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > > > `__always_inline`, it might be the right thing to do.
> > >
> > > We can do that, but this really suggests that there's a fundemantal
> > > inability to pass arguments between code which is noinstr and code which
> > > isinstrumented with AddressSanitizer, and that's inevitably going to
> > > bite us in future.
> >
> > This is true (assuming you mean MemorySanitizer), but:
>
> Sorry, yes, I meant MemorySanitizer.
>
> > - I don't think we can avoid that, given that there will always be
> > non-instrumented code;
> > - so far the number of annotations has been manageable.
>
> As above, I suspect that we're actually missing many necessary
> annotations and getting away with this by accident.
>
> I suspect that we need additional compiler support to say something like
> "assume this argument/return value IS initialized".

There is already a more generic one saying "assume all
arguments/return values are uninitialized" - that's __no_kmsan_checks.
Do we need a narrower attribute knowing that for a call from
uninstrumented code we'll anyway need to mark all the arguments?

> > Hope the explanation above helps.
> > >
> > > If that's supposed to carry some global or current context, surely
> > > blatting that in entry code will affect the code that was interrupted?
> >
> > It will mostly affect the bottom-level function called by the entry
> > code, after that, nested functions will be just passing their
> > parameters as per KMSAN ABI.
>
> The problem I was explaining was the other way around: we clobber the
> context of the function that was executing *before* exception entry.
>
> Consider when the CPU is executing some function foo(), and an exception
> is taken. Within the exception handler, code will clobber the context
> state. AFAICT, that state is not saved/restored, and nor is it zeroed
> prior to exception return. The exception handler returns back to the
> middle of foo(). From foo's PoV, the context state has been clobbered
> arbitrarily, and that clobbered state could trigger false positives or
> false negatives.
>
> The way kmsan_get_context() uses in_task() will avoid that in *some*
> cases, but not *all* cases. In particular, I don't think that's going to
> help when a fault is taken for a uaccess.
>
> Maybe I'm missing something here?
>
> > > I see kmsan_get_context() has an in_task() check, but that can't help
> > > with nested exceptions, so this doesn't look right at all.
> >
> > KMSAN context is per-task, and if !in_task(), it is per-CPU.
> >
> > For the nested exceptions, we conservatively bail out (see
> > kmsan_in_runtime()) to avoid deadlocking.
> > This may cause false negatives.
>
> AFAICT that doesn't address the case I've described above.
>
> Mark.

You are right, it's quite messy for uaccess, and maybe other cases as well.
In the past, I attempted to use in_nmi(), in_hardirq(), in_softirq()
to switch between different per-cpu contexts, but that didn't work out
well, because preempt_count() could change in the middle of certain
low-level functions (which the instrumentation ignored, as the
reference to the context was only calculated in the prologue).
I think this part indeed works by sheer luck.
We need an annotation telling the tool where an interrupt context
starts and ends. On the compiler side, we'll probably also need an
intrinsic to reload the context pointer.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg