Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt

From: Mark Rutland

Date: Tue Jun 02 2026 - 08:00:37 EST

On Fri, May 22, 2026 at 08:26:30AM +0200, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 7:46 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote:
> > > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > > For context, can you explain how this is expected to work across
> > > > compilation units when the caller and callee *are* instrumented, when an
> > > > argument is passed in a register?
> > >
> > > Per KMSAN ABI, the metadata for the function arguments is passed via
> > > buffers in the so-called context state (see
> > > include/linux/kmsan_types.h)
> > > In the userspace, these buffers are thread-local variables referenced
> > > by inline loads and stores.
> > > In the kernel space, the compiler inserts a call
> > > __msan_get_context_state() at the beginning of every function, and
> > > then the instrumentation code uses whatever that function returned.
> > >
> > > Assume we have a function:
> > >
> > > int sum(int a, int b) {
> > > result = a + b;
> > > return result;
> > > }
> > >
> > > Its instrumented version looks roughly as follows (we'll omit origin
> > > tracking for simplicity):
> > >
> > > int sum(int a, int b) {
> > > struct kmsan_context_state *kcs = __msan_get_context_state();
> > > int s_a = ((int)kcs->param_tls)[0]; // shadow of a
> > > int s_b = ((int)kcs->param_tls)[1]; // shadow of b
> > > result = a + b;
> > > s_result = s_a | s_b;
> > > ((int)kcs->retval_tls)[0] = s_result; // returned shadow
> > > return result;
> > > }
> > >
> > > Most certainly `a` and `b` will be passed using registers, but that
> > > doesn't matter: their metadata is safe as long as the caller does:
> > >
> > > ((int)kcs->param_tls)[0] = s_a;
> > > ((int)kcs->param_tls)[1] = s_b;
> > > result = sum(a, b);
> > > s_result = ((int)kcs->retval_tls)[0];
> >
> > Ok. I see how that works when both caller and callee are instrumented.
> >
> > I assume that's tracked for all arguments regardless of type? e.g. that
> > includes pointers, structs, etc?
>
> That's tracked for all parameters passed to a function, whether on the
> stack or in registers.
> This includes cases when a struct can be passed by value, e.g.:
>
> struct arg {
> int a, b;
> };
>
> int sum(struct arg arg) {
> return arg.a + arg.b;
> }
>
> For anything passed to the function by pointer, we only track the
> initializedness of the pointer itself at the call site and the
> function prologue.

Yep, that was what I expected. Thanks for confirming!

> Why is that? Because we already track the state of the pointed-to
> memory elsewhere.
> For every heap allocation, there is shadow memory backing it, which is
> initially poisoned (unless it is a GFP_ZERO allocation).
> Similarly, shadow memory backs every stack allocation. The state of
> stack allocations is initially set by __msan_poison_alloca().
> When we write an initialized value to a memory buffer, the
> instrumentation code updates that buffer's shadow memory.
>
> So the fact that a buffer is passed to a function by pointer does not
> change that buffer's state; that's why we don't need additional
> tracking of the pointed-to memory around the calls.
> (this is all assuming both the caller and the callee are instrumented).

Understood.

[...]

> > > We have few tools to deal with such cases. It often helps to move the
> > > border between the instrumented and non-instrumented code by applying
> > > __no_kmsan_checks or __no_sanitize_memory until we get to a point
> > > where there are no incorrect shadow arguments.
> > > Another way to deal with uninitialized data coming from
> > > non-instrumented code is kmsan_unpoison_memory(address, size).
> > > Unfortunately, as Thomas pointed out, we can't use it for locals in
> > > non-instrumented code.
> >
> > It's clearly not sufficient to use kmsan_unpoison_memory() for locals,
> > and I think it's non-sensical because conceptually they aren't memory.
> >
> > I don't think kmsan_unpoison_entry_regs() alone is sufficient for the
> > regs. Surely the pointer argument itself needs to be unpoisoned before
> > being passed to a callee?
> >
> > Otherwise, when pointer to regs is passed as an argument to a callee,
> > the callee will consume stale shadow, which might indicate that the
> > pointer to regs is uninitialized?
> >
> > That seems to be exactly what we're hitting with the irq entry state, so
> > I'm surprised we're not hitting it for the regs. Is that sheer luck, or
> > is something else protecting us?
>
> Struct regs is created by non-instrumented code on the stack, that's
> why its shadow contains whatever garbage that stack slot had when
> leaving instrumented code.

I'm talking about the *pointer itself*, not the memory the pointer
points to. I understand that the shadow for the memory will contain
garbage, but I'm asking about the tracking of the *local variable* that
is a pointer to that memory.

Note: I assume that by 'struct regs' you mean 'struct pt_regs'.

> For the data pointed to by individual registers, there are usually two
> possibilities:
> 1. It is allocated by instrumented code and already has some state,
> and struct regs is being used to transport it between instrumented
> regions.
> 2. It is allocated by non-instrumented code, belongs to the same shim
> that created it, and instrumented code is not supposed to touch it.
>
> So I think this is not "sheer luck", but rather the fact that
> non-instrumented code doesn't allocate memory.

As above, that doesn't address my concern, becuase I'm talking about the
*pointer* not the *memory*. I still think that it's sheer luck that
callees think the *pointer* is initiailiased.

What ensures that the *pointer to the regs* is marked as initialised?

The entry asm doesn't ensure that. The noinstr entry code only unpoisons
the memory, but not the pointer itself. The instrumented code called by
the noinstr code doesn't seem to have anything to suppress checks on the
pointer argument.

AFAICT, at the point where noinstr code calls instrumented code, we're
relying on stable shadow indicating that register arguments have been
initialized, including the pointer to the regs.

[...]

> > > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > > > > making all its inputs initialized. This is the easiest solution, it may
> > > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > > > > doing so is better than dealing with more false positives in the interrupt code.
> > > > >
> > > > > Another option for the callee would be applying `__always_inline`, so that
> > > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > > > > `__always_inline`, it might be the right thing to do.
> > > >
> > > > We can do that, but this really suggests that there's a fundemantal
> > > > inability to pass arguments between code which is noinstr and code which
> > > > isinstrumented with AddressSanitizer, and that's inevitably going to
> > > > bite us in future.
> > >
> > > This is true (assuming you mean MemorySanitizer), but:
> >
> > Sorry, yes, I meant MemorySanitizer.
> >
> > > - I don't think we can avoid that, given that there will always be
> > > non-instrumented code;
> > > - so far the number of annotations has been manageable.
> >
> > As above, I suspect that we're actually missing many necessary
> > annotations and getting away with this by accident.
> >
> > I suspect that we need additional compiler support to say something like
> > "assume this argument/return value IS initialized".
>
> There is already a more generic one saying "assume all
> arguments/return values are uninitialized" - that's __no_kmsan_checks.
> Do we need a narrower attribute knowing that for a call from
> uninstrumented code we'll anyway need to mark all the arguments?

We certainly need to be inhibiting checks in cases where we're not
today.

How does that work for something like:

void do_something_with_foo(struct foo *f)
{
[[ accesses foo fields here ]]
}

void called_from_noinstr(struct foo *f) __no_kmsan_checks
{
do_something_with_foo(f);
}

noinstr void foo(void)
{
struct foo f = { ... };
called_from_noinstr(f);
}

... e.g. will called_from_noinstr() initialize the argument when calling
do_something_with_foo() ?

[...]

> > Consider when the CPU is executing some function foo(), and an exception
> > is taken. Within the exception handler, code will clobber the context
> > state. AFAICT, that state is not saved/restored, and nor is it zeroed
> > prior to exception return. The exception handler returns back to the
> > middle of foo(). From foo's PoV, the context state has been clobbered
> > arbitrarily, and that clobbered state could trigger false positives or
> > false negatives.
> >
> > The way kmsan_get_context() uses in_task() will avoid that in *some*
> > cases, but not *all* cases. In particular, I don't think that's going to
> > help when a fault is taken for a uaccess.
> >
> > Maybe I'm missing something here?
> >
> > > > I see kmsan_get_context() has an in_task() check, but that can't help
> > > > with nested exceptions, so this doesn't look right at all.
> > >
> > > KMSAN context is per-task, and if !in_task(), it is per-CPU.
> > >
> > > For the nested exceptions, we conservatively bail out (see
> > > kmsan_in_runtime()) to avoid deadlocking.
> > > This may cause false negatives.
> >
> > AFAICT that doesn't address the case I've described above.
> >
> > Mark.
>
> You are right, it's quite messy for uaccess, and maybe other cases as well.
> In the past, I attempted to use in_nmi(), in_hardirq(), in_softirq()
> to switch between different per-cpu contexts, but that didn't work out
> well, because preempt_count() could change in the middle of certain
> low-level functions (which the instrumentation ignored, as the
> reference to the context was only calculated in the prologue).

To be frank, I don't think we can rely on per-cpu contexts here, and
that shapre of check is the wrong approach.

AFAICT, the only way to make this work without false negatives is to
have this nest, with entry code saving/restoring the original context.
Given the size of that context, it seems impractical to put that on the
stack, so I don't see how we could reaonably make that work.

We might get away with clobbering the state when returning from an
exception, marking all everything as initialized, but that will lead to
false negatives a lot of the time.

> I think this part indeed works by sheer luck.
> We need an annotation telling the tool where an interrupt context
> starts and ends. On the compiler side, we'll probably also need an
> intrinsic to reload the context pointer.

As above, I think there are further problems with that.

Mark.