Re: [PATCH v2 3/5] KVM: guest_memfd: Handle errors from xa_store_range() when binding

From: Sean Christopherson

Date: Wed May 27 2026 - 18:45:15 EST


On Wed, May 27, 2026, Ackerley Tng wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> > On Fri, May 22, 2026, Ackerley Tng wrote:
> > If @entry is non-NULL, xa_store_range() pre-creates the entire range, before
> > storing anything into the range:
> >
> > if (entry) {
> > unsigned int order = BITS_PER_LONG;
> > if (last + 1)
> > order = __ffs(last + 1);
> > xas_set_order(&xas, last, order);
> > xas_create(&xas, true);
> > if (xas_error(&xas))
> > goto unlock;
> > }
> >
>
> xa_store_range() doesn't actually always iterate: if last + 1 is some
> clean power of 2, it'll create a higher order xarray node.
>
> Otherwise, it falls back to creating and storing 1 index/node at a time:

Ugh, _that's_ what the code is doing? Argh, I missed that "first" is incremented
by whatever the batch size happened to be.

first += xas_size(&xas); <====
} while (first <= last);

> if the above did manage to create an xarray node, xas_error() returns
> false, it goes on to the store below.
>
> > Yes, the API handles failure on the subsequent xas_store(), but I can't imagine
> > that failure is actually, barring garbage input from KVM:
> >
> > do {
> > xas_set_range(&xas, first, last);
> > xas_store(&xas, entry);
> > if (xas_error(&xas))
> > goto unlock;
> > first += xas_size(&xas);
> > } while (first <= last);
> >
>
> So if a later xas_create() fails because it runs out of memory, the
> earlier stores would have already been committed.
>
> This ignores -EEXIST being returned since earlier in kvm_gmem_bind()
> conflicts were already checked.
>
> > Purely from a design perspective, providing an API that can fail partway through
> > under normal operation, with no indication of where failure occured (AFAICT),
> > would be awful.
> >
>
> Do you mean the API of xas_store_range()?

No, I mean xa_store_range(). AFAICT, on failure, it doesn't actually communicate
"where" failure occurred. That's quite nasty.

> xas is updated by xas_set_range() so that should track the last store. Since
> the cleanup is storing NULLs and won't allocate, I thought it would be fine
> to just store NULL on the entire range on error.

Yeah, it's totally fine, and AFAICT the only remotely sane approach.