Re: [PATCH v2 1/2] vfio/pci: Fix racy bitfields and tighten struct layout
From: Alex Williamson
Date: Tue May 12 2026 - 14:50:53 EST
On Tue, 12 May 2026 10:18:12 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> On Mon, May 11, 2026 at 04:16:02PM -0600, Alex Williamson wrote:
> > Bitfield operations are not atomic, they use a read-modify-write
> > pattern, therefore we should be careful not to pack bitfields that
> > can be concurrently updated into the same storage unit.
> >
> > The split fields (virq_disabled, bardirty, pm_intx_masked,
> > pm_runtime_engaged, sriov_pwr_active) are mutated post-init from
> > contexts that don't serialize against the other writers in the same
> > storage unit, so a bitfield RMW could drop an adjacent field's
> > update. The remaining bitfields are touched only during probe or
> > close where no concurrent writer exists, so they stay packed.
> >
> > While reordering, place virq_disabled and bardirty earlier to fill
> > an existing alignment hole.
>
> I feel like a comment is needed here for the various bool groupings
>
> 'write locked by XX' or something?
I can provide that, but there are several ways we can approach this.
As I dig into pm_intx_masked vs pm_runtime_engaged, there's an implicit
pm_runtime_get before pm_runtime_engaged, while pm_intx_masked is only
modified in the .suspend/.resume callbacks. So those cannot actually
race. needs_reset is set on close, which is already serialized, and
also via ioctl, which again does a pm_runtime_get and indirectly takes
memory_lock, so it seems safe for it to share a storage unit.
OTOH, virq_disabled and bardirty are both modified by config space
writes, and while there's likely serialization in a VM, vfio-pci itself
doesn't provide any.
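
For illustration only, a rough userspace model of how two such
unserialized writers could lose an update. The bit positions are made
up (the real layout is up to the compiler and ABI), and the
interleaving is written out by hand rather than produced by real
threads:

```c
#include <stdint.h>

/* Hypothetical bit positions within one shared storage unit. */
#define VIRQ_DISABLED_BIT	(1u << 0)
#define BARDIRTY_BIT		(1u << 1)

/*
 * Each writer does what a bitfield store amounts to: load the whole
 * unit, modify one bit, store the whole unit back. The two sequences
 * are interleaved by hand so the outcome is deterministic.
 */
static uint8_t lost_update_demo(void)
{
	uint8_t unit = 0;	/* shared storage unit, both flags clear */
	uint8_t a, b;

	a = unit;		/* writer A loads */
	b = unit;		/* writer B loads the same value */
	a |= VIRQ_DISABLED_BIT;	/* A: virq_disabled = true */
	b |= BARDIRTY_BIT;	/* B: bardirty = true */
	unit = a;		/* A stores */
	unit = b;		/* B stores, overwriting A's bit */

	return unit;
}
```

In this model only the bardirty bit survives; the virq_disabled update
is silently dropped, which is the failure mode in question.
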
So in the strictest fix, maybe only virq_disabled and bardirty are
pulled out of the bitfield, but the dependencies are sufficiently
subtle that I wonder if it wouldn't make sense to limit bitfield use to
fields serialized by probe/open/close, and give anything dynamically
updated while the device is open its own storage unit.
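
A minimal sketch of what that broader rule might look like. The struct
name and the two setup_* members are hypothetical placeholders, not the
actual vfio-pci layout; only the runtime flag names come from the
patch:

```c
#include <stdbool.h>
#include <stddef.h>

struct example_dev_flags {
	/* Written only under probe/open/close: may share a storage unit. */
	bool	setup_flag_a:1;		/* hypothetical */
	bool	setup_flag_b:1;		/* hypothetical */

	/* Updated while the device is open: one storage unit each. */
	bool	virq_disabled;
	bool	bardirty;
	bool	pm_intx_masked;
	bool	pm_runtime_engaged;
	bool	sriov_pwr_active;
};

/*
 * Non-bitfield members are distinct memory locations, so a store to
 * one cannot rewrite its neighbour. Checking two of the offsets is
 * enough to show the idea.
 */
static bool runtime_flags_are_separate(void)
{
	return offsetof(struct example_dev_flags, virq_disabled) !=
	       offsetof(struct example_dev_flags, bardirty);
}
```

The cost is a few bytes per device, in exchange for not having to
reason about which lock covers which neighbour.
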
The mlx5 patch has similar subtle dependencies: mdev_detach and
log_active are serialized by state_mutex, but deferred_reset is set
under reset_lock.
It's not clear the bit compaction is worth the subtle RMW scenarios.
What do you think, should we reserve bitfields for setup/release-time to
avoid this class of issue or handle these as individual point fixes?
Thanks,
Alex