Re: [PATCH v12 2/3] rust: add dma coherent allocator abstraction.

From: Simona Vetter
Date: Fri Mar 07 2025 - 05:19:56 EST


On Fri, Mar 07, 2025 at 09:50:07AM +0100, Danilo Krummrich wrote:
> On Thu, Mar 06, 2025 at 12:09:07PM -0400, Jason Gunthorpe wrote:
> > On Thu, Mar 06, 2025 at 04:21:51PM +0100, Simona Vetter wrote:
> > > > > > a device with no driver bound should not be passed to the DMA API,
> > > > > > much less a dead device that's already been removed from its parent
> > > > > > bus.
> > > >
> > > > Thanks for bringing this up!
> > > >
> > > > I assume that's because of potential iommu mappings, the memory itself should
> > > > not be critical.
> >
> > There is a lot of state tied to the struct device lifecycle that the
> > DMA API and iommu implicitly manages. It is not just iommu mappings.
> >
> > It is incorrect to view the struct device as a simple refcount object
> > where holding the refcount means it is alive and safe to use. There
> > are three broad substates (No Driver, Driver Attached, Zombie) that
> > the struct device can be in that are relevant.
> >
> > Technically it is unsafe and oopsable to call the allocation API as
> > well on a device that has no driver. This issue is also ignored in
> > these bindings and cannot be solved with revoke.
>
> This is correct, and I am well aware of it. I brought this up once when working
> on the initial device / driver, devres and I/O abstractions.
>
> It's on my list to make the creation of the Devres container fallible in this
> aspect, which would prevent this issue.
>
> For now it's probably not too critical; we never hand out device references
> before probe(). The only source of error is when a driver tries to create new
> device resources after the device has been unbound.
>
> > IOW I do not belive you can create bindings here that are truely safe
> > without also teaching rust to understand the concept of a scope
> > guaranteed to be within a probed driver's lifetime.
> >
> > > > > Also note that any HW configured to do DMA must be halted before the
> > > > > free is allowed otherwise it is a UAF bug. It is worth mentioning that
> > > > > in the documentation.
> > > >
> > > > Agreed, makes sense to document. For embedding the CoherentAllocation into
> > > > Devres this shouldn't be an issue, since a driver must stop operating the device
> > > > in remove() by definition.
> > >
> > > I think for basic driver allocations that you just need to run the device
> > > stuffing it all into devres is ok.
> >
> > What exactly will this revokable critical region protect?
> >
> > The actual critical region extends into the HW itself, it is not
> > simple to model this with a pure SW construct of bracketing some
> > allocation. You need to bracket the *entire lifecycle* of the
> > dma_addr_t that has been returned and passed into HW, until the
> > dma_addr_t is removed from HW.
>
> Devres callbacks run after remove(). It's the drivers job to stop operating the
> device latest in remove(). Which means that the design is correct.
>
> Now, you ask for a step further, i.e. make it that we can enforce that a driver
> actually stopped the device in remove().
>
> But that's just impossible, because obviously no one else than the driver knows
> the semantics of the devicei; it's the whole purpose of the driver. So, this is
> one of the exceptions where just have to trust the driver doing the correct
> thing.

In general it's impossible, but I think for specific cases like pci we can
enforce that bus mastering/interrupt generation/whatever else might cause
havoc is force-disabled after ->remove finishes. For platform devices this
is more annoying, but then it's much harder to physically yank a platform
devices. So I'm less worried about that being a practical concern there.

> Having that said, it doesn't need to be an "all or nothing", let's catch the
> ones we can actually catch.

Yeah, agreed.
-Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch