Re: [PATCH] dma: mark unmapped DMA scatter/gather invalid

From: Marc Orr
Date: Mon Jan 11 2021 - 16:37:27 EST


> >>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> >>> index 0a4881e59aa7..3d9b17fe5771 100644
> >>> --- a/kernel/dma/direct.c
> >>> +++ b/kernel/dma/direct.c
> >>> @@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
> >>> struct scatterlist *sg;
> >>> int i;
> >>>
> >>> - for_each_sg(sgl, sg, nents, i)
> >>> + for_each_sg(sgl, sg, nents, i) {
> >>> dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
> >>> attrs);
> >>> + sg->dma_address = DMA_MAPPING_ERROR;
> >>
> >> There are more DMA API backends than just dma-direct, so while this
> >> might help paper over bugs when SWIOTLB is in use, it's not going to
> >> have any effect when those same bugs are hit under other circumstances.
> >> Once again, the moral of the story is that effort is better spent just
> >> fixing the bugs ;)
> >
> > Thanks for the quick feedback. What is the correct fix? I understand
> > the first half. The NVMe driver should be updated to not call unmap on
> > an address that has already been unmapped within the DMA direct code.
> > Where I'm less certain is how to communicate to the NVMe driver that
> > the mapping failed. In particular, the NVMe code explicitly checks if
> > the first DMA address in the scatter/gather list is set to
> > DMA_MAPPING_ERROR. Thus, don't we need to update the DMA direct code
> > to propagate DMA_MAPPING_ERROR back up to the driver, via the
> > scatter/gather struct?
>
> Erm, you check the return value of dma_map_sg(). If it's zero, the
> request failed; if it's nonzero, that's how many DMA segments you now
> have to process. See Documentation/core-api/dma-api.rst.
>
> The only guarantee offered about the state of the scatterlist itself is
> that if it is successfully mapped, then the dma_address and dma_length
> fields are valid for that many segments, and if that is fewer than the
> total number of physical segments then the next one after the final DMA
> segment will have a dma_length of 0. In particular there are no
> guarantees at all about the state if the mapping was unsuccessful.
>
> If a driver is failing to keep track of the success/failure status and
> later down the line trying to guess what to do with a list that may or
> may not have been mapped, then frankly that driver should be redesigned
> because that is a terrible anti-pattern. At the very very least it
> should explicitly encode its own "known bad" state upon failure that it
> can then reliably recognise later.

Got it now. I'll get to work on a patch for the NVMe driver to fix the
bug. Thanks for all of these pointers. They are immensely helpful.

Thanks,
Marc