Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

From: Juergen Gross
Date: Tue Oct 25 2022 - 02:38:34 EST


On 24.10.22 14:55, Juergen Gross wrote:
On 24.10.22 13:56, Sreekanth Reddy wrote:
On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:

On 10/21/22 02:22, Salvatore Bonaccorso wrote:
We got the following report in Debian after an update from 5.10.140 to
the current 5.10.149. Full quoting below (from
https://bugs.debian.org/1022126). Does this ring some bell about known
regressions?

Only three mpt3sas changes are new in v5.10.149 compared to v5.10.140:
$ git log --format=oneline v5.10.140..v5.10.149
2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return value check of dma_get_required_mask()
e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region
ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix use-after-free warning

Sreekanth and Suganath, can you help with bisecting this issue? For the
full report, see also https://lore.kernel.org/linux-scsi/Y1JkuKTjVYrOWbvm@xxxxxxxxxxx/.

This issue is observed after applying the patch below:
2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
value check of dma_get_required_mask()

What is happening is that under the Xen hypervisor,
dma_get_required_mask() always returns a 32-bit DMA mask, i.e. it
says that the minimum DMA mask required to access host memory is
32 bits, and hence the mpt3sas driver sets the DMA mask to 32 bits.
On a 64-bit machine, if the driver sets the DMA mask to 32 bits,
SWIOTLB's bounce buffers come into play during I/O. Since these
bounce buffers are limited in size, we observe I/O hangs when
large I/Os are issued.

I am not sure whether this API's return value is correct in the
Xen environment. If it is correct, then I have to modify the driver to
not use this API and to set the DMA mask directly to 64 bits if the
system is a 64-bit machine.
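
To illustrate the comparison that commit 2b9aba0c5d58 changes, here is a
minimal userspace sketch. DMA_BIT_MASK() is copied from the kernel's
include/linux/dma-mapping.h; the helper names wants_32bit_old(),
wants_32bit_new() and check_examples() are made up for this sketch and do
not exist in the driver. It assumes, per the description above, that
dma_get_required_mask() reports a 32-bit mask (0xffffffff) under Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel macro in include/linux/dma-mapping.h. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: the old check compares a mask against a bit count. */
static int wants_32bit_old(uint64_t required_mask)
{
	return required_mask <= 32;
}

/* Hypothetical helper: the fixed check compares a mask against a mask. */
static int wants_32bit_new(uint64_t required_mask)
{
	return required_mask <= DMA_BIT_MASK(32);
}

/* Exercise both checks with a 32-bit and a 64-bit required mask. */
static void check_examples(void)
{
	uint64_t xen_mask = DMA_BIT_MASK(32);	/* assumed value under Xen */
	uint64_t full_mask = DMA_BIT_MASK(64);

	assert(!wants_32bit_old(xen_mask));	/* 0xffffffff is not <= 32 */
	assert(wants_32bit_new(xen_mask));	/* now selects 32-bit DMA */
	assert(!wants_32bit_old(full_mask));
	assert(!wants_32bit_new(full_mask));
}
```

With a 32-bit required mask the old comparison is false, so the driver
did not take the "ioc->dma_mask = 32" branch; the corrected comparison is
true and does take it, which would match the behaviour reported on Xen.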

Please recheck the backported patch in 5.10.y. It is _wrong_. The backport
has:

--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2993,7 +2993,7 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)

        if (ioc->is_mcpu_endpoint ||
            sizeof(dma_addr_t) == 4 || ioc->use_32bit_dma ||
-           dma_get_required_mask(&pdev->dev) <= 32)
+           dma_get_required_mask(&pdev->dev) <= DMA_BIT_MASK(32))
                ioc->dma_mask = 32;
        /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
        else if (ioc->hba_mpi_version_belonged > MPI2_VERSION)

While the upstream patch has:

+       if (ioc->is_mcpu_endpoint || sizeof(dma_addr_t) == 4 ||
+           dma_get_required_mask(&pdev->dev) <= 32) {
                ioc->dma_mask = 32;
+               coherent_dma_mask = dma_mask = DMA_BIT_MASK(32);

Sorry for this mistake of mine, which seems to have been caused by a git
inconsistency, as the upstream source is still showing the line

dma_get_required_mask(&pdev->dev) <= 32

I didn't double check which upstream patch was referenced by the backport
patch, but looked at the output of "git blame" to find the most recent
patch, older than the backport, that changed the line in question.

I didn't even think of the possibility that git could be wrong.


Juergen
