Re: Report in downstream Debian: mpt3sas broken with xen dom0 with update to 5.10.149 in 5.10.y.

From: James Bottomley
Date: Mon Oct 24 2022 - 10:52:41 EST


On Mon, 2022-10-24 at 17:26 +0530, Sreekanth Reddy wrote:
> On Sun, Oct 23, 2022 at 6:57 AM Bart Van Assche <bvanassche@xxxxxxx>
> wrote:
> > On 10/21/22 02:22, Salvatore Bonaccorso wrote:
> > > We got the following report in Debian after an update from
> > > 5.10.140 to
> > > the current 5.10.149. Full quoting below (from
> > > https://bugs.debian.org/1022126). Does this ring some bell about
> > > known
> > > regressions?
> >
> > Only three mpt3sas changes are new in v5.10.149 compared to
> > v5.10.140:
> > $ git log --format=oneline v5.10.140..v5.10.149 drivers/scsi/mpt3sas
> > 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> > value check of dma_get_required_mask()
> > e7fafef9830c4a01e60f76e3860a9bef0262378d scsi: mpt3sas: Force PCIe
> > scatterlist allocations to be within same 4 GB region
> > ea10a652ad2ae2cf3eced6f632a5c98f26727057 scsi: mpt3sas: Fix
> > use-after-free warning
> >
> > Sreekanth and Suganath, can you help with bisecting this issue? For
> > the
> > full report, see also
> > https://lore.kernel.org/linux-scsi/Y1JkuKTjVYrOWbvm@xxxxxxxxxxx/.
>
> This issue is observed after the following patch:
> 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8 scsi: mpt3sas: Fix return
> value check of dma_get_required_mask()
>
> What is happening is that on the Xen hypervisor, the
> dma_get_required_mask() API always returns a 32-bit DMA mask, i.e. it
> reports that the minimum DMA mask required to access host memory is
> 32 bits, and hence the mpt3sas driver sets its DMA mask to 32 bits.

This sounds entirely correct, because the VM is booted with (from the
original Debian bug report):

dom0_mem=4096M,max:4096M dom0_max_vcpus=4 dom0_vcpus_pin
ucode=scan xpti=dom0=false,domu=true gnttab_max_frames=128

So it has no memory above 4GB, and thus 32-bit addressing is the
minimum required. If you boot a machine with >4GB and Xen still
returns a 32-bit mask here, then we have a Xen problem.

> So, on a 64-bit machine, if the driver sets the DMA mask to 32 bits,
> then SWIOTLB's bounce buffers come into the picture during I/O. Since
> these bounce buffers are limited in size, we observe I/O hangs when
> large I/Os are issued.

Why is the SWIOTLB active if all the physical memory in the VM is
within the range of the DMA mask? If this is really happening, it
sounds like a SWIOTLB bug.

> I am not sure whether this API's return value is correct in the Xen
> environment. If it is correct, then I have to modify the driver to
> stop using this API and directly set the DMA mask to 64 bits when
> the system is a 64-bit machine.

The original design of the API is to describe exactly the minimum
direct DMA requirements. There are a large number of cards with
multiple DMA register formats, the most common being to use either a
compact 32 bit or an expanded 64 bit register to describe a page
location. The former gives 39 bits of addressing and the latter 64.
If the DMA mask is 39 bits or below as described by this API, then the
card can use the compact address form.

James