Re: [PATCH net v3] net/mlx5: fix calling mlx5_cmd_init() before DMA mask is set

From: Saeed Mahameed
Date: Wed Oct 11 2023 - 14:56:30 EST


On 11 Oct 11:20, Saeed Mahameed wrote:
> On 11 Oct 09:57, Niklas Schnelle wrote:
>> Since commit 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and
>> reload routines"), mlx5_cmd_init() is called from mlx5_mdev_init(), which
>> probe_one() calls before mlx5_pci_init(). This is a problem because the
>> DMA and coherent DMA masks are set in mlx5_pci_init(), but
>> mlx5_cmd_init() already calls dma_alloc_coherent(). Thus a DMA
>> allocation is made during probe before the correct mask is set. Once
>> s390x is converted to the common dma-iommu code, this makes probe fail
>> to initialize the cmdif SW structs: on current s390x machines DMA
>> addresses below 4 GiB are reserved, and unlike the old s390x-specific
>> DMA API implementation, the common code enforces DMA masks.
>>
>> Fix this by moving set_dma_caps() out of mlx5_pci_init() and into
>> probe_one(), before mlx5_mdev_init(). To match the overall naming
>> scheme, rename it to mlx5_dma_init().
>
> How about we just call mlx5_pci_init() before mlx5_mdev_init(), instead of
> breaking it apart?
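To make the failure mode in the quoted commit message concrete, here is a small user-space model (not kernel code). It assumes a bump-pointer IOVA allocator that, like the common dma-iommu code, refuses any buffer that does not fit under the device's mask, and it treats everything below 4 GiB as reserved, as the message describes for current s390x machines. `struct mock_iova_space` and `mock_dma_alloc()` are hypothetical names; `DMA_BIT_MASK()` mirrors the kernel macro of the same name. A PCI device starts out with a 32-bit default mask, so an allocation made before the driver widens that mask has no usable address to land on.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK() definition. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical IOVA space: addresses below reserved_below are unusable. */
struct mock_iova_space {
	uint64_t reserved_below; /* assumed 4 GiB on current s390x machines */
	uint64_t next_free;      /* bump pointer for the next allocation */
};

/*
 * Allocate `size` bytes of IOVA space whose last byte still fits under
 * `mask`, as a mask-enforcing allocator would. Returns 0 and stores the
 * address in *iova on success, or -ENOMEM if no usable address fits.
 */
static int mock_dma_alloc(struct mock_iova_space *s, uint64_t mask,
			  uint64_t size, uint64_t *iova)
{
	uint64_t start, last;

	assert(size > 0);
	/* Never hand out anything from the reserved low range. */
	start = s->next_free < s->reserved_below ? s->reserved_below
						 : s->next_free;
	last = start + size - 1;

	/* Reject wrap-around and any buffer the device cannot address. */
	if (last < start || last > mask)
		return -ENOMEM;

	s->next_free = last + 1;
	*iova = start;
	return 0;
}
```

With `reserved_below = 1ULL << 32`, a 4096-byte request under the default `DMA_BIT_MASK(32)` returns `-ENOMEM`, while the same request under `DMA_BIT_MASK(64)` succeeds at exactly 4 GiB. An allocator that ignores the mask, as the message says the old s390x-specific implementation did, would hand out that same above-4 GiB address in both cases, which is why the ordering problem only surfaces with the common code.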

I just posted this RFC patch [1].

I am working under very limited conditions these days, and I don't have a
strong opinion on which approach to take. Leon, Niklas, please advise.

The three possible solutions are:

1) Call mlx5_pci_init() before mlx5_mdev_init(). I don't think enabling PCI
before initializing the cmd DMA would be a problem.

2) This patch.

3) Shay's patch from the link below:
[1] https://patchwork.kernel.org/project/netdevbpf/patch/20231011184511.19818-1-saeed@xxxxxxxxxx/
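Options 1) and 2) share one idea: make sure the DMA mask is widened before mlx5_mdev_init() reaches the coherent allocation in mlx5_cmd_init(). The user-space sketch below models only that ordering. All `mock_*` names are hypothetical stand-ins rather than mlx5 functions, the device is assumed to start from the 32-bit PCI default mask, IOVAs below 4 GiB are assumed reserved as described for s390x, and the mask setup simply widens to 64 bits and always succeeds. Shay's patch (option 3) is not modelled, since its contents are not described here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
/* Assumption: lowest usable IOVA is 4 GiB, as on current s390x machines. */
#define MOCK_FIRST_IOVA (1ULL << 32)

/* Hypothetical stand-in for the PCI device; only the mask matters here. */
struct mock_dev {
	uint64_t coherent_mask; /* PCI default: DMA_BIT_MASK(32) */
	int cmd_ready;          /* set once the cmd buffer is allocated */
};

/* Model of a mask-enforcing dma_alloc_coherent(). */
static int mock_alloc_coherent(const struct mock_dev *d, uint64_t size)
{
	assert(size > 0);
	return MOCK_FIRST_IOVA + size - 1 <= d->coherent_mask ? 0 : -ENOMEM;
}

/* Plays the role of set_dma_caps()/mlx5_dma_init(): widen the mask. */
static int mock_dma_init(struct mock_dev *d)
{
	d->coherent_mask = DMA_BIT_MASK(64);
	return 0;
}

/* Plays the role of mlx5_mdev_init() -> mlx5_cmd_init(). */
static int mock_mdev_init(struct mock_dev *d)
{
	int err = mock_alloc_coherent(d, 4096);

	if (!err)
		d->cmd_ready = 1;
	return err;
}

/* Ordering since 06cd555f73ca: the allocation happens first. */
static int mock_probe_before_fix(struct mock_dev *d)
{
	int err = mock_mdev_init(d);

	if (err)
		return err;
	return mock_dma_init(d); /* too late for the cmd allocation */
}

/* Ordering shared by options 1) and 2): mask first, then mdev init. */
static int mock_probe_after_fix(struct mock_dev *d)
{
	int err = mock_dma_init(d);

	if (err)
		return err;
	return mock_mdev_init(d);
}
```

On a fresh device, `mock_probe_before_fix()` returns `-ENOMEM` and leaves `cmd_ready` at 0, while `mock_probe_after_fix()` returns 0. The model cannot distinguish a standalone mlx5_dma_init() from running all of mlx5_pci_init() earlier; the real difference between options 1) and 2) is what else mlx5_pci_init() does before mlx5_mdev_init(), which this sketch ignores.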

Thanks,
Saeed.