Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node

From: John Garry
Date: Mon Jul 19 2021 - 13:42:47 EST


On 09/07/2021 15:24, Ming Lei wrote:
associated compromises.
Follows the log of 'perf report'

1) good(run fio from cpus in the nvme's numa node)

Hi Ming,

If you're still interested in this issue, as an experiment only, you can try my rebased patches here:

https://github.com/hisilicon/kernel-dev/commits/private-topic-smmu-5.14-cmdq-4

I think that you should see a significant performance boost.

Thanks
John


- 34.86% 1.73% fio [nvme] [k] nvme_process_cq
   - 33.13% nvme_process_cq
      - 32.93% nvme_pci_complete_rq
         - 24.92% nvme_unmap_data
            - 20.08% dma_unmap_sg_attrs
               - 19.79% iommu_dma_unmap_sg
                  - 19.55% __iommu_dma_unmap
                     - 16.86% arm_smmu_iotlb_sync
                        - 16.81% arm_smmu_tlb_inv_range_domain
                           - 14.73% __arm_smmu_tlb_inv_range
                                14.44% arm_smmu_cmdq_issue_cmdlist
                             0.89% __pi_memset
                             0.75% arm_smmu_atc_inv_domain
                     + 1.58% iommu_unmap_fast
                     + 0.71% iommu_dma_free_iova
            - 3.25% dma_unmap_page_attrs
               - 3.21% iommu_dma_unmap_page
                  - 3.14% __iommu_dma_unmap_swiotlb
                     - 2.86% __iommu_dma_unmap
                        - 2.48% arm_smmu_iotlb_sync
                           - 2.47% arm_smmu_tlb_inv_range_domain
                              - 2.19% __arm_smmu_tlb_inv_range
                                   2.16% arm_smmu_cmdq_issue_cmdlist
            + 1.34% mempool_free
         + 7.68% nvme_complete_rq
   + 1.73% _start


2) bad(run fio from cpus not in the nvme's numa node)
- 49.25% 3.03% fio [nvme] [k] nvme_process_cq
   - 46.22% nvme_process_cq
      - 46.07% nvme_pci_complete_rq
         - 41.02% nvme_unmap_data
            - 34.92% dma_unmap_sg_attrs
               - 34.75% iommu_dma_unmap_sg
                  - 34.58% __iommu_dma_unmap
                     - 33.04% arm_smmu_iotlb_sync
                        - 33.00% arm_smmu_tlb_inv_range_domain
                           - 31.86% __arm_smmu_tlb_inv_range
                                31.71% arm_smmu_cmdq_issue_cmdlist
                     + 0.90% iommu_unmap_fast
            - 5.17% dma_unmap_page_attrs
               - 5.15% iommu_dma_unmap_page
                  - 5.12% __iommu_dma_unmap_swiotlb
                     - 5.05% __iommu_dma_unmap
                        - 4.86% arm_smmu_iotlb_sync
                           - 4.85% arm_smmu_tlb_inv_range_domain
                              - 4.70% __arm_smmu_tlb_inv_range
                                   4.67% arm_smmu_cmdq_issue_cmdlist
            + 0.74% mempool_free
         + 4.83% nvme_complete_rq
   + 3.03% _start
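
To put the two profiles side by side, the arm_smmu_cmdq_issue_cmdlist figures can be summed over both unmap paths. The following is only a rough sketch: the numbers are copied from the two 'perf report' logs in this message, while the dictionary and function names are made up for illustration. Note that these are shares of samples within each run, so the ratio shows how the profile shifts towards the SMMU command queue, not an absolute slowdown.

```python
# Percentages for arm_smmu_cmdq_issue_cmdlist, copied from the two
# 'perf report' logs in this message, keyed by the unmap path it is
# reached through. The names below are illustrative only.
GOOD = {"dma_unmap_sg_attrs": 14.44, "dma_unmap_page_attrs": 2.16}
BAD = {"dma_unmap_sg_attrs": 31.71, "dma_unmap_page_attrs": 4.67}


def cmdq_share(profile):
    """Total share of samples spent in arm_smmu_cmdq_issue_cmdlist."""
    return round(sum(profile.values()), 2)


good_total = cmdq_share(GOOD)   # fio on CPUs in the nvme's numa node
bad_total = cmdq_share(BAD)     # fio on CPUs in a remote numa node
ratio = round(bad_total / good_total, 2)

print(f"local node:  {good_total:.2f}%")
print(f"remote node: {bad_total:.2f}%")
print(f"ratio:       {ratio:.2f}x")
```

So in these two runs the share of samples under arm_smmu_cmdq_issue_cmdlist goes from roughly 16.6% to roughly 36.4% when fio moves to the remote node.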