Re: [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space
From: liulongfang
Date: Fri Nov 24 2023 - 23:10:14 EST
On 2023/11/24 20:01, Baolu Lu wrote:
> On 2023/11/24 14:30, liulongfang wrote:
>> On 2023/11/15 11:02, Lu Baolu wrote:
>>> When a user-managed page table is attached to an IOMMU, it is necessary
>>> to deliver IO page faults to user space so that they can be handled
>>> appropriately. One use case for this is nested translation, which is
>>> currently being discussed in the mailing list.
>>>
>>> I have posted an RFC series [1] that describes the implementation of
>>> delivering page faults to user space through IOMMUFD. This series has
>>> received several comments on the IOMMU refactoring, which I am trying to
>>> address in this series.
>>>
>>> The major refactoring includes:
>>>
>>> - [PATCH 01 ~ 04] Move include/uapi/linux/iommu.h to
>>> include/linux/iommu.h. Remove the unrecoverable fault data definition.
>>> - [PATCH 05 ~ 06] Remove iommu_[un]register_device_fault_handler().
>>> - [PATCH 07 ~ 10] Separate SVA and IOPF. Make IOPF a generic page fault
>>> handling framework.
>>> - [PATCH 11 ~ 12] Improve iopf framework for iommufd use.
>>>
>>> This is also available at github [2].
>>>
>>> [1] https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> [2] https://github.com/LuBaolu/intel-iommu/commits/preparatory-io-pgfault-delivery-v7
>>>
>>> Change log:
>>> v7:
>>> - Rebase to v6.7-rc1.
>>> - Export iopf_group_response() for global use.
>>> - Release lock when calling iopf handler.
>>> - The whole series has been verified to work for the SVA case on Intel
>>> platforms by Zhao Yan. Add her Tested-by to the affected patches.
>>>
>>> v6: https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - [PATCH 09/12] Check IS_ERR() against the iommu domain. [Jingqi/Jason]
>>> - [PATCH 12/12] Rename the comments and name of iopf_queue_flush_dev(),
>>> no functionality changes. [Kevin]
>>> - All patches rebased on the latest iommu/core branch.
>>>
>>> v5: https://lore.kernel.org/linux-iommu/20230914085638.17307-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Consolidate per-device fault data management. (New patch 11)
>>> - Improve iopf_queue_flush_dev(). (New patch 12)
>>>
>>> v4: https://lore.kernel.org/linux-iommu/20230825023026.132919-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Merge iommu_fault_event and iopf_fault. They are duplicates.
>>> - Move iommu_report_device_fault() and iommu_page_response() to
>>> io-pgfault.c.
>>> - Move iommu_sva_domain_alloc() to iommu-sva.c.
>>> - Add group->domain and use it directly in sva fault handler.
>>> - Misc code refactoring and refining.
>>>
>>> v3: https://lore.kernel.org/linux-iommu/20230817234047.195194-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Convert the fault data structures from uAPI to kAPI.
>>> - Merge iopf_device_param into iommu_fault_param.
>>> - Add debugging on domain lifetime for iopf.
>>> - Remove patch "iommu: Change the return value of dev_iommu_get()".
>>> - Remove patch "iommu: Add helper to set iopf handler for domain".
>>> - Misc code refactoring and refining.
>>>
>>> v2: https://lore.kernel.org/linux-iommu/20230727054837.147050-1-baolu.lu@xxxxxxxxxxxxxxx/
>>> - Remove unrecoverable fault data definition as suggested by Kevin.
>>> - Drop the per-device fault cookie code considering that it doesn't make
>>> much sense for SVA.
>>> - Make the IOMMU page fault handling framework generic so that it is
>>> available for use cases other than SVA.
>>>
>>> v1: https://lore.kernel.org/linux-iommu/20230711010642.19707-1-baolu.lu@xxxxxxxxxxxxxxx/
>>>
>>> Lu Baolu (12):
>>> iommu: Move iommu fault data to linux/iommu.h
>>> iommu/arm-smmu-v3: Remove unrecoverable faults reporting
>>> iommu: Remove unrecoverable fault data
>>> iommu: Cleanup iopf data structure definitions
>>> iommu: Merge iopf_device_param into iommu_fault_param
>>> iommu: Remove iommu_[un]register_device_fault_handler()
>>> iommu: Merge iommu_fault_event and iopf_fault
>>> iommu: Prepare for separating SVA and IOPF
>>> iommu: Make iommu_queue_iopf() more generic
>>> iommu: Separate SVA and IOPF
>>> iommu: Consolidate per-device fault data management
>>> iommu: Improve iopf_queue_flush_dev()
>>>
>>> include/linux/iommu.h | 266 +++++++---
>>> drivers/iommu/intel/iommu.h | 2 +-
>>> drivers/iommu/iommu-sva.h | 71 ---
>>> include/uapi/linux/iommu.h | 161 ------
>>> .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 14 +-
>>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 51 +-
>>> drivers/iommu/intel/iommu.c | 25 +-
>>> drivers/iommu/intel/svm.c | 8 +-
>>> drivers/iommu/io-pgfault.c | 469 ++++++++++++------
>>> drivers/iommu/iommu-sva.c | 66 ++-
>>> drivers/iommu/iommu.c | 232 ---------
>>> MAINTAINERS | 1 -
>>> drivers/iommu/Kconfig | 4 +
>>> drivers/iommu/Makefile | 3 +-
>>> drivers/iommu/intel/Kconfig | 1 +
>>> 15 files changed, 601 insertions(+), 773 deletions(-)
>>> delete mode 100644 drivers/iommu/iommu-sva.h
>>> delete mode 100644 include/uapi/linux/iommu.h
>>>
>>
>> Tested-by: Longfang Liu <liulongfang@xxxxxxxxxx>
>
> Thank you for the testing.
>
>>
>> Functional and performance tests of page fault scenarios were completed in Arm SVA mode
>> on a HiSilicon crypto accelerator:
>> 1. IOMMU page fault handling works correctly.
>> 2. Performance test on a 128-core Arm platform; performance is reduced as follows:
>>
>>   Threads   Performance change
>>   -------   ------------------
>>         8   -0.77%
>>        16   -1.1%
>>        32   -0.31%
>>        64   -0.49%
>>       128   -0.72%
>>       256   -1.7%
>>       384   -4.94%
>>       512   N/A (iopf timeout)
>>
>> Increasing the number of threads further causes IOMMU page fault
>> processing to time out (after more than 4.2 seconds).
>> This problem occurs both on the baseline kernel (v6.7-rc1) and
>> with this series applied.
>
> Probably you can check whether commit 6bbd42e2df8f ("mmu_notifiers: call
> invalidate_range() when invalidating TLBs") matters.
>
> It was discussed in this thread.
>
> https://lore.kernel.org/linux-iommu/20231117090933.75267-1-baolu.lu@xxxxxxxxxxxxxxx/
>
Thanks for the reminder. However, the cause of the iopf timeout in this test scenario is
different from the one addressed in your patch.
Our analysis found that the problem is related to NUMA balancing.
IOMMU iopf handling based on CMWQ (concurrency managed workqueues) currently uses a
large number of kernel threads, and page fault processing for NUMA balancing competes
with IOMMU page fault processing for CPU time.
This lengthens page fault processing time and triggers repeated page faults in the IO
task. The result is an unpredictable and very large number of page fault events, which
eventually leaves the entire system unable to handle page faults in a timely manner.
Thanks.
Longfang.
> Best regards,
> baolu
>