Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

From: Sai Prakash Ranjan
Date: Thu Jun 10 2021 - 20:54:34 EST

Next message: Tian, Kevin: "RE: Plan for /dev/ioasid RFC v2"
Previous message: Martin K. Petersen: "Re: [PING][PATCH v2 0/5] Bring the BusLogic host bus adapter driver up to Y2021"
In reply to: Krishna Reddy: "RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list"
Next in thread: Krishna Reddy: "RE: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Krishna,

On 2021-06-11 06:07, Krishna Reddy wrote:

>> No, the unmap latency is not just in some test case written, the issue
>> is very real and we have workloads where camera is reporting frame
>> drops because of this unmap latency in the order of 100s of milliseconds.
>> And hardware team recommends using ASID based invalidations for
>> anything larger than 128 TLB entries. So yes, we have taken note of
>> impacts here before going this way and hence feel more inclined to
>> make this qcom specific if required.
>
> Seems like the real issue here is not the unmap API latency.
> It should be the high number of back to back SMMU TLB invalidate
> register writes that is resulting in lower ISO BW to Camera and
> overflow. Isn't it?
> Even Tegra186 SoC has similar issue and HW team recommended to rate
> limit the number of back to back SMMU TLB invalidate register writes.
> The subsequent Tegra194 SoC has a dedicated SMMU for ISO clients to
> avoid the impact of TLB invalidates from NISO clients on ISO BW.

Not exactly, this issue is not specific to camera. If you look at
the numbers in the commit text, even for the test device it's the
same observation. It depends on the size of the buffer we are
unmapping, which determines the number of TLBIs issued. I am not
aware of any such HW-side BW issues for camera specifically on QCOM
devices.
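
As a rough illustration of why the buffer size matters (a sketch
only, not code from the patch): the number of per-page TLBIs grows
linearly with the unmapped size, so a 32 MiB buffer mapped with 4 KiB
pages needs 8192 invalidations. The helper names and the threshold
macro below are hypothetical; the value 128 is just the figure
mentioned earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold, taken from the "128 TLB entries" figure in this thread. */
#define TLBI_ASID_THRESHOLD 128

/* Per-page invalidations needed to cover `size` bytes at `granule` bytes per entry. */
static inline size_t tlbi_count(size_t size, size_t granule)
{
	assert(granule != 0);
	return (size + granule - 1) / granule;	/* round up to whole entries */
}

/* True when one ASID-wide invalidate would replace more than the threshold of TLBIs. */
static inline bool prefer_asid_flush(size_t size, size_t granule)
{
	return tlbi_count(size, granule) > TLBI_ASID_THRESHOLD;
}
```

With these assumptions, prefer_asid_flush() stays false up to 128
entries (512 KiB at 4 KiB pages) and becomes true for anything larger.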

Thanks,
Sai

>> Thinking some more, I
>> wonder if the Tegra folks might have an opinion to add here, given
>> that their multiple-SMMU solution was seemingly about trying to get
>> enough TLB and pagetable walk bandwidth in the first place?
>
> While it is good to reduce the number of TLB register writes, flushing
> all TLB entries at context granularity arbitrarily can have a negative
> impact on active traffic and BW. I don't have much data on the possible
> impact at this point.
> Can the flushing at context granularity be made a quirk rather than
> performing it by default?
>
> -KR
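
For illustration only, the quirk idea above could look roughly like
the sketch below: context-granularity flushing is used only when an
implementation opts in, and everyone else keeps range-based
invalidation. The quirk bit, struct and function names are all
hypothetical and are not existing io-pgtable definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opt-in bit; not an existing io-pgtable quirk flag. */
#define EXAMPLE_QUIRK_CTX_FLUSH	(1UL << 0)

/* Hypothetical per-domain configuration carrying the quirk mask. */
struct example_cfg {
	unsigned long quirks;
};

enum flush_kind { FLUSH_RANGE, FLUSH_CONTEXT };

/* Choose context-wide flushing only if the quirk is set and the range is large. */
static inline enum flush_kind pick_flush(const struct example_cfg *cfg,
					 size_t nr_entries, size_t threshold)
{
	assert(cfg != NULL);
	if ((cfg->quirks & EXAMPLE_QUIRK_CTX_FLUSH) && nr_entries > threshold)
		return FLUSH_CONTEXT;
	return FLUSH_RANGE;	/* default: range-based invalidation */
}
```

An implementation that does not set the bit never takes the
context-wide path, regardless of how many entries are unmapped.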

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
