On Tue, Dec 24, 2019 at 11:20:25AM +0000, Marc Zyngier wrote:
On 2019-12-24 01:59, Ming Lei wrote:
On Mon, Dec 23, 2019 at 10:47:07AM +0000, Marc Zyngier wrote:
On 2019-12-23 10:26, John Garry wrote:
I've also managed to trigger some of them now that I have access to
a decent box with nvme storage.
I only have 2x NVMe SSDs when this occurs - I should not be
Out of curiosity, have you tried
with the SMMU disabled? I'm wondering whether we hit some livelock
condition on unmapping buffers...
No, but I can give it a try. Doing that should lower the CPU
so maybe masks the issue - probably not.
Lots of CPU lockups are performance issues if there isn't
I am wondering if you may explain it a bit why enabling SMMU may save
CPU a bit?
The other way around. Mapping/unmapping IOVAs doesn't come for free.
I'm trying to find out whether the NVMe map/unmap patterns trigger
something unexpected in the SMMU driver, but that's a very long shot.
So I tested v5.5-rc3 with and without the SMMU enabled, and
without the SMMU enabled I don't get the lockup.
OK, so my hunch wasn't completely off... At least we have something
to look into.
Obviously this is not conclusive, especially with such limited
testing - 5 minute runs each. The CPU load goes up when disabling the
SMMU, but that could be attributed to extra throughput (1183K ->
I do notice that since we complete the NVMe request in irq context,
we also do the DMA unmap, i.e. talk to the SMMU, in the same context,
which is less than ideal.
It depends on how much overhead invalidating the TLB adds to the
equation, but we should be able to do some tracing and find out.
I need to finish for the Christmas break today, so can't check this
much further ATM.
No worries. May I suggest creating a new thread in the new year,
involving Robin and Will as well?
Zhang Yi has observed the CPU lockup issue once when running heavy IO on
a single NVMe drive, so please CC him if you have a new patch to try.
On which architecture? John was indicating that this also happens on x86.
To be honest, I have never seen such a CPU lockup on x86 when running
heavy IO on a single NVMe drive.
So it looks like the DMA unmap cost is too big on aarch64 if the SMMU is
involved.
So far, we don't have any data suggesting that this is actually the case.
Also, other workloads (such as networking) do not exhibit this behaviour,
while being at least as unmap-heavy as NVMe is.
Maybe it is because networking workloads usually complete IO in softirq
context, instead of hard interrupt context.
If the cross-architecture aspect is confirmed, this points more into
the direction of an interaction between the NVMe subsystem and the
DMA API more than an architecture-specific problem.
Given that we have so far very little data, I'd hold off any conclusion.
We can start to collect latency data of dma unmapping vs nvme_irq()
on both x86 and arm64.
I will see if I can get such a box for collecting the latency data.
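
For the latency comparison suggested above, something along these lines
could summarise a captured trace. This is only a rough sketch, not a
tested tool: it assumes the data comes from the function_graph tracer
with the funcgraph-tail option set, and that duration lines look like
the two shapes shown in the comment. nvme_irq is the handler named in
this thread; dma_unmap_fn is a made-up placeholder, since the symbol
that actually does the unmap depends on the kernel configuration and on
whether an IOMMU sits in the path.

```python
import re
import statistics
from collections import defaultdict

# Assumed function_graph line shapes (funcgraph-tail enabled):
#   " 3)   1.234 us    |  dma_unmap_fn();"        leaf call
#   " 3) + 45.678 us   |  } /* nvme_irq */"       end of a nested call
LINE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s+us\s+\|\s+"
    r"(?:\}\s*/\*\s*([\w.]+)\s*\*/|([\w.]+)\(\);)"
)


def collect(lines, wanted):
    """Group durations (in microseconds) by function name."""
    samples = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # entry lines such as "nvme_irq() {" carry no duration
        name = match.group(2) or match.group(3)
        if name in wanted:
            samples[name].append(float(match.group(1)))
    return samples


def summarize(durations):
    """Return count, median, an approximate 99th percentile, and max."""
    ordered = sorted(durations)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "count": len(ordered),
        "median_us": statistics.median(ordered),
        "p99_us": ordered[p99_index],
        "max_us": ordered[-1],
    }


if __name__ == "__main__":
    import sys

    # "dma_unmap_fn" is a placeholder; substitute the real symbol name.
    wanted = {"nvme_irq", "dma_unmap_fn"}
    for name, values in collect(sys.stdin, wanted).items():
        print(name, summarize(values))
```

Feeding it the saved trace on stdin from both an x86 and an arm64 box
would give per-function numbers that can be compared directly, e.g. how
much of the nvme_irq() time the unmap accounts for on each.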