Re: [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping

From: Chris Wright
Date: Wed Mar 30 2011 - 15:15:55 EST


* Mike Travis (travis@xxxxxxx) wrote:
> Chris Wright wrote:
> >* Mike Travis (travis@xxxxxxx) wrote:
> >> When the IOMMU is being used, each request for a DMA mapping requires
> >> the intel_iommu code to look for some space in the DMA mapping table.
> >> For most drivers this occurs for each transfer.
> >>
> >> When there are many outstanding DMA mappings [as seems to be the case
> >> with the 10GigE driver], the table grows large and the search for
> >> space becomes increasingly time consuming. Performance for the
> >> 10GigE driver drops to about 10% of its capacity on a UV system
> >> when the CPU count is large.
> >
> >That's pretty poor. I've seen large overheads, but when that big it was
> >also related to issues in the 10G driver. Do you have profile data
> >showing this as the hotspot?
>
> Here's one from our internal bug report:
>
> Here is a profile from a run with iommu=on iommu=pt (no forcedac)

OK, I was actually interested in the !pt case, but this is still
useful. The iova lookup is distinct from the identity_mapping() case.

> uv48-sys was receiving and uv-debug sending.
> ksoftirqd/640 was running at approx. 100% cpu utilization.
> I had pinned the nttcp process on uv48-sys to cpu 64.
>
> # Samples: 1255641
> #
> # Overhead Command Shared Object Symbol
> # ........ ............. ............. ......
> #
> 50.27% ksoftirqd/640 [kernel] [k] _spin_lock
> 27.43% ksoftirqd/640 [kernel] [k] iommu_no_mapping

> ...
> 0.48% ksoftirqd/640 [kernel] [k] iommu_should_identity_map
> 0.45% ksoftirqd/640 [kernel] [k] ixgbe_alloc_rx_buffers [ixgbe]

Note, ixgbe has had rx dma mapping issues (that's why I wondered what
was causing the massive slowdown under !pt mode).

<snip>
> I tracked this time down to identity_mapping() in this loop:
>
>	list_for_each_entry(info, &si_domain->devices, link)
>		if (info->dev == pdev)
>			return 1;
>
> I didn't get the exact count, but there were approximately 11,000 PCI
> devices on this system. And this function was called for every page
> request in each DMA request.

Right, so this is the list traversal (and wow, a lot of PCI devices).
Did you try a smarter data structure? (While there's room for another
bit in pci_dev, the bit is more about iommu implementation details than
anything at the pci level).

Alternatively, the domain_dev_info is cached in the archdata of the
device struct, so you should be able to just reference that directly.

Didn't think it through completely, but perhaps something as simple as:

return pdev->dev.archdata.iommu == si_domain;
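
To make the difference concrete, here is a minimal userspace sketch, not
kernel code. The mock_* types and function names are hypothetical
stand-ins: mock_dev.iommu_domain plays the role of dev.archdata.iommu,
and it assumes the cached pointer identifies the domain directly. If the
cached pointer actually refers to a per-device info struct, the
comparison would have to go through that struct's domain field instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock types; they only imitate the shape of the kernel data. */
struct mock_domain;

struct mock_dev {
	struct mock_domain *iommu_domain; /* stands in for dev.archdata.iommu */
	struct mock_dev *next;            /* link in the owning domain's list */
};

struct mock_domain {
	struct mock_dev *devices;         /* head of the domain's device list */
};

/* Attach a device: link it into the list and cache the domain on it. */
void mock_attach(struct mock_domain *dom, struct mock_dev *dev)
{
	assert(dom != NULL && dev != NULL);
	dev->iommu_domain = dom;
	dev->next = dom->devices;
	dom->devices = dev;
}

/* List walk, as in the quoted loop: cost grows with the device count. */
int identity_mapping_walk(const struct mock_domain *si,
			  const struct mock_dev *dev)
{
	const struct mock_dev *p;

	for (p = si->devices; p != NULL; p = p->next)
		if (p == dev)
			return 1;
	return 0;
}

/* Cached pointer compare: constant cost regardless of device count. */
int identity_mapping_cached(const struct mock_domain *si,
			    const struct mock_dev *dev)
{
	return dev->iommu_domain == si;
}
```

Both functions should give the same answer for any device attached via
mock_attach(); only the second avoids touching the list on every call,
which is what matters with thousands of devices on the list.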

thanks,
-chris