Re: [PATCH 2.6.34] ehci-hcd: add option to enable 64-bit DMA support
From: Robert Hancock
Date: Fri Feb 19 2010 - 20:30:41 EST
On Thu, Feb 18, 2010 at 9:54 PM, Greg KH <greg@xxxxxxxxx> wrote:
> On Thu, Feb 18, 2010 at 09:46:29PM -0600, Robert Hancock wrote:
>> On Thu, Feb 18, 2010 at 6:47 PM, Greg KH <greg@xxxxxxxxx> wrote:
>> >> > So you did not measure it?
>> >> >
>> >> > Hm, I guess this change must not be necessary :)
>> >>
>> >> I'll try and run some tests and see what I can quantify. However, I
>> >> only have 4GB of RAM on my machine (with a 1GB memory hole) and so a
>> >> random memory allocation only has a 25% chance of ending up in the
>> >> area where it would make a difference, so it may take a bit of doing.
>> >
>> > Without any good justification, including real tests being run, I can't
>> > take this patch, the risk is just too high.
>>
>> Again, this particular patch has essentially zero risk for anyone that
>> doesn't choose to experiment with the option. One can hardly say it
>> presents much of a long-term maintenance burden either..
>
> Then don't give them the option, as it doesn't seem needed :)
>
> Again, it is tough to remove options once you add them, so not adding
> them at all is the best thing to do.
I don't know why you would remove the option. Even if you someday
changed the default to 1, it would likely be a still idea to keep it
around for debugging purposes at least.
If you're complaining about options, ehci-hcd already has some which
are quite a bit more nebulous in usefulness than this one..
>
>> > And really, for USB 2.0 speeds, I doubt you are going to even notice
>> > this kind of overhead, it's in the noise. Especially given that almost
>> > always the limiting factor is the device itself, not the host.
>>
>> Well, I do have some results. This is from running this "dd
>> if=/dev/sdg of=/dev/null bs=3800M iflag=direct" against an OCZ Rally2
>> USB flash drive, which gets about 30 MB/sec on read, with CPU-burning
>> tasks on all cores in the background. (The huge block size and
>> iflag=direct is to try to force more of the IO to happen to memory
>> above the 4GB mark.) With that workload, swiotlb_bounce shows up as
>> between 1.5 to 4% of the CPU time spent in the kernel according to
>> oprofile. Obviously with the 64-bit DMA enabled, that disappears. Of
>> course, the overall kernel time is only around 2% of the total time,
>> so that's a pretty small overall percentage.
>
> 2% is noise, right? So overall you have not really shown any
> improvement.
What threshold of performance improvement would you rather see? It's
pretty clear that there will be a performance upside, even if small,
and no downside.
I honestly didn't expect as much resistance to a simple hardware
feature enablement patch, that has zero impact on anyone that doesn't
opt-in..
>
>> I'll try some tests later with a faster SATA-to-IDE device that should
>> stress things a bit more, but a huge difference doesn't seem likely.
>> One thing that's uncertain is just how much of the IO is needing to be
>> bounced - an even distribution of the buffer across all of physical
>> RAM would suggest 25% in this case, but I don't know an easy way to
>> verify that.
>>
>> Aside from speed considerations though, I should point out another
>> factor: IOMMU/SWIOTLB space is in many cases a limited resource for
>> all IO in flight at a particular time (SWIOTLB is typically 64MB). The
>> number of hits when Googling for "Out of IOMMU space" indicates it is
>> a problem that people do hit from time to time. From that perspective,
>> anything that prevents unnecessary use of bounce buffers is a good
>> thing.
>
> Sure, but again, for USB 2.0 stuff, we don't have many I/O in flight, as
> they are pretty slow devices.
Think that's a bit simplistic, if you have multiple devices active at
once, or multiple controllers (not at all uncommon these days, newer
Intel chipset machines have two EHCI controllers, with USB 1.x devices
handled through a logical hub with TT connected to each of them) that
can chew up more space.
>
> USB 3.0 is different, and that's a different driver, and hopefully that
> is all addressed already :)
Doesn't look like it, from the version in current -git anyway - I
don't see any calls to set DMA masks in the XHCI code so it will just
default to 32-bit. I imagine that'll hurt performance at 4.8 Gbps if
you've got lots of RAM..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/