RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: David Laight
Date: Tue Mar 20 2018 - 09:32:35 EST
From: Ingo Molnar
> Sent: 20 March 2018 10:54
...
> Note that a generic version might still be worth trying out, if and only if it's
> safe to access those vector registers directly: modern x86 CPUs will do their
> non-constant memcpy()s via the common memcpy_erms() function - which could in
> theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if
> size (and alignment, perhaps) is a multiple of 32 bytes or so.
>
> Assuming it's correct with arbitrary user-space FPU state and if it results in any
> measurable speedups, which might not be the case: ERMS is supposed to be very
> fast.
>
> So even if it's possible (which it might not be), it could end up being slower
> than the ERMS version.
Last I checked memcpy() was implemented as 'rep movsb' on the latest Intel cpus.
Since memcpy_to/fromio() get aliased to memcpy() this generates byte copies.
The previous 'fastest' version of memcpy() was ok for uncached locations.
For PCIe I suspect that the actual instructions don't make a massive difference.
I'm not even sure interleaving two transfers makes any difference.
What makes a huge difference for memcpy_fromio() is the size of the register.
The time taken for a read will be largely independent of the width of the
register used.
David