pgprot_writecombine() and PATs on x86
From: Roland Dreier
Date: Wed Apr 25 2007 - 14:02:45 EST
Where do your patches to add an implementation of
pgprot_writecombine() using PATs on x86 stand? The mlx4 driver I'm
planning on merging for 2.6.22 would really like writecombining, and
I'm interested in doing the work to finally get the PAT stuff merged
(probably for 2.6.23 I guess).
Just to give a little background on my motivation: the mlx4 hardware
allows a page in its PCI space to be mapped, where the driver can write
descriptors and payloads directly, instead of ringing a doorbell and
having the HW fetch the descriptor from system memory, for better latency.
The driver allows this page to be mapped to userspace and used
directly from latency sensitive stuff like MPI applications. I added
a hacked up version of the PAT stuff to map the page into userspace
with pgprot_writecombine(), and that lowered the measured MPI latency
by 450 nsecs, which doesn't seem like much until I tell you that the
latency when from ~1.85 usec to ~1.4 usec. So copying the descriptor
is a huge part of the total latency and users are definitely going to
want that 25% improvement.
Anyway as I said I want to get the PAT stuff upstream, so I'm posting
this to find out the latest state and avoid duplicating work.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/