If the only issue were devices which cannot do scatter-gather, I
would certainly agree. However, except for the SGI O2 (which only
cares about 64 KB pages in hardware, anyway), all of the SGI hardware
has been happy to do scatter-gather. What we found with (high
resolution) digital media and other applications which do a lot of
large DMAs was that the overhead of doing the equivalent of
map_kiobuf()/unmap_kiobuf() for large buffers composed of many small
pages was substantial, compared to doing it for large buffers composed
of large pages. Admittedly, the IRIX equivalent is less efficient
than map_kiobuf(), but map_kiobuf() does still have to touch a lot of
cache lines when visiting all of the small pages in a large buffer.

Then too, there is the matter of TLB misses for applications which
visit a lot of data, especially on processors with reasonably large
caches. With 4 KB pages and 64 TLB entries, the TLB can map at most
256 KB (64 x 4 KB), so it cannot cover a cache larger than that. If
the cache is, say, 2 MB and the application cycles through many of the
pages in the cache in a loop, you can wind up with a TLB miss for
almost every load (other than those from the stack). With 1 MB pages,
the same 64 entries map 64 MB, and there are almost no TLB misses.
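
To put rough numbers on both points, here is a back-of-the-envelope sketch, not a measurement. It assumes one page per TLB entry, and the 16 MB DMA buffer size is a hypothetical figure picked for illustration; the function names are made up for this example. Only the 64 entries, the 2 MB cache, and the page sizes come from the discussion above.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes the TLB can map at once, assuming one page per entry."""
    return entries * page_size


def pages_in_buffer(buffer_size, page_size):
    """Pages spanned by a page-aligned buffer (each is visited when mapping)."""
    return -(-buffer_size // page_size)  # ceiling division


def report(entries, cache_size, buffer_size, page_size):
    """Summarize TLB reach and per-buffer page count for one page size."""
    reach = tlb_reach(entries, page_size)
    return {
        "page_size_kb": page_size // KB,
        "tlb_reach_kb": reach // KB,
        "covers_cache": reach >= cache_size,
        "pages_per_buffer": pages_in_buffer(buffer_size, page_size),
    }


if __name__ == "__main__":
    # 64 entries and a 2 MB cache are from the text; 16 MB is an assumed buffer.
    for page_size in (4 * KB, 64 * KB, 1 * MB):
        print(report(entries=64, cache_size=2 * MB,
                     buffer_size=16 * MB, page_size=page_size))
```

Under those assumptions, 4 KB pages give a 256 KB reach and 4096 pages to walk per 16 MB buffer, while 1 MB pages give a 64 MB reach and 16 pages per buffer.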