Re: Direct I/O

From: Stephen C. Tweedie (sct@redhat.com)
Date: Mon Mar 13 2000 - 10:32:40 EST


Hi,

On Mon, 13 Mar 2000 12:54:55 +0100, Alexander S A Kjeldaas
<Alexander.Kjeldaas@fast.no> said:

> "touch page tables" was inaccurate. What I meant was that I do not
> want any part of the page-table of the process to be invalidated
> during a "common" read.

That's exactly what you get from the current raw IO mechanism.

>> Chuck Lever has been doing a lot of work on this sort of thing. He has
>> been posting patches for mincore() (to find out the residency of an area
>> of memory), and madvise() (to force pagein or discarding of memory
>> ranges).

> Do you or anyone have a pointer to these patches?

They have been all over linux-kernel in the past few weeks. You can
search for them in the archive at
        http://www.uwsg.indiana.edu/hypermail/linux/kernel/
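
For illustration only, here is a minimal userspace sketch of how the two
calls can be used together. It assumes the mincore() and madvise()
interfaces as they exist in present-day Linux/glibc, which may differ in
detail from the patches discussed above. The helper name
resident_pages() is made up for this example.

```c
#define _DEFAULT_SOURCE          /* expose mincore(), madvise(), MAP_ANONYMOUS */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of [addr, addr + len) are
 * currently resident in memory. addr must be page-aligned, as mincore()
 * requires. Returns -1 on error.
 */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages ? npages : 1);
    long count = 0;

    if (vec == NULL)
        return -1;

    /* mincore() fills one byte per page; bit 0 set means "resident". */
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1)
            count++;

    free(vec);
    return count;
}
```

A caller would typically mmap() a region, call resident_pages() to see
what is already in memory, and then use madvise(addr, len, MADV_WILLNEED)
to ask for pagein or madvise(addr, len, MADV_DONTNEED) to discard the
range.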

> What parts of the kernel are going to start using kiobufs? I looked
> through the kernel and it looked like kiobufs were basically only used
> by the raw devices. How do kiobufs relate to, for instance,
> buffer_heads?

Kiobufs can be used as arbitrary containers when passing data between
bits of the kernel. Ultimately we probably want to use kiobufs down
at the block device level directly: with raw IO, the overhead of
buffer_head management shows up as a massive performance cost in terms
of CPU time.

Basically, the whole reason for having kiobufs is to abstract out the
selection of data for IO from the place where the IO is actually
performed. That's why it is used in raw IO: we don't implement buffer
cache IO into user space, we just make a brw_kiovec() function to do
kiobuf IO to block devices, and then there is an entirely separate
map_user_kiobuf() function to map user addresses into a kiobuf. The
IO layer doesn't have to know anything about virtual memory.
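
The separation described above can be sketched with a toy userspace
model. This is not the kernel API: struct toy_kiobuf, toy_map_buffer()
and toy_brw() are hypothetical names invented for this example, the
"device" is just a byte array, and the page size is assumed to be 4096.
The point is only that the mapping step and the IO step share nothing
but the container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096   /* assumed page size for the model */
#define TOY_MAX_PAGES 16     /* arbitrary limit for the model */

/* Toy container: a list of page addresses plus offset and length. */
struct toy_kiobuf {
    unsigned char *pages[TOY_MAX_PAGES]; /* page-aligned base of each page */
    int nr_pages;
    size_t offset;   /* where the data starts inside pages[0] */
    size_t length;   /* total number of bytes described */
};

enum toy_rw { TOY_READ, TOY_WRITE };

/* "Selection" side: describe a caller's buffer as a list of pages. */
int toy_map_buffer(struct toy_kiobuf *iobuf, void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    size_t offset = start % TOY_PAGE_SIZE;
    size_t nr = (offset + len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

    if (nr > TOY_MAX_PAGES)
        return -1;

    for (size_t i = 0; i < nr; i++)
        iobuf->pages[i] = (unsigned char *)(start - offset) + i * TOY_PAGE_SIZE;

    iobuf->nr_pages = (int)nr;
    iobuf->offset = offset;
    iobuf->length = len;
    return 0;
}

/*
 * "IO" side: move bytes between the pages in the container and a flat
 * device image. It never sees the original buffer pointer, only pages.
 * Returns the number of bytes transferred (0 if the range does not fit).
 */
size_t toy_brw(enum toy_rw rw, struct toy_kiobuf *iobuf,
               unsigned char *dev, size_t dev_size, size_t dev_off)
{
    size_t done = 0;
    size_t off = iobuf->offset;

    if (dev_off > dev_size || iobuf->length > dev_size - dev_off)
        return 0;

    for (int i = 0; i < iobuf->nr_pages && done < iobuf->length; i++) {
        size_t chunk = TOY_PAGE_SIZE - off;
        if (chunk > iobuf->length - done)
            chunk = iobuf->length - done;

        if (rw == TOY_READ)      /* device -> memory */
            memcpy(iobuf->pages[i] + off, dev + dev_off + done, chunk);
        else                     /* memory -> device */
            memcpy(dev + dev_off + done, iobuf->pages[i] + off, chunk);

        done += chunk;
        off = 0;                 /* later pages start at their base */
    }
    return done;
}
```

In this model a different mapping function (say, one that describes
pages from some other source) could fill the same container, and
toy_brw() would not need to change.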

> One more question: Will the raw devices be able to switch a high-mem
> page into a dma-friendly page if the page is used as a buffer for a
> direct I/O read?

No.

> I expect that on a high-mem machine, the kernel will give out
> primarily high-mem pages to processes. It would be great if the
> kernel could automatically "upgrade" high-mem pages that are used as
> read buffers for raw I/O, to low-mem pages (<2G).

That's just a half-way hack workaround: the real answer is to support
block device IO to the high memory pages. The kernel is already
pretty much set up for this, as we now have a struct page * embedded
in the buffer_head to let drivers find the correct high memory
location. Currently, we just use bounce buffers because no drivers
have DAC (PCI dual address cycle) support built in, but that will
change post-2.4.

--Stephen



