Re: readahead on directories
From: Phillip Susi
Date: Wed Apr 21 2010 - 16:59:38 EST
On 4/21/2010 4:22 PM, Jamie Lokier wrote:
> Because tests have found that it's sometimes faster than AIO anyway!
Not when the aio is working properly ;)
This is getting a bit off topic, but aio_read() and readahead() have to
map the disk blocks before they can queue a read. In the case of ext2/3,
this often requires reading an indirect block from the disk, so the
kernel has to wait for that read to finish before it can queue the rest
of the reads and return. With ext4 extents, usually all of the mapping
information is in the inode, so all of the reads can be queued without
delay and the kernel returns to user space immediately.
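To make the order of operations concrete, here is a rough sketch of the
userspace side of the call in question; just an illustration, and the
path is made up:

#define _GNU_SOURCE        /* for readahead(2) */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/var/cache/example.dat", O_RDONLY);  /* hypothetical path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) == 0) {
        /* On ext3 this call can block while indirect blocks are
         * read in to map the file; on ext4 with extents the mapping
         * is usually in the inode and the call returns quickly. */
        if (readahead(fd, 0, st.st_size) != 0)
            perror("readahead");
    }

    close(fd);
    return 0;
}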
So older testing done on ext3 likely ran into this and led to the
conclusion that threading can be faster, but when using ext4 with
extents it would be preferable to drop the read requests in the queue
without the bother of setting up and tearing down threads, which is
really just a workaround for a shortcoming in aio_read() and readahead()
when using indirect blocks. For that matter, aio_read() and readahead()
could probably benefit from some reworking to fix this, so that they
return as soon as they have queued the read of the indirect block,
deferring the queueing of the remaining reads until the indirect block
comes in.
> ...for those things where AIO is supported at all. The problem with
> more complicated fs operations (like, say, buffered file reads and
> directory operations) is you can't just put a request in a queue.
Unfortunately there aren't async versions of the calls that perform
directory operations, but aio_read() performs a buffered file read
asynchronously just fine. Right now, though, I'm only concerned with
reading lots of data into the cache at boot time to speed things up.
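For reference, the kind of buffered async read I mean looks roughly
like this with the POSIX interface; the path and buffer size are made
up, and older glibc needs -lrt:

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/var/log/example.log", O_RDONLY);  /* hypothetical */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    static char buf[65536];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    /* Queue the read and return immediately... */
    if (aio_read(&cb) != 0) {
        perror("aio_read");
        return 1;
    }

    /* ...do other work, then collect the result. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    printf("read %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}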
> Those things where putting a request on a queue works tend to move the
> sleepable metadata fetching to the code _before_ the request is queued
> to get around that. Which is one reason why Linux O_DIRECT AIO can
> still block when submitting a request... :-/
Yep, as I just described. Would be nice to fix this.
> The most promising direction for AIO at the moment is in fact spawning
> kernel threads on demand to do the work that needs a context, and
> swizzling some pointers so that it doesn't look like threads were used
> to userspace.
NO! This is how aio was implemented at first, and it was terrible.
Context is only required because it is easier to write the code linearly
instead of as a state machine. It would be better, for example, to have
readahead() register a callback function to be called when the read of
the indirect block completes; the callback needs zero context to
queue reads of the data blocks referred to by the indirect block.
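Purely as a userspace sketch of that pattern (this is not kernel code,
and every name in it is made up): all the callback needs is a small
completion record, not a sleeping thread:

#include <stdio.h>

/* Completion record carried along with the indirect-block read. */
struct ra_state {
    int fd;                                  /* file being read ahead */
    void (*done)(struct ra_state *st,
                 const unsigned int *blocks, int nblocks);
};

/* Callback: runs when the indirect block arrives and queues the data
 * blocks it names.  Needs only the record above, no saved stack. */
static void queue_data_blocks(struct ra_state *st,
                              const unsigned int *blocks, int nblocks)
{
    for (int i = 0; i < nblocks; i++)
        printf("fd %d: queue read of block %u\n", st->fd, blocks[i]);
}

/* Stand-in for the block layer completing the indirect-block read. */
static void indirect_block_complete(struct ra_state *st)
{
    unsigned int blocks[] = { 12, 13, 14, 15 };  /* fake mapping */
    st->done(st, blocks, 4);
}

int main(void)
{
    struct ra_state st = { .fd = 3, .done = queue_data_blocks };
    /* readahead() would return to the caller here, right after
     * queueing the indirect-block read; the rest happens above. */
    indirect_block_complete(&st);
    return 0;
}

All of the state lives in the record, so nothing has to sit in a thread
waiting for the indirect block to come in.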
> You might even find that calling readahead() on *files* goes a bit
> faster if you have several threads working in parallel calling it,
> because of the ability to parallelise metadata I/O.
Indeed... or you can use extents, or fix the implementation of
readahead() ;)
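That thread workaround would look something like the following; just a
sketch, with made-up paths, built with -pthread:

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void *ra_worker(void *arg)
{
    const char *path = arg;
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        struct stat st;
        if (fstat(fd, &st) == 0)
            readahead(fd, 0, st.st_size);  /* may block on metadata */
        close(fd);
    }
    return NULL;
}

int main(void)
{
    /* Hypothetical file list; one thread per file so the
     * indirect-block reads can overlap instead of serializing. */
    const char *paths[] = { "/etc/hypothetical.a", "/etc/hypothetical.b" };
    pthread_t tid[2];

    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, ra_worker, (void *)paths[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}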
> So you're saying it _does_ readahead_size if needed. That's great!
I'm not sure; I'm just saying that if it does, it does not help much,
since most directories fit in a single 4 KB block anyhow. I need to get
a number of different directories read quickly.
> Filesystem-independent readahead() on directories is out of the
> question (except by using a kernel background thread, which is
> pointless because you can do that yourself.)
No need for a thread. readahead() does not need one for files; reading
the contents of a directory should be no different.
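The status quo is easy to check; as far as I can tell from the man page
the call is simply rejected for directories today (sketch; the error
value is my expectation, not something I've dug out of the source):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc", O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (readahead(fd, 0, 4096) != 0)
        perror("readahead on directory");  /* EINVAL expected, per the man page */
    close(fd);
    return 0;
}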
> Some filesystems have directories which aren't stored like a file's
> data, and the process of reading the directory needs to work through
> its logic, and needs a sleepable context to work in. Generic page
> reading won't work for all of them.
If the fs absolutely has to block, that's OK, since that is no different
from the way readahead() works on files, but most of the time it
shouldn't have to; it should be able to throw the read in the queue and
return.