Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

From: Jan Kara
Date: Thu May 14 2015 - 08:46:59 EST

On Thu 14-05-15 21:23:04, Dave Chinner wrote:
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> > And readdir() itself, for that matter - we have no good vfs-level
> > readdir caching, so it all ends up serialized on the inode
> > semaphore, and it all goes all the way into the filesystem to get
> > the readdir data. And at least for ext4, readdir()
> > is slow anyway, because it doesn't use the page cache, it uses
> > that good old buffer cache, because of how ext4 does metadata
> > journaling etc.
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.
For ext4, readdir is just a linear read of the directory. Linus is right
that we store directory blocks in the buffer cache, but we do our own
readahead on directory blocks, so I don't think much slowness comes from
that. One thing that does slow us down is that we don't do preallocation
for directories, so they often end up heavily fragmented.

The locality problem you are probably referring to is that readdir on ext4
returns directory entries in hash order. That differs from ordering by
inode number, which is optimal for the subsequent cache-cold stat / unlink /
whatever you want to do with the inodes. This causes big performance
issues, e.g. if you do rm -rf on a large directory hierarchy. But you don't
see that often these days, as lots of utilities have learned to work around
the ext4 problem by sorting directory entries by inode number before doing
anything with them.
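
To make that workaround concrete, here is a rough userspace sketch of the
idea, not code from any particular utility. The helper names
entries_by_inode() and stat_all() are made up for illustration. It reads
the whole directory first, sorts the entries by the inode number readdir
already returned, and only then touches the inodes:

```python
import os


def entries_by_inode(path):
    """Return the os.DirEntry objects for `path`, sorted by inode number.

    Hypothetical helper, for illustration only.
    """
    with os.scandir(path) as it:
        # Read the whole directory before touching any inode.
        entries = list(it)
    # DirEntry.inode() gives the inode number of the entry; on Linux this
    # comes from the directory read itself, so no extra stat() is needed
    # just to build the sort key.
    entries.sort(key=lambda e: e.inode())
    return entries


def stat_all(path):
    """lstat every entry in inode order; returns (name, size) pairs."""
    results = []
    for entry in entries_by_inode(path):
        st = os.lstat(entry.path)  # inode tables are now read in order
        results.append((entry.name, st.st_size))
    return results
```

The same ordering applies to unlink or open; the cost is holding the whole
entry list in memory before starting, which is the trade-off of this
approach.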

Jan Kara <jack@xxxxxxx>