Re: [RFC PATCH 0/2] dirreadahead system call

From: Andreas Dilger
Date: Mon Nov 10 2014 - 17:23:14 EST


On Nov 9, 2014, at 8:41 PM, Abhijith Das <adas@xxxxxxxxxx> wrote:
>> Hi Dave/all,
>>
>> I finally got around to playing with the multithreaded userspace readahead
>> idea and the results are quite promising. I tried to mimic what my kernel
>> readahead patch did with this userspace program (userspace_ra.c).
>> Source code here:
>> https://www.dropbox.com/s/am9q26ndoiw1cdr/userspace_ra.c?dl=0
>>
>> Each thread has an associated buffer into which it reads a chunk of
>> directory entries using getdents(). Each thread then sorts the entries
>> in inode number order (for GFS2, this is also their disk block order)
>> and pulls the inodes into cache in that order by issuing open(2)
>> syscalls against them. In my tests, I backgrounded this program and
>> issued an 'ls -l' on the directory in question. I did the same with the
>> kernel dirreadahead syscall.
>>
>> I did not manage to test many parameter combinations for either
>> userspace_ra or SYS_dirreadahead because the test matrix got pretty
>> big and time-consuming. However, I did notice that without sorting,
>> userspace_ra did not perform as well in some of my tests. I haven't
>> investigated that, so all numbers shown here are with sorting enabled.
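For anyone who wants to try the approach without the Dropbox source, here is a minimal sketch of the userspace readahead described above, written in Python for brevity. It is not userspace_ra.c. The function names are hypothetical, chunks are counted in entries rather than buffer bytes, and one thread reads the directory while a thread pool issues the open(2) calls (in userspace_ra each thread reads its own chunk with getdents()). It assumes Linux, where os.scandir() is built on readdir(3), which uses getdents64() underneath, and where DirEntry.inode() returns d_ino without an extra stat().

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_chunks(dirpath, chunk_size):
    """Yield lists of (inode, name) pairs, chunk_size entries at a time."""
    chunk = []
    with os.scandir(dirpath) as it:
        for entry in it:
            # d_ino comes straight from the directory entry; no stat() here.
            chunk.append((entry.inode(), entry.name))
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk


def warm_chunk(dirpath, chunk, sort_by_inode=True):
    """Open each entry so the filesystem reads its inode into cache."""
    if sort_by_inode:
        chunk = sorted(chunk)  # inode order (also disk order on GFS2)
    opened = 0
    for _ino, name in chunk:
        try:
            # O_NONBLOCK so a FIFO in the directory cannot stall the thread.
            fd = os.open(os.path.join(dirpath, name), os.O_RDONLY | os.O_NONBLOCK)
        except OSError:
            continue  # e.g. permission denied, or entry removed meanwhile
        os.close(fd)
        opened += 1
    return opened


def userspace_readahead(dirpath, chunk_size=4096, threads=32, sort_by_inode=True):
    """Warm the inode cache for dirpath; returns how many entries were opened."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(warm_chunk, dirpath, c, sort_by_inode)
                   for c in list_chunks(dirpath, chunk_size)]
        return sum(f.result() for f in futures)
```

Usage would be something like userspace_readahead("/path/to/testdir", threads=32), run in the background before 'ls -l'. Passing sort_by_inode=False keeps readdir order, which is relevant to the concern below about filesystems whose inode numbers do not follow disk order.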

One concern is for filesystems where inode order does not necessarily
match the on-disk order. I believe that filesystems like ext4 and XFS
have matching inode/disk order, but tree-based COW filesystems like
Btrfs do not necessarily preserve this order, so sorting in userspace
will not help and may in fact hurt readahead compared to readdir order.

What filesystem(s) besides GFS2 have you tested this on?

Cheers, Andreas

>> For a directory with 100000 files,
>> a) simple 'ls -l' took 14m11s
>> b) SYS_dirreadahead + 'ls -l' took 3m9s, and
>> c) userspace_ra (1M buffer/thread, 32 threads) took 1m42s
>>
>> https://www.dropbox.com/s/85na3hmo3qrtib1/ra_vs_u_ra_vs_ls.jpg?dl=0 is a
>> graph that contains a few more data points. In the graph, along with data
>> for 'ls -l' and SYS_dirreadahead, there are six data series for userspace_ra
>> for each directory size (10K, 100K and 200K files), labelled u_ra:XXX,YYY,
>> where XXX is the buffer size (64K or 1M) and YYY is the number of threads
>> (4, 16 or 32).
>>
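The three 100k-file timings above are easier to compare as seconds and as speedups over plain 'ls -l'. A small sketch; the to_seconds helper is hypothetical and not part of userspace_ra:

```python
def to_seconds(mmss):
    """Convert a time written like '14m11s' into whole seconds."""
    minutes, seconds = mmss.rstrip("s").split("m")
    return int(minutes) * 60 + int(seconds)


# The 100k-file results quoted above.
runs = {
    "ls -l": "14m11s",
    "SYS_dirreadahead + ls -l": "3m9s",
    "userspace_ra (1M/thread, 32 threads)": "1m42s",
}

baseline = to_seconds(runs["ls -l"])
for name, t in runs.items():
    secs = to_seconds(t)
    print(f"{name:40s} {secs:5d} s  {baseline / secs:4.1f}x")
```

This gives roughly 4.5x for SYS_dirreadahead and 8.3x for userspace_ra. The 851 s for plain 'ls -l' is also close to the 849 in the 100k column of the later table, which suggests that table is in seconds too.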
>
> Hi,
>
> Here are some more numbers for larger directories and it seems like
> userspace readahead scales well and is still a good option.
>
> I've chosen the best-performing runs for kernel readahead and userspace
> readahead. I have data for runs with different parameters (buffer size,
> number of threads, etc.) that I can provide if anybody's interested.
>
> The numbers here are total elapsed times for the readahead plus 'ls -l'
> operations to complete.
>
> Elapsed time in seconds, by #files in testdir:
>
>                                               50k    100k   200k   500k   1m
> -------------------------------------------------------------------------------
> Readdir 'ls -l'                               11     849    1873   5024   10365
> Kernel readahead + 'ls -l' (best case)        7      214    814    2330   4900
> Userspace MT readahead + 'ls -l' (best case)  12     99     239    1351   4761
>
> Cheers!
> --Abhi
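Dividing the plain 'ls -l' time by each readahead variant gives per-size speedups. A sketch that copies the table values verbatim, assuming they are seconds (the message does not state units):

```python
# Values copied from the table above; units assumed to be seconds.
SIZES = ["50k", "100k", "200k", "500k", "1m"]
ELAPSED = {
    "ls -l": [11, 849, 1873, 5024, 10365],
    "kernel readahead": [7, 214, 814, 2330, 4900],
    "userspace MT readahead": [12, 99, 239, 1351, 4761],
}


def speedups(table, baseline="ls -l"):
    """Return baseline_time / variant_time per directory size, for each variant."""
    base = table[baseline]
    return {
        name: [round(b / t, 2) for b, t in zip(base, times)]
        for name, times in table.items()
        if name != baseline
    }


for name, ratios in speedups(ELAPSED).items():
    print(f"{name:24s}", "  ".join(f"{s}={r}x" for s, r in zip(SIZES, ratios)))
```

On these numbers, userspace readahead wins clearly from 100k to 500k files (about 8.6x at 100k). It is marginally slower than plain 'ls -l' at 50k, and it ends up close to kernel readahead at 1m files (about 2.18x versus 2.12x).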

