Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

From: Ni zhan Chen
Date: Wed Oct 24 2012 - 21:48:49 EST


On 10/25/2012 08:17 AM, YingHang Zhu wrote:
On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
Hi Dave,
On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
Hi,
Recently we ran into a bug: an opened file's ra_pages is not
synchronized with its backing device's when the latter is changed
with blockdev --setra. The application needs to reopen the file
to see the change,
or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readahead
window to the (new) bdi default.

which is inappropriate under our circumstances.
Which are? We don't know your circumstances, so you need to tell us
why you need this and why existing methods of handling such changes
are insufficient...

Optimal readahead windows tend to be a physical property of the
storage and that does not tend to change dynamically. Hence block
device readahead should only need to be set up once, and generally
that can be done before the filesystem is mounted and files are
opened (e.g. via udev rules). Hence you need to explain why you need
to change the default block device readahead on the fly, and why
fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
windows to the new defaults.
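
For reference, the reset described above might look like the following from user space. This is a minimal sketch, assuming Linux and Python 3.3+; the helper name reset_readahead and the temporary file are hypothetical stand-ins for an application's long-lived descriptor.

```python
import os
import tempfile


def reset_readahead(fd: int) -> None:
    """Ask the kernel to return this descriptor to default readahead behaviour."""
    # offset=0 and len=0 apply the advice to the whole file.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NORMAL)


if __name__ == "__main__":
    # Hypothetical stand-in for a file the application keeps open.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        reset_readahead(f.fileno())
```

The call is made per open file descriptor, so it has to be issued by the process that holds the file open.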
Our system is a fuse-based file system. Fuse creates a pseudo backing
device for user-space file systems; the default readahead size is 128KB,
which can't fully utilize the backing storage's read capability,
so we need to tune it.
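
To relate the 128KB default to the units used elsewhere in this thread: blockdev --setra takes a count of 512-byte sectors, while ra_pages counts pages. A small sketch of the conversion follows; the 4KB page size is an assumption (it varies by architecture), and the helper names are hypothetical.

```python
SECTOR_BYTES = 512   # unit used by `blockdev --setra`
PAGE_BYTES = 4096    # assumed page size; os.sysconf("SC_PAGE_SIZE") gives the real one


def kb_to_sectors(kb: int) -> int:
    """Readahead size in KB expressed as 512-byte sectors."""
    return kb * 1024 // SECTOR_BYTES


def kb_to_pages(kb: int, page_bytes: int = PAGE_BYTES) -> int:
    """Readahead size in KB expressed as pages (the unit of ra_pages)."""
    return kb * 1024 // page_bytes


print(kb_to_sectors(128))  # 256
print(kb_to_pages(128))    # 32
```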
Sure, but that doesn't tell me anything about why you can't do this
at mount time before the application opens any files. i.e. you've
simply stated the reason why readahead is tunable, not why you need
to be fully dynamic.....
We store our file system's data on different disks, so we need to change
ra_pages dynamically according to where the data resides; it can't be
fixed at mount time or when we open files.
The abstract bdi of fuse and btrfs provides a dynamically changing
bdi.ra_pages based on the real backing device. IMHO this should not
be ignored.

And how should ra_pages be tuned if one big file is distributed across different disks? I think Fengguang Wu can answer these questions.

Hi Fengguang,

The above third-party application using our file system keeps
some files open for a long time, and we do not have any chance
to force it to call fadvise(POSIX_FADV_NORMAL). :(
So raise a bug/feature request with the third party. Modifying
kernel code because you can't directly modify the application isn't
the best solution for anyone. This really is an application problem
- the kernel already provides the mechanisms to solve this
problem... :/
Thanks for the advice, I will consult the above application's developers
for more information.
Now, from the code itself, should we close the gap between the real
device's ra_pages and the file's?
Obviously ra_pages is duplicated. If we leave it that way, each time
someone runs into this problem they will redo the work I have done here.
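
The duplication under discussion can be illustrated with a toy model. This is a user-space sketch only, not kernel code: the class names Bdi and OpenFile are hypothetical, and the model assumes, as described earlier in the thread, that the per-file value is copied from the device at open time and re-copied by fadvise(POSIX_FADV_NORMAL).

```python
from dataclasses import dataclass


@dataclass
class Bdi:
    """Toy stand-in for a backing device's readahead setting."""
    ra_pages: int


class OpenFile:
    """Toy stand-in for an open file that carries its own readahead state."""

    def __init__(self, bdi: Bdi) -> None:
        self.bdi = bdi
        # Snapshot taken once at open time; later device changes are not seen.
        self.ra_pages = bdi.ra_pages

    def fadvise_normal(self) -> None:
        # Models fadvise(fd, POSIX_FADV_NORMAL): re-copy the device default.
        self.ra_pages = self.bdi.ra_pages


bdi = Bdi(ra_pages=32)   # 128KB, assuming 4KB pages
f = OpenFile(bdi)
bdi.ra_pages = 256       # models `blockdev --setra` raising the default
stale = f.ra_pages       # the open file still holds its old copy
f.fadvise_normal()
fresh = f.ra_pages       # the copy now matches the device
print(stale, fresh)      # 32 256
```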

Thanks,
Ying Zhu
Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx For more info on Linux MM,
see: http://www.linux-mm.org/ .

