[no subject]

From: Wu Fengguang
Date: Wed Dec 30 2009 - 00:41:32 EST


Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
CC: Steven Whitehouse <swhiteho@xxxxxxxxxx>
Subject: Re: [RFC][PATCH] Disabling read-ahead makes I/O of large reads
small
Reply-To:
In-Reply-To: <87aax18xms.fsf@xxxxxxxxxxxxxxxxx>

Hi Quentin,

Quentin Barnes <qbarnes+nfs@xxxxxxxxxxxxx> writes:

> Adding the posix_fadvise(..., POSIX_FADV_RANDOM) call sets

Have you tried w/o POSIX_FADV_RANDOM (ie. comment out the fadivse call)?
It should be able to achieve the same good performance. The heuristic
readahead logic should detect that the application is doing random
reads.

> ra_pages=0. This has a very odd side-effect in the kernel. Once
> read-ahead is disabled, subsequent calls to read(2) are now done in
> the kernel via ->readpage() callback doing I/O one page at a time!
>
> Pouring through the code in mm/filemap.c I see that the kernel has
> commingled read-ahead and plain read implementations. The algorithms
> have much in common, so I can see why it was done, but it left this
> anomaly of severely pimping read(2) calls on file descriptors with
> read-ahead disabled.
>
>
> For example, with a read(2) of 98K bytes of a file opened with
> O_DIRECT accessed over NFSv3 with rsize=32768, I see:
> =========
> V3 ACCESS Call (Reply In 249), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 248)
> V3 READ Call (Reply In 321), FH:0xf3a8e519 Offset:0 Len:32768
> V3 READ Call (Reply In 287), FH:0xf3a8e519 Offset:32768 Len:32768
> V3 READ Call (Reply In 356), FH:0xf3a8e519 Offset:65536 Len:32768
> V3 READ Reply (Call In 251) Len:32768
> V3 READ Reply (Call In 250) Len:32768
> V3 READ Reply (Call In 252) Len:32768
> =========
> I would expect three READs issued of size 32K, and that's exactly
> what I see.
>
>
> For the same file without O_DIRECT but with read-ahead disabled
> (its ra_pages=0), I see:
> =========
> V3 ACCESS Call (Reply In 167), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 166)
> V3 READ Call (Reply In 172), FH:0xf3a8e519 Offset:0 Len:4096
> V3 READ Reply (Call In 168) Len:4096
> V3 READ Call (Reply In 177), FH:0xf3a8e519 Offset:4096 Len:4096
> V3 READ Reply (Call In 173) Len:4096
> V3 READ Call (Reply In 182), FH:0xf3a8e519 Offset:8192 Len:4096
> V3 READ Reply (Call In 178) Len:4096
> [... READ Call/Reply pairs repeated another 21 times ...]
> =========
> Now I see 24 READ calls of 4K each!

Good catch, Thank you very much!

> A workaround for this kernel problem is to hack the app to do a
> readahead(2) call prior to the read(2), however, I would think a
> better approach would be to fix the kernel. I came up with the
> included patch that once applied restores the expected read(2)
> behavior. For the latter test case above of a file with read-ahead
> disabled but now with the patch below applied, I now see:
> =========
> V3 ACCESS Call (Reply In 1350), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 1349)
> V3 READ Call (Reply In 1387), FH:0xf3a8e519 Offset:0 Len:32768
> V3 READ Call (Reply In 1421), FH:0xf3a8e519 Offset:32768 Len:32768
> V3 READ Call (Reply In 1456), FH:0xf3a8e519 Offset:65536 Len:32768
> V3 READ Reply (Call In 1351) Len:32768
> V3 READ Reply (Call In 1352) Len:32768
> V3 READ Reply (Call In 1353) Len:32768
> =========
> Which is what I would expect -- back to just three 32K READs.
>
> After this change, the overall performance of the application
> increased by 313%!

And awesome improvements!

>
> I have no idea if my patch is the appropriate fix. I'm well out of
> my area in this part of the kernel. It solves this one problem, but
> I have no idea how many boundary cases it doesn't cover or even if
> it is the right way to go about addressing this issue.
>
> Is this behavior of shorting I/O of read(2) considered a bug? And
> is this approach for a fix approriate?

The approach is mostly OK for the bug. However one issue is missed --
the ra_pages is somehow overloaded. I try to fix the problems in the
two patches just posted. Will that solve your problem?

Thanks,
Fengguang

>
> --- linux-2.6.32.2/mm/filemap.c 2009-12-18 16:27:07.000000000 -0600
> +++ linux-2.6.32.2-rapatch/mm/filemap.c 2009-12-24 13:07:07.000000000 -0600
> @@ -1012,9 +1012,13 @@ static void do_generic_file_read(struct
> find_page:
> page = find_get_page(mapping, index);
> if (!page) {
> - page_cache_sync_readahead(mapping,
> - ra, filp,
> - index, last_index - index);
> + if (ra->ra_pages)
> + page_cache_sync_readahead(mapping,
> + ra, filp,
> + index, last_index - index);
> + else
> + force_page_cache_readahead(mapping, filp,
> + index, last_index - index);
> page = find_get_page(mapping, index);
> if (unlikely(page == NULL))
> goto no_cached_page;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/