Re: do_generic_mapping_read performance issue

From: Ashif Harji
Date: Mon Mar 12 2007 - 13:05:46 EST



On Mon, 12 Mar 2007, Jan Kara wrote:

On Mon 12-03-07 15:39:00, Nick Piggin wrote:
On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
Hi,

Hi, I am encountering a performance problem, which I have tracked into the
Linux kernel. The problem occurs with my experimental web server that uses
sendfile to repeatedly transmit files. The files are based on the static
portion of the SPECweb99 fileset and range in size to model a reasonable
workload. With this workload, a significant number of the requests are
for files of size 4 KB or less.

I have determined that the performance problems occurs in the function
do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
Here is the specific code fragment:

/*
* When (part of) the same page is read multiple times
* in succession, only mark it as accessed the first time.
*/
if (prev_index != index)
mark_page_accessed(page);
Actually, the code is like that certainly for two years :).

Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
stage (ie. read from the start of a page -- obviously that also has holes).
Yes, at least in 2.6.12-rc5 which is the first one in git :).

I was wondering if anyone could explain why the call to mark_page_accessed
is conditional? That is, what problem it is trying to solve. It would seem
that in many scenarios, if the same page is accessed repeatedly, then it
would be appropriate to keep that page cached.
I also don't know why the condition is there but it's there at least
for two years so I'm not sure anybody remembers ;). Nick, do you have
an idea?

Yeah it is there because that is basically how our "use once" detection
handles the case where an app does not read in chunks that are PAGE_SIZE
multiples and PAGE_SIZE aligned.
OK, I see. Then I'm not sure the check does more good than bad. Because
if we happen to reread the same chunk several times, then the check does a
wrong thing...

Thanks for providing me with additional information.

I would like to submit a patch to fix the performance problem. The simplest solution is to remove the check. Even in the situation where an application does not read in PAGE_SIZE multiples as described above, if the page is accessed frequently it should remain in the cache. However, I am open to suggestions for a more sophisticated scheme.

ashif.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/