Re: Lockless page cache test results

From: Jens Axboe
Date: Wed Apr 26 2006 - 15:15:07 EST


On Wed, Apr 26 2006, Linus Torvalds wrote:
>
>
> On Wed, 26 Apr 2006, Andrew Morton wrote:
>
> > Jens Axboe <axboe@xxxxxxx> wrote:
> > >
> > > Once per page, it's basically exercising the generic_file_splice_read()
> > > path. Basically X number of "clients" open the same file, and fill those
> > > pages into a pipe using splice. The output end of the pipe is then
> > > spliced to /dev/null to toss it away again.
> >
> > OK. That doesn't sound like something which a real application is likely
> > to do ;)
>
> True, but on the other hand, it does kind of "distill" one (small) part of
> something that real apps _are_ likely to do.
>
> The whole 'splice to /dev/null' part can be seen as totally irrelevant,
> but at the same time a way to ignore all the other parts of normal page
> cache usage (ie the other parts of page cache usage tend to be the "map it
> into user space" or the actual "memcpy_to/from_user()" or the "TCP send"
> part).
>
> The question, of course, is whether the part that remains (the actual page
> lookup) is important enough to matter, once it is part of a bigger chain
> in a real application.
>
> In other words, the splice() thing is just a way to isolate one part of a
> chain that is usually much more involved, and micro-benchmark just that
> one part.

Nick called it a find_get_page() micro benchmark, which is pretty much
spot on. So naturally it shows the absolute best side of the lockless
page cache, but that is also very interesting. The /dev/null output can
just be seen as an "infinitely" fast output method, both from a
throughput and a light-weight POV.
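
For reference, the test described above can be approximated from user
space roughly as follows. This is only a minimal sketch, not the actual
benchmark program: the helper names splice_file_to_null() and
run_clients(), the 64KiB chunk size and the fork-per-client model are
all assumptions made for illustration. Only splice(2), pipe(2), fork(2)
and wait(2) are real interfaces.

```c
#define _GNU_SOURCE             /* needed for splice() in <fcntl.h> */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: splice all of `path` into a pipe, then splice the
 * pipe into /dev/null. Returns the number of bytes moved, or -1 on error.
 */
static long long splice_file_to_null(const char *path)
{
    int pfd[2] = { -1, -1 };
    long long total = -1;
    int in = open(path, O_RDONLY);
    int out = open("/dev/null", O_WRONLY);

    if (in < 0 || out < 0 || pipe(pfd) < 0)
        goto done;

    total = 0;
    for (;;) {
        /* file -> pipe: at most 64KiB per call (assumed chunk size) */
        ssize_t n = splice(in, NULL, pfd[1], NULL, 65536, 0);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)             /* end of file */
            break;
        total += n;

        /* pipe -> /dev/null: drain everything that was just queued */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, out, NULL, (size_t)n, 0);
            if (m <= 0) {
                total = -1;
                goto done;
            }
            n -= m;
        }
    }
done:
    if (pfd[0] >= 0) close(pfd[0]);
    if (pfd[1] >= 0) close(pfd[1]);
    if (in >= 0) close(in);
    if (out >= 0) close(out);
    return total;
}

/*
 * Hypothetical helper: start `nclients` processes that all splice the same
 * file. Returns how many of them finished without error.
 */
static int run_clients(const char *path, int nclients)
{
    int started = 0, ok = 0;

    for (int i = 0; i < nclients; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)           /* child: one "client" */
            _exit(splice_file_to_null(path) >= 0 ? EXIT_SUCCESS
                                                 : EXIT_FAILURE);
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == EXIT_SUCCESS)
            ok++;
    }
    return ok;
}
```

Calling run_clients(path, 4) on a file that is already in the page cache
would correspond to the 4-client case. Since nothing is copied to user
space and /dev/null discards the data, what remains is mostly the page
lookup and the get_page/put_page cost.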

> It would be interesting to see where doing gang-lookup moves the target,
> but on the other hand, with smaller files (and small files are still
> common), gang lookup isn't going to help as much.

With a 16-page gang lookup in splice, the top profile entries for the
4-client case (which is now at 4GiB/sec instead of 3) are:

samples  %        symbol name
30396    36.7217  __do_page_cache_readahead
25843    31.2212  find_get_pages_contig
9699     11.7174  default_idle

Even disregarding the readahead contender, which could probably be made
a little more clever, we are still spending an awful lot of time in the
page lookup. I didn't mention this before, but the get_page/put_page
overhead is also a lot smaller with the lockless patches.

--
Jens Axboe
