factor in less elapsed time and better worst-case (successful
and unsuccessful) hash performance when using a larger table.
surprisingly, CPU cost is only part of the picture, it seems.
i also tested adding the raw offset in the page hash function, and across
the board i still see a measurable performance drop.
> Hmm. This looks like another place where dropping the kernel lock
> during the copy would be beneficial: we already hold the mm semaphore at
> the time, so we're not vulnerable to too many races. I'll look at this.
let me be the first to encourage you to do this! :)
> >> Shrinking the dcaches excessively in this case will simply massacre the
> >> performance.
>
> > actually, that's not strictly true. shrinking the dcache early will
> > improve the lookup efficiency of the hash, by almost a factor of two
> > in my tests.
>
> Sure, but a glibc build is referencing a _lot_ of header files! My
> concern is that the vmscan loop currently invokes a prune_dcache(0),
> which is as aggressive as you can get. If we do that any more
> frequently, getting a good balance of the dcache will be a lot harder.
andrea's arca10 replaces prune_dcache(0) with something a little more
easy-going:
    prune_dcache(dentry_stat.nr_unused / (priority+1));
however, having a good dentry replacement policy might be even better.
> FWIW, the profile with the new hash functions but small dcache started
> like this (__find_page and find_buffer have been taken out of inline for
> profiling here):
>
>   4893 d_lookup             23.5240
>   2741 do_anonymous_page    21.4141
>   1486 file_read_actor      18.5750
>   1475 do_wp_page            2.6721
>   1218 __get_free_pages      2.5805
>   1075 __find_page          15.8088
>    844 filemap_nopage        1.1405
>    684 brw_page              0.7403
>    600 lookup_dentry         1.2295
>    594 find_buffer           6.4565
>    567 page_fault           47.2500
>    564 handle_mm_fault       1.2261
>    523 __free_page           2.2543
>    439 free_pages            1.6140
>    420 do_con_write          0.2471
>    403 strlen_user           8.3958
>    391 zap_page_range        0.8806
>    382 do_page_fault         0.4799
>
> and with the larger dcache,
>
>   2434 do_anonymous_page    19.0156
>   1451 do_wp_page            2.6286
>   1343 file_read_actor      16.7875
>   1328 __find_page          19.5294
>   1149 __get_free_pages      2.4343
>   1112 d_lookup              5.3462
>    847 find_buffer           9.2065
>    847 filemap_nopage        1.1446
>    628 brw_page              0.6797
>    580 page_fault           48.3333
>    577 lookup_dentry         1.1824
>    563 handle_mm_fault       1.2239
>    543 __free_page           2.3405
>    414 do_con_write          0.2435
>    397 free_pages            1.4596
>    377 system_call           6.7321
>    356 strlen_user           7.4167
>    354 zap_page_range        0.7973
>    319 do_page_fault         0.4008
>
> Interestingly, do_anonymous_page, do_wp_page and file_read_actor are all
> places where we can probably optimise things to drop the kernel lock.
> That won't make them run faster but on SMP it will certainly let other
> CPUs get more kernel work done. Film at 11.
the normalized value for page_fault is still pretty high, about 48. is there
anything that can be done about that, or is that not a concern?
also i tried benchmarking a stock 2.2.5 kernel with a 12 bit inode hash,
and found performance gains as significant as the other gains you found.
- Chuck Lever
--
corporate: <chuckl@netscape.com>
personal: <chucklever@netscape.net> or <cel@monkey.org>
The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/