Re: AIM7 40% regression with 2.6.26-rc1

From: Zhang, Yanmin
Date: Tue May 06 2008 - 23:43:22 EST



On Wed, 2008-05-07 at 10:11 +0800, Zhang, Yanmin wrote:
> On Tue, 2008-05-06 at 13:44 +0200, Ingo Molnar wrote:
> > * Zhang, Yanmin <yanmin_zhang@xxxxxxxxxxxxxxx> wrote:
> >
> > > Compared with kernel 2.6.25, AIM7 (using tmpfs) has a more than 40%
> > > regression with 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton, and
> > > Itanium Montecito. Bisect located the patch below.
> > >
> > > 64ac24e738823161693bf791f87adc802cf529ff is first bad commit
> > > commit 64ac24e738823161693bf791f87adc802cf529ff
> > > Author: Matthew Wilcox <matthew@xxxxxx>
> > > Date: Fri Mar 7 21:55:58 2008 -0500
> > >
> > > Generic semaphore implementation
> > >
> > > After I manually reverted the patch against 2.6.26-rc1, fixing lots of
> > > conflicts/errors along the way, the AIM7 regression became less than 2%.
> >
> > hm, which exact semaphore would that be due to?
> >
> > My first blind guess would be the BKL - there's not much other semaphore
> > use left in the core kernel otherwise that would affect AIM7 normally.
> > The VFS still makes frequent use of the BKL and AIM7 is very VFS
> > intense. Getting rid of that BKL use from the VFS might be useful to
> > performance anyway.
> >
> > Could you try to check that it's indeed the BKL?
> >
> > Easiest way to check it would be to run AIM7 on
> > sched-devel.git/latest and do scheduler tracing via:
> >
> > http://people.redhat.com/mingo/sched-devel.git/readme-tracer.txt
> One clearly odd behavior of AIM7 is that CPU idle is 0% with 2.6.25, but more than 50%
> with 2.6.26-rc1. I have a patch to collect schedule info.
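(The patch itself is not attached here. Purely as a hypothetical sketch of the kind of
instrumentation meant, not the actual patch, one could count schedule() callers via
__builtin_return_address() and dump the table later; all names below are made up:)

/* Hypothetical sketch: per-caller counters for schedule().
 * Not SMP-safe; a debug-only illustration.
 */
#define SCHED_CALLER_SLOTS 1024

struct sched_caller_stat {
	void *caller;		/* return address of the schedule() caller */
	unsigned long count;	/* how many times it called schedule() */
};

static struct sched_caller_stat sched_caller_table[SCHED_CALLER_SLOTS];

static void account_schedule_caller(void *caller)
{
	int i;

	/* Linear probe; good enough for a debug-only statistic. */
	for (i = 0; i < SCHED_CALLER_SLOTS; i++) {
		if (sched_caller_table[i].caller == caller ||
		    sched_caller_table[i].caller == NULL) {
			sched_caller_table[i].caller = caller;
			sched_caller_table[i].count++;
			return;
		}
	}
}

/* Called from the top of schedule(), e.g.:
 *	account_schedule_caller(__builtin_return_address(0));
 */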
With my patch plus gprof, I collected some data. The output below is from gprof.

index % time self children called name
0.00 0.00 2/223305376 __down_write_nested [22749]
0.00 0.00 3/223305376 journal_commit_transaction [10526]
0.00 0.00 6/223305376 __down_read [22745]
0.00 0.00 8/223305376 start_this_handle [19167]
0.00 0.00 15/223305376 sys_pause [19808]
0.00 0.00 17/223305376 log_wait_commit [11047]
0.00 0.00 20/223305376 futex_wait [8122]
0.00 0.00 64/223305376 pdflush [14335]
0.00 0.00 71/223305376 do_get_write_access [5367]
0.00 0.00 84/223305376 pipe_wait [14460]
0.00 0.00 111/223305376 kjournald [10726]
0.00 0.00 116/223305376 int_careful [9634]
0.00 0.00 224/223305376 do_nanosleep [5418]
0.00 0.00 1152/223305376 watchdog [22065]
0.00 0.00 4087/223305376 worker_thread [22076]
0.00 0.00 5003/223305376 __mutex_lock_killable_slowpath [23305]
0.00 0.00 7810/223305376 ksoftirqd [10831]
0.00 0.00 9389/223305376 __mutex_lock_slowpath [23306]
0.00 0.00 10642/223305376 io_schedule [9813]
0.00 0.00 23544/223305376 migration_thread [11495]
0.00 0.00 35319/223305376 __cond_resched [22673]
0.00 0.00 49065/223305376 retint_careful [16146]
0.00 0.00 119757/223305376 sysret_careful [20074]
0.00 0.00 151717/223305376 do_wait [5545]
0.00 0.00 250221/223305376 do_exit [5356]
0.00 0.00 303836/223305376 cpu_idle [4350]
0.00 0.00 222333093/223305376 schedule_timeout [2]
[1] 0.0 0.00 0.00 223305376 schedule [1]
-----------------------------------------------
0.00 0.00 2/222333093 io_schedule_timeout [9814]
0.00 0.00 4/222333093 journal_stop [10588]
0.00 0.00 8/222333093 cifs_oplock_thread [3760]
0.00 0.00 14/222333093 do_sys_poll [5513]
0.00 0.00 20/222333093 cifs_dnotify_thread [3733]
0.00 0.00 32/222333093 read_chan [15648]
0.00 0.00 47/222333093 wait_for_common [22017]
0.00 0.00 658/222333093 do_select [5479]
0.00 0.00 2000/222333093 inet_stream_connect [9324]
0.00 0.00 222330308/222333093 __down [22577]
[2] 0.0 0.00 0.00 222333093 schedule_timeout [2]
0.00 0.00 222333093/223305376 schedule [1]
-----------------------------------------------
0.00 0.00 1/165565 flock_lock_file_wait [7735]
0.00 0.00 7/165565 __posix_lock_file [23371]
0.00 0.00 203/165565 de_put [4665]
0.00 0.00 243/165565 opost [13633]
0.00 0.00 333/165565 proc_root_readdir [14982]
0.00 0.00 358/165565 write_chan [22090]
0.00 0.00 6222/165565 proc_lookup_de [14908]
0.00 0.00 32081/165565 sys_fcntl [19687]
0.00 0.00 36045/165565 vfs_ioctl [21822]
0.00 0.00 42025/165565 tty_release [20818]
0.00 0.00 48047/165565 chrdev_open [3702]
[3] 0.0 0.00 0.00 165565 lock_kernel [3]
0.00 0.00 152987/153190 down [4]
-----------------------------------------------
0.00 0.00 203/153190 __reacquire_kernel_lock [23420]
0.00 0.00 152987/153190 lock_kernel [3]
[4] 0.0 0.00 0.00 153190 down [4]
0.00 0.00 153190/153190 __down [22577]
-----------------------------------------------
0.00 0.00 153190/153190 down [4]
[22577] 0.0 0.00 0.00 153190 __down [22577]
0.00 0.00 222330308/222333093 schedule_timeout [2]


As system idle is more than 50%, the callers of schedule/schedule_timeout are the important
information.
1) lock_kernel causes most of the schedule/schedule_timeout calls;
2) lock_kernel calls down, which calls __down, and __down calls schedule_timeout
many times in a loop (see the sketch below);
3) The main callers of lock_kernel are sys_fcntl/vfs_ioctl/tty_release/chrdev_open.
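
For reference, the relevant part of the new generic semaphore slow path looks roughly like
this. It is a simplified sketch based on kernel/semaphore.c from the bisected commit, with
the signal and timeout handling trimmed, not the verbatim upstream code:

/* Simplified sketch of the generic semaphore slow path (kernel/semaphore.c).
 * Signal and timeout handling omitted.
 */
struct semaphore_waiter {
	struct list_head list;
	struct task_struct *task;
	int up;
};

static noinline void __sched __down(struct semaphore *sem)
{
	struct task_struct *task = current;
	struct semaphore_waiter waiter;

	/* Queue ourselves on the semaphore's wait list (sem->lock is held). */
	list_add_tail(&waiter.list, &sem->wait_list);
	waiter.task = task;
	waiter.up = 0;

	for (;;) {
		__set_task_state(task, TASK_UNINTERRUPTIBLE);
		spin_unlock_irq(&sem->lock);
		/* This is the schedule_timeout() call that dominates the profile. */
		schedule_timeout(MAX_SCHEDULE_TIMEOUT);
		spin_lock_irq(&sem->lock);
		if (waiter.up)	/* woken by up() */
			return;
	}
}

The numbers above (153190 __down calls versus 222330308 schedule_timeout calls from __down)
mean each __down invocation goes around this loop roughly 1450 times on average before it
sees waiter.up set, which matches the >50% idle: tasks contending on lock_kernel spend their
time asleep here.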

-yanmin

