RE: Direct io on block device has performance regression on 2.6.x kernel

From: Chen, Kenneth W
Date: Wed Mar 09 2005 - 17:20:36 EST


Andrew Morton wrote on Wednesday, March 09, 2005 12:05 PM
> "Chen, Kenneth W" <kenneth.w.chen@xxxxxxxxx> wrote:
> > Let me answer the questions in reverse order. We started with running
> > industry standard transaction processing database benchmark on 2.6 kernel,
> > on real hardware (4P smp, 64 GB memory, 450 disks) running industry
> > standard db application. What we measured is that with best tuning done
> > to the system, 2.6 kernel has a huge performance regression relative to
> > its predecessor 2.4 kernel (a kernel from RHEL3, 2.4.21 based).
>
> That's news to me. I thought we were doing OK with big database stuff.
> Surely lots of people have been testing such things.

There are different levels of "big". We used to work on a 32-way NUMA
box, but other show-stopper issues popped up before we got to the I/O stack.
The one good thing that came out of that work was the removal of the global
unplug lock.


> > And yes, it is all worth pursuing, the two patches on raw device recuperate
> > 1/3 of the total benchmark performance regression.
>
> On a real disk driver? hm, I'm wrong then.
>

Yes, on a real disk driver (QLogic Fibre Channel) and with real 15K rpm disks.


> Did you generate a kernel profile?

Top 40 hot kernel functions; percentages are normalized to kernel utilization.

_spin_unlock_irqrestore 23.54%
_spin_unlock_irq 19.27%
__blockdev_direct_IO 3.57%
follow_hugetlb_page 1.84%
e1000_clean 1.38%
kmem_cache_alloc 1.31%
put_page 1.29%
__generic_file_aio_read 1.18%
e1000_intr 1.07%
schedule 1.01%
dio_bio_complete 0.97%
mempool_alloc 0.96%
kmem_cache_free 0.90%
__end_that_request_first 0.88%
__copy_user 0.82%
kfree 0.77%
generic_make_request 0.73%
_spin_lock 0.73%
kref_put 0.73%
vfs_read 0.68%
update_atime 0.68%
scsi_dispatch_cmd 0.67%
fget_light 0.66%
put_io_context 0.60%
_spin_lock_irqsave 0.58%
scsi_finish_command 0.58%
generic_file_aio_write_nolock 0.57%
inode_times_differ 0.55%
break_fault 0.53%
__do_softirq 0.48%
aio_read_evt 0.48%
try_atomic_semop 0.44%
sys_pread64 0.43%
__bio_add_page 0.43%
__mod_timer 0.42%
bio_alloc 0.41%
scsi_decide_disposition 0.40%
e1000_clean_rx_irq 0.39%
find_vma 0.38%
dnotify_parent 0.38%


Profile with spin locks inlined, so that it is easier to see which functions
have the lock contention. Again, the top 40 hot functions:

scsi_request_fn 7.54%
finish_task_switch 6.25%
__blockdev_direct_IO 4.97%
__make_request 3.87%
scsi_end_request 3.54%
dio_bio_end_io 2.70%
follow_hugetlb_page 2.39%
__wake_up 2.37%
aio_complete 1.82%
kmem_cache_alloc 1.68%
__mod_timer 1.63%
e1000_clean 1.57%
__generic_file_aio_read 1.42%
mempool_alloc 1.37%
put_page 1.35%
e1000_intr 1.31%
schedule 1.25%
dio_bio_complete 1.20%
scsi_device_unbusy 1.07%
kmem_cache_free 1.06%
__copy_user 1.04%
scsi_dispatch_cmd 1.04%
__end_that_request_first 1.04%
generic_make_request 1.02%
kfree 0.94%
__aio_get_req 0.93%
sys_pread64 0.83%
get_request 0.79%
put_io_context 0.76%
dnotify_parent 0.73%
vfs_read 0.73%
update_atime 0.73%
finished_one_bio 0.63%
generic_file_aio_write_nolock 0.63%
scsi_put_command 0.62%
break_fault 0.62%
e1000_xmit_frame 0.62%
aio_read_evt 0.59%
scsi_io_completion 0.59%
inode_times_differ 0.58%
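
One rough way to read the two profiles side by side is to look at how much
each function's share grows once the spin lock code is inlined into its
callers. The sketch below does that for a handful of functions that appear in
both lists. The helper name lock_attribution and the choice of functions are
only for illustration, and the deltas are an approximate indicator, not a
precise measure of time spent in lock code.

```python
# Illustrative only: a few entries copied by hand from the two profiles above.
# Percentages are shares of kernel time, as reported in the listings.
out_of_line = {
    "__blockdev_direct_IO": 3.57,
    "follow_hugetlb_page": 1.84,
    "dio_bio_complete": 0.97,
    "generic_make_request": 0.73,
    "scsi_dispatch_cmd": 0.67,
    "__mod_timer": 0.42,
}
inlined = {
    "__blockdev_direct_IO": 4.97,
    "follow_hugetlb_page": 2.39,
    "dio_bio_complete": 1.20,
    "generic_make_request": 1.02,
    "scsi_dispatch_cmd": 1.04,
    "__mod_timer": 1.63,
}


def lock_attribution(before, after):
    """Hypothetical helper: per-function growth (after - before), largest first.

    Only functions present in both profiles are compared.
    """
    deltas = {
        name: round(after[name] - before[name], 2)
        for name in before
        if name in after
    }
    return dict(sorted(deltas.items(), key=lambda kv: kv[1], reverse=True))


# Combined share of the two out-of-line unlock routines in the first profile.
spin_total = round(23.54 + 19.27, 2)

deltas = lock_attribution(out_of_line, inlined)
for name, delta in deltas.items():
    print(f"{name:24s} +{delta:.2f}%")
print(f"out-of-line spin unlock total: {spin_total:.2f}%")
```

Functions such as scsi_request_fn and finish_task_switch do not show up in the
first list at all, so they are left out of this comparison; their share in
the second profile is the more direct thing to look at.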


