Re: [PATCH 0/6 v5.1] cfq-iosched: Introduce CFQ group hierarchical scheduling and "use_hierarchy" interface

From: Gui Jianfeng
Date: Thu Feb 24 2011 - 20:55:59 EST


Vivek Goyal wrote:
> On Wed, Feb 23, 2011 at 11:01:35AM +0800, Gui Jianfeng wrote:
>> Hi
>>
>> I rebased this series on top of the *for-next* branch; this should make merging easier.
>>
>> Previously, I posted a patchset that added support for CFQ group hierarchical scheduling
>> by putting all CFQ queues into a hidden group and scheduling that group alongside the
>> other CFQ groups under their parent. The patchset is available here:
>> http://lkml.org/lkml/2010/8/30/30
>
> Gui,
>
> I was running some tests (iostest) with these patches and my system crashed
> after a while.
>
> To be precise, I was running the "brrmmap" test of iostest.

Vivek,

I ran iostest in brrmmap mode, but I can't reproduce this bug.
Could you give more details?
What iostest command line options did you use?
Did you enable use_hierarchy in the root group?

Thanks,
Gui

>
> train.lab.bos.redhat.com login: [72194.404201] EXT4-fs (dm-1): mounted
> filesystem with ordered data mode. Opts: (null)
> [72642.818976] EXT4-fs (dm-1): mounted filesystem with ordered data mode.
> Opts: (null)
> [72931.409460] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000010
> [72931.410216] IP: [<ffffffff812265ff>] __rb_rotate_left+0xb/0x64
> [72931.410216] PGD 134d80067 PUD 12f524067 PMD 0
> [72931.410216] Oops: 0000 [#1] SMP
> [72931.410216] last sysfs file:
> /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
> [72931.410216] CPU 3
> [72931.410216] Modules linked in: kvm_intel kvm qla2xxx scsi_transport_fc
> [last unloaded: scsi_wait_scan]
> [72931.410216]
> [72931.410216] Pid: 18675, comm: sh Not tainted 2.6.38-rc4+ #3 0A98h/HP
> xw8600 Workstation
> [72931.410216] RIP: 0010:[<ffffffff812265ff>] [<ffffffff812265ff>]
> __rb_rotate_left+0xb/0x64
> [72931.410216] RSP: 0000:ffff88012f461480 EFLAGS: 00010086
> [72931.410216] RAX: 0000000000000000 RBX: ffff880135f40c00 RCX:
> ffffffffffffdcc8
> [72931.410216] RDX: ffff880135f43800 RSI: ffff880135f43000 RDI:
> ffff880135f42c00
> [72931.410216] RBP: ffff88012f461480 R08: ffff880135f40c00 R09:
> ffff880135f43018
> [72931.410216] R10: 0000000000000000 R11: 0000001000000000 R12:
> ffff880135f42c00
> [72931.410216] R13: ffff880135f41808 R14: ffff880135f43000 R15:
> ffff880135f40c00
> [72931.410216] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [72931.410216] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [72931.410216] CR2: 0000000000000010 CR3: 000000013774f000 CR4:
> 00000000000006e0
> [72931.410216] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [72931.410216] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [72931.410216] Process sh (pid: 18675, threadinfo ffff88012f460000, task
> ffff8801376e6f90)
> [72931.410216] Stack:
> [72931.410216] ffff88012f4614b8 ffffffff81226778 ffff880135f43000
> ffff880135f43000
> [72931.410216] ffff88011c5bed00 0000000000000000 0000000000000001
> ffff88012f4614d8
> [72931.410216] ffffffff8121c521 0000001000000000 ffff880135f41800
> ffff88012f461528
> [72931.410216] Call Trace:
> [72931.410216] [<ffffffff81226778>] rb_insert_color+0xbc/0xe5
> [72931.410216] [<ffffffff8121c521>]
> __cfq_entity_service_tree_add+0x76/0xa5
> [72931.410216] [<ffffffff8121cb28>] cfq_service_tree_add+0x383/0x3eb
> [72931.410216] [<ffffffff8121cbaa>] cfq_resort_rr_list+0x1a/0x2a
> [72931.410216] [<ffffffff8121eb06>] cfq_add_rq_rb+0xbd/0xff
> [72931.410216] [<ffffffff8121ec0a>] cfq_insert_request+0xc2/0x556
> [72931.410216] [<ffffffff8120a44c>] elv_insert+0x118/0x188
> [72931.410216] [<ffffffff8120a52a>] __elv_add_request+0x6e/0x75
> [72931.410216] [<ffffffff812102d0>] __make_request+0x3ac/0x42f
> [72931.410216] [<ffffffff8120e9ca>] generic_make_request+0x2ec/0x356
> [72931.410216] [<ffffffff8120eb05>] submit_bio+0xd1/0xdc
> [72931.410216] [<ffffffff8110bea3>] submit_bh+0xe6/0x108
> [72931.410216] [<ffffffff8110eb9d>] __bread+0x4c/0x6f
> [72931.410216] [<ffffffff811453ab>] ext3_get_branch+0x64/0xdf
> [72931.410216] [<ffffffff81146f5c>] ext3_get_blocks_handle+0x9b/0x90b
> [72931.410216] [<ffffffff81147882>] ext3_get_block+0xb6/0xf6
> [72931.410216] [<ffffffff81113520>] do_mpage_readpage+0x198/0x4bd
> [72931.410216] [<ffffffff810c01b2>] ? __inc_zone_page_state+0x29/0x2b
> [72931.410216] [<ffffffff810ab6e4>] ? add_to_page_cache_locked+0xb6/0x10d
> [72931.410216] [<ffffffff81113980>] mpage_readpages+0xd6/0x123
> [72931.410216] [<ffffffff811477cc>] ? ext3_get_block+0x0/0xf6
> [72931.410216] [<ffffffff811477cc>] ? ext3_get_block+0x0/0xf6
> [72931.410216] [<ffffffff810da750>] ? alloc_pages_current+0xa2/0xc5
> [72931.410216] [<ffffffff81145a6a>] ext3_readpages+0x18/0x1a
> [72931.410216] [<ffffffff810b31fc>] __do_page_cache_readahead+0x111/0x1a7
> [72931.410216] [<ffffffff810b32ae>] ra_submit+0x1c/0x20
> [72931.410216] [<ffffffff810acb1b>] filemap_fault+0x165/0x35b
> [72931.410216] [<ffffffff810c6ce1>] __do_fault+0x50/0x3e2
> [72931.410216] [<ffffffff810c7cf8>] handle_pte_fault+0x2ff/0x779
> [72931.410216] [<ffffffff810b05c9>] ? __free_pages+0x1b/0x24
> [72931.410216] [<ffffffff810c82d1>] handle_mm_fault+0x15f/0x173
> [72931.410216] [<ffffffff815b0963>] do_page_fault+0x348/0x36a
> [72931.410216] [<ffffffff810f21c5>] ? path_put+0x1d/0x21
> [72931.410216] [<ffffffff810f21c5>] ? path_put+0x1d/0x21
> [72931.410216] [<ffffffff815adf1f>] page_fault+0x1f/0x30
> [72931.410216] Code: 48 83 c4 18 44 89 e8 5b 41 5c 41 5d c9 c3 48 83 7b 18
> 00 0f 84 71 ff ff ff e9 77 ff ff ff 90 90 48 8b 47 08 55 48 8b 17 48 89 e5
> <48> 8b 48 10 48 83 e2 fc 48 85 c9 48 89 4f 08 74 10 4c 8b 40 10
> [72931.410216] RIP [<ffffffff812265ff>] __rb_rotate_left+0xb/0x64
> [72931.410216] RSP <ffff88012f461480>
> [72931.410216] CR2: 0000000000000010
> [72931.410216] ---[ end trace cddc7a4456407f6a ]---
>
> Thanks
> Vivek
>
>> Vivek thought this approach wasn't very intuitive and that we should treat CFQ queues
>> and groups at the same level. Here is the new approach for hierarchical
>> scheduling, based on Vivek's suggestion. The biggest change in CFQ is that
>> it gets rid of the cfq_slice_offset logic and makes use of vdisktime for CFQ
>> queue scheduling, just as CFQ groups do. But I still give a cfqq a small jump
>> in vdisktime based on its ioprio; thanks to Vivek for pointing this out. Now CFQ
>> queues and CFQ groups use the same scheduling algorithm.
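>>
>> To make the idea concrete, here is a minimal, standalone sketch of the
>> ioprio-based vdisktime jump. BOOST_BASE and scale_boost are illustrative
>> names and values for this sketch, not the exact code in the patches:
>>
>>     #include <stdio.h>
>>
>>     #define IOPRIO_BE_NR 8        /* best-effort ioprio levels; 0 is highest */
>>     #define BOOST_BASE   1000ULL  /* assumed base jump, not the patch's value */
>>
>>     /*
>>      * A lower-numbered (higher-priority) queue gets a smaller jump over
>>      * the service tree's min_vdisktime, so it is picked sooner.
>>      */
>>     static unsigned long long scale_boost(int ioprio)
>>     {
>>         return BOOST_BASE * (ioprio + 1) / IOPRIO_BE_NR;
>>     }
>>
>>     int main(void)
>>     {
>>         int prio;
>>
>>         for (prio = 0; prio < IOPRIO_BE_NR; prio++)
>>             printf("ioprio %d -> vdisktime jump %llu\n",
>>                    prio, scale_boost(prio));
>>         return 0;
>>     }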
>>
>> "use_hierarchy" interface is now added to switch between hierarchical mode
>> and flat mode. It works as memcg.
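>>
>> For example, assuming the blkio controller is mounted at /cgroup/blkio and
>> the knob shows up as blkio.use_hierarchy (the mount point and file name are
>> assumptions for illustration here):
>>
>>     # echo 1 > /cgroup/blkio/blkio.use_hierarchy   (hierarchical mode)
>>     # echo 0 > /cgroup/blkio/blkio.use_hierarchy   (flat mode, the default)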
>>
>> V4 -> V5 Changes:
>> - Change boosting base to a smaller value.
>> - Rename repostion_time to position_time
>> - Replace duplicated code by calling cfq_scale_slice()
>> - Remove redundant use_hierarchy in cfqd
>> - Fix grp_service_tree comment
>> - Rename init_cfqe() to init_group_cfqe()
>>
>> --
>> V3 -> V4 Changes:
>> - Take the I/O class into account when calculating the boost value.
>> - Refine the vtime boosting logic as per Vivek's suggestion.
>> - Make the group slice calculation span all service trees under a group.
>> - Update the documentation per Vivek's comments.
>>
>> --
>> V2 -> V3 Changes:
>> - Start from cfqd->grp_service_tree in both hierarchical mode and flat mode
>> - Avoid recursion when allocating a cfqg and in the forced-dispatch logic
>> - Fix a bug when boosting vdisktime
>> - Adjust total_weight accordingly when changing a weight
>> - Change the group slice calculation into a hierarchical one
>> - Keep flat mode rather than deleting it first and adding it back later
>> - kfree the parent cfqg if nobody references it
>> - Simplify the select_queue logic by using some wrapper functions
>> - Make the "use_hierarchy" interface work like memcg's
>> - Use time_before() for vdisktime comparisons (see the sketch after this list)
>> - Update the documentation
>> - Fix some code style problems
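>>
>> As a one-line illustration of the wrap-safe comparison idea, modeled on the
>> kernel's time_before() (vdisktime_before is an illustrative name, not the
>> exact code in the patches):
>>
>>     /* True if a comes before b, even after the counter wraps around. */
>>     #define vdisktime_before(a, b) \
>>         ((long long)((unsigned long long)(a) - (unsigned long long)(b)) < 0)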
>>
>> --
>> V1 -> V2 Changes:
>> - Rename "struct io_sched_entity" to "struct cfq_entity" and don't differentiate
>> between queue entities and group entities; just use cfqe instead.
>> - Give a newly added cfqq a small vdisktime jump according to its ioprio.
>> - Make flat mode the default CFQ group scheduling mode.
>> - Introduce the "use_hierarchy" interface.
>> - Update the blkio cgroup documentation
>>
>> Documentation/cgroups/blkio-controller.txt | 81 +-
>> block/blk-cgroup.c | 61 +
>> block/blk-cgroup.h | 3
>> block/cfq-iosched.c | 959 ++++++++++++++++++++---------
>> 4 files changed, 815 insertions(+), 289 deletions(-)
>>
>> Thanks,
>> Gui
>

--
Regards
Gui Jianfeng