Re: ext4 deep stack with mark_page_dirty reclaim

From: Andreas Dilger
Date: Tue Mar 15 2011 - 01:17:49 EST


On 2011-03-14, at 1:46 PM, Ted Ts'o wrote:
> On Mon, Mar 14, 2011 at 12:20:52PM -0700, Hugh Dickins wrote:
>> When testing something else on 2.6.38-rc8 last night,
>> I hit this x86_64 stack overflow. I've never had one before,
>> it seems worth reporting. kdb was in, I jotted it down by hand
>> (the notifier part of it will be notifying kdb of the fault).
>> CONFIG_DEBUG_STACK_OVERFLOW and DEBUG_STACK_USAGE were not set.
>>
>> I should disclose that I have a hack in which may make my stack
>> frames slightly larger than they should be: check against yours.
>> So it may not be an overflow for anyone else, but still a trace
>> to worry about.
>
> Here's the trace translated to the stack space used by each function.
> There are a few piggy ext4 functions that we can try to shrink, but
> the real problem is just how deep the whole stack is getting.
>
> From the syscall to the lowest-level ext4 function is 3712 bytes, and
> everything from there to the schedule() which then triggered the GPF
> was another 3728 bytes of stack space....

Is there a script you used to generate this mapping from stack trace to per-function stack usage, or did you do it by hand? I've always wanted such a script, but the tricky part is that there is so much garbage on the stack that automated stack parsing is almost useless. Alternatively, it would seem trivial to have the stack dumper print the relative address of each symbol and the delta from the previous symbol; a rough post-processing sketch is below.
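
Something like the following would be enough (purely a sketch, not the script Ted used; it assumes a hypothetical augmented dump format in which each call-trace line is prefixed by the stack slot address that held the return address, which the current dumper does not print):

    /*
     * stackdelta.c - illustrative only.  Feed it just the (hypothetical)
     * augmented call-trace lines, e.g.:
     *
     *   ffff88003f2c7a98 [<ffffffff8113a2db>] shrink_page_list+0x2db/0x690
     *
     * The difference between consecutive stack-slot addresses approximates
     * the stack consumed by the frame in between.
     */
    #include <stdio.h>

    int main(void)
    {
            char line[256], sym[128];
            unsigned long long slot, prev = 0;

            while (fgets(line, sizeof(line), stdin)) {
                    /* leading slot address, skip "[<...>]", keep "func+0xOFF/0xSIZE" */
                    if (sscanf(line, "%llx %*s %127s", &slot, sym) != 2)
                            continue;
                    if (prev)
                            printf("%6llu %s\n", slot - prev, sym);
                    prev = slot;
            }
            return 0;
    }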

To be honest, I think the stack size limitation is becoming a serious problem in itself. While some stack-size reduction work genuinely removes inefficiency, a lot of crazy and inefficient contortions are being done just to minimize stack usage (e.g. lots of kmalloc/kfree of temporary arrays instead of simply putting them on the stack), which ends up consuming _more_ total memory.
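
To make that concrete, here is a made-up illustration of the pattern I mean (not taken from any real filesystem code; NBUF and the function names are invented):

    #include <linux/slab.h>
    #include <linux/types.h>
    #include <linux/errno.h>

    #define NBUF 32                 /* hypothetical temporary-array size */

    /* costs ~128 bytes of stack and nothing else */
    static int frob_on_stack(void)
    {
            u32 tmp[NBUF];

            /* ... use tmp ... */
            return 0;
    }

    /*
     * The "stack-friendly" version: trades those 128 bytes of stack for a
     * slab allocation, an extra failure path, and a kmalloc/kfree round
     * trip on every call -- more total memory and more CPU.
     */
    static int frob_via_kmalloc(void)
    {
            u32 *tmp;

            tmp = kmalloc(NBUF * sizeof(*tmp), GFP_KERNEL);
            if (!tmp)
                    return -ENOMEM;

            /* ... use tmp ... */

            kfree(tmp);
            return 0;
    }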

This can be seen with deep storage stacks that use the network on both ends, like NFS+{XFS, ext4}+LVM+DM+{fcoib,iSCSI}+driver+kmalloc or similar... The stack below isn't even doing anything that convoluted.

> 240 schedule+0x25a
> 368 io_schedule+0x35
> 32 get_request_wait+0xc6
> 160 __make_request+0x36d
> 112 generic_make_request+0x2f2
> 208 submit_bio+0xe1
> 144 swap_writepage+0xa3
> 80 pageout+0x151
> 128 shrink_page_list+0x2db
> 176 shrink_inactive_list+0x2d3
> 256 shrink_zone+0x17d
> 224 shrink_zones+0xa3
> 128 do_try_to_free_pages+0x87
> 144 try_to_free_mem_cgroup_pages+0x8e
> 112 mem_cgroup_hierarchical_reclaim+0x220
> 176 mem_cgroup_do_charge+0xdc
> 128 __mem_cgroup_try_charge+0x19c
> 128 mem_cgroup_charge_common+0xa8
> 128 mem_cgroup_cache_charge+0x19a
> 128 add_to_page_cache_locked+0x57
> 96 add_to_page_cache_lru+0x3e
> 80 find_or_create_page+0x69
> 112 grow_dev_page+0x4a
> 96 grow_buffers+0x41
> 64 __getblk_slow+0xd7
> 80 __getblk+0x44
> 80 __ext4_get_inode_loc+0x12c
> 176 ext4_get_inode_loc+0x30
> 48 ext4_reserve_inode_write+0x21
> 80 ext4_mark_inode_dirty+0x3b
> 160 ext4_dirty_inode+0x3e
> 64 __mark_inode_dirty+0x32
> 80 linux/fs.h mark_inode_dirty
> 0 linux/quotaops.h dquot_alloc_space
> 0 linux/quotaops.h dquot_alloc_block
> 0 ext4_mb_new_blocks+0xc2
> 144 ext4_alloc_blocks+0x189
> 208 ext4_alloc_branch+0x73
> 208 ext4_ind_map_blocks+0x148
> 272 ext4_map_blocks+0x148
> 112 ext4_getblk+0x5f
> 144 ext4_bread+0x36
> 96 ext4_append+0x52
> 96 do_split+0x5b
> 224 ext4_dx_add_entry+0x4b4
> 304 ext4_add_entry+0x7c
> 176 ext4_add_nondir+0x2e
> 80 ext4_create+0xf5
> 144 vfs_create+0x83
> 96 __open_namei_create+0x59
> 96 do_last+0x13b
> 112 do_filp_open+0x2ae
> 384 do_sys_open+0x72
> 128 sys_open+0x27


Cheers, Andreas