Re: [RFC 2/2] x86_64: expand kernel stack to 16K

From: Dave Chinner
Date: Wed May 28 2014 - 18:31:51 EST


On Wed, May 28, 2014 at 09:09:23AM -0700, Linus Torvalds wrote:
> On Tue, May 27, 2014 at 11:53 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> >
> > So, my stupid idea is just let's expand stack size and keep an eye
> > toward stack consumption on each kernel functions via stacktrace of ftrace.
.....
> But what *does* stand out (once again) is that we probably shouldn't
> do swap-out in direct reclaim. This came up the last time we had stack
> issues (XFS) too. I really do suspect that direct reclaim should only
> do the kind of reclaim that does not need any IO at all.
>
> I think we _do_ generally avoid IO in direct reclaim, but swap is
> special. And not for a good reason, afaik. DaveC, remind me, I think
> you said something about the swap case the last time this came up..

Right, we do generally avoid IO through filesystems via direct
reclaim because delayed allocation requires significant amounts
of additional memory, stack space and IO.

However, swap doesn't have that overhead - it just drives the IO
stack through submit_bio(), and the worst case I've seen through that
path uses much less stack than the other reclaim paths.
I haven't seen swap in any of the stack overflows from production
machines, and I only rarely see it in worst case stack usage
profiles on my test machines.

Indeed, the call chain reported here is not caused by swap issuing
IO. We scheduled in the swap code (throttling while waiting for
congestion, I think) with a plugged block device (from the ext4
writeback layer) that had pending bios queued on it, and the scheduler
triggered a flush of that plug. submit_bio() in the swap path uses
much less stack than io_schedule() because it doesn't have any of the
scheduler or plug list flushing overhead in the stack.

So, realistically, the swap path is not the worst case for stack usage
here, and disabling it won't prevent this stack overflow from happening.
Direct reclaim will simply throttle elsewhere and that will still
cause the plug to be flushed, the IO to be issued and the stack to
overflow.
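
To make the argument above concrete, here is a rough user-space toy
model of it. Everything in it (the toy_* names, the structs, the
counters) is hypothetical and made up for illustration; it is not the
kernel's plugging code, it only assumes the behaviour described above:
bios submitted under a plug are queued, and a task that sleeps with a
non-empty plug flushes it first, on its own stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the real kernel structures. */
struct toy_plug {
	int pending;		/* bios queued while the plug is active */
};

struct toy_task {
	struct toy_plug *plug;	/* NULL when the task holds no plug */
	int issued_on_sleep;	/* bios issued from the sleeper's stack */
};

/* Under a plug, submission only queues the bio; nothing is issued yet. */
static void toy_submit_bio(struct toy_task *t)
{
	assert(t->plug != NULL);	/* this toy only models the plugged case */
	t->plug->pending++;
}

/* Sleeping with a non-empty plug flushes it first, on this task's stack. */
static void toy_io_schedule(struct toy_task *t)
{
	if (t->plug && t->plug->pending > 0) {
		t->issued_on_sleep += t->plug->pending;
		t->plug->pending = 0;
	}
}

/* Two hypothetical throttle points; both simply end up sleeping. */
static void toy_swap_throttle(struct toy_task *t)
{
	toy_io_schedule(t);
}

static void toy_other_reclaim_throttle(struct toy_task *t)
{
	toy_io_schedule(t);
}

/*
 * Writeback queues two bios under a plug, then the task enters direct
 * reclaim and throttles. Returns how many bios were issued from the
 * sleeping task's stack.
 */
int toy_bios_issued_on_sleep(bool swap_enabled)
{
	struct toy_plug plug = { .pending = 0 };
	struct toy_task t = { .plug = &plug, .issued_on_sleep = 0 };

	toy_submit_bio(&t);
	toy_submit_bio(&t);

	if (swap_enabled)
		toy_swap_throttle(&t);
	else
		toy_other_reclaim_throttle(&t);

	return t.issued_on_sleep;
}
```

In this model toy_bios_issued_on_sleep() gives the same answer with
swap enabled or disabled: the queued bios are issued from whichever
throttle point sleeps first, which is the point of the paragraph above.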

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx