Re: Large stack usage in fs code (especially for PPC64)

From: Benjamin Herrenschmidt
Date: Mon Nov 17 2008 - 18:31:36 EST


On Mon, 2008-11-17 at 15:34 -0500, Steven Rostedt wrote:
>
> I've been hitting stack overflows on a PPC64 box, so I ran the ftrace
> stack_tracer and part of the problem with that box is that it can nest
> interrupts too deep. But what also worries me is that there's some heavy
> hitters of stacks in generic code. Namely the fs directory has some.

Note that we shouldn't stack interrupts much in practice. The PIC will
not let same or lower priority interrupts in until we have completed
the current one. However, the timer/decrementer does not go through the
PIC, so I think what happens is this: we get a HW IRQ; on the way back,
just before returning from do_IRQ (so we have completed the IRQ from the
PIC's standpoint), we go into softirqs; at that point, deep inside SCSI,
we get another HW IRQ, and then we stack a decrementer interrupt on top
of it.

Now, we should do stack switching for both HW IRQs and softirqs with
CONFIG_IRQSTACKS, which should significantly alleviate the problem.

Your second trace also shows how horrible the stack traces can be when
the device-model kicks in, i.e., register->probe->register sub device ->
etc... that isn't going to be nice on x86 with 4k stacks either.

I wonder if we should generally recommend for drivers of "bus" devices
not to register sub devices from their own probe() routine, but defer
that to a kernel thread... Because the stacking can be pretty bad, I
mean, nobody's done SATA over USB yet but heh :-)
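
To make the stacking argument concrete, here is a rough userspace model,
not kernel code. Everything in it (struct model_dev, the enter()/leave()
depth counters, the fixed-size pending[] queue standing in for a kernel
thread) is hypothetical and only illustrates the idea: when probe()
registers the sub device directly, call depth grows with every bus level,
whereas queueing the sub device for a worker keeps depth constant.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical model: each device has at most one sub device. */
struct model_dev {
    struct model_dev *child;    /* sub device found by probe(), or NULL */
};

static int cur_depth;           /* current simulated call nesting */
static int max_depth;           /* deepest nesting seen so far */

static void enter(void) { if (++cur_depth > max_depth) max_depth = cur_depth; }
static void leave(void) { --cur_depth; }

/* Nested variant: probe() registers the sub device on the same stack. */
static void register_nested(struct model_dev *dev);

static void probe_nested(struct model_dev *dev)
{
    enter();
    if (dev->child)
        register_nested(dev->child);
    leave();
}

static void register_nested(struct model_dev *dev)
{
    enter();
    probe_nested(dev);
    leave();
}

/* Deferred variant: probe() only queues the sub device. */
static struct model_dev *pending[MAX_PENDING];
static size_t head, tail;

static void probe_deferred(struct model_dev *dev)
{
    enter();
    if (dev->child) {
        assert(tail < MAX_PENDING);     /* model only: fixed-size queue */
        pending[tail++] = dev->child;
    }
    leave();
}

static void register_deferred(struct model_dev *dev)
{
    enter();
    probe_deferred(dev);
    leave();
}

/* Stand-in for the kernel thread: registers queued devices from a
 * shallow stack, one at a time. */
static void worker_drain(void)
{
    while (head < tail)
        register_deferred(pending[head++]);
}

/* Link devs[0] -> devs[1] -> ... -> devs[n-1]; returns the root. */
struct model_dev *build_chain(struct model_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        devs[i].child = (i + 1 < n) ? &devs[i + 1] : NULL;
    return &devs[0];
}

int max_depth_nested(struct model_dev *root)
{
    cur_depth = max_depth = 0;
    register_nested(root);
    return max_depth;
}

int max_depth_deferred(struct model_dev *root)
{
    cur_depth = max_depth = 0;
    head = tail = 0;
    register_deferred(root);
    worker_drain();
    return max_depth;
}
```

With a chain of four devices, the nested variant reaches a depth of two
frames per level (register plus probe), while the deferred variant never
goes deeper than two frames in total. A real driver would more likely
use a workqueue or a kthread and would have to deal with lifetime and
ordering issues that this model ignores.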

Cheers,
Ben.
