Re: Serial related oops

From: Frederik Deweerdt
Date: Mon Feb 19 2007 - 09:49:49 EST


(trimmed tie-fei.zang from the CC, added by mistake)
On Mon, Feb 19, 2007 at 02:35:20PM +0000, Russell King wrote:
> > Neither did I, but introducing printk's through the function, we narrowed
> > the problem to this part of the code. And removing it makes the problem
> > go away. We inserted 37 printk's in the function body, and Jose bisected
> > those until the problem went away.
>
> Well, there's still little clue about why this is causing a NULL pointer
> dereference. The only thing I can think is that somehow performing
> this test is causing a power glitch to your CPU, causing its registers
> to get corrupted, and which results in it doing a NULL pointer deref.
That may be the case, indeed.
>
> Are you saying that the NULL pointer occurred while executing this code?
> If not, where does the NULL pointer occur?
The thing is, the NULL pointer deref disappeared as soon as we
instrumented (printk'ed) the code. So it seems to be triggered by a
combination of the check, timing, and hardware.
>
> > > No, it's only runtime because you can't tell which ports might be
> > > affected, and you might have a mixture of ports which are affected
> > > and those which aren't.
> > Hmm, ok. And what about a CONFIG_I_KNOW_MY_SERIAL_IS_BROKEN option?
>
> Andrew's said no (in the thread you refer to) and suggested an
> alternative, I've said no, how many more 'no's do you need to turn
> you away from the wrong approach?
One is usually sufficient once I've understood :). I missed the module
option approach. Is it ok with you? If yes, I'll put up a patch to do
this.
>
> > > > PS: CCing Andrew and Zang Roy-r61911 as they seemed to discuss this in
> > > > http://lkml.org/lkml/2006/6/13/21
> > >
> > > I don't see any reference to this problem there.
> >
> > Sorry, I suck, I got that mixed with that one:
> > http://lkml.org/lkml/2006/12/26/63
> > "probing for UART_BUG_TXEN in 8250 driver leads to weird effects on some
> > ARM boards"
>
> The "weird effects" were never quantified, so that's one of the reasons
> I ignored that report (another being that I stopped being the serial
> maintainer a while ago, and now serial is maintainerless.)
>
The problem appears to be reproducible on Jose's hardware within 2-3 days.
If you can think of other tests we should run, let us know.

Regards,
Frederik