Re: divide error: bdi_dirty_limit+0x5a/0x9e

From: Borislav Petkov
Date: Mon Sep 24 2012 - 08:56:31 EST


On Mon, Sep 24, 2012 at 08:29:00PM +0800, Fengguang Wu wrote:
> On Mon, Sep 24, 2012 at 02:20:53PM +0200, Borislav Petkov wrote:
> > On Mon, Sep 24, 2012 at 07:34:47PM +0800, Fengguang Wu wrote:
> > > Will you test such a line? At least the generic do_div() only uses the
> > > lower 32bits for division.
> > >
> > > WARN_ON(!(den & 0xffffffff));
> >
> > But, but, the asm output says:
> >
> > 28: 48 89 c8 mov %rcx,%rax
> > 2b:* 48 f7 f7 div %rdi <-- trapping instruction
> > 2e: 31 d2 xor %edx,%edx
> >
> > and this version of DIV does an unsigned division of RDX:RAX by the
> > contents of a *64-bit register* ... in our case %rdi.
> >
> > Srivatsa's oops shows the same:
> >
> > 28: 48 89 f0 mov %rsi,%rax
> > 2b:* 48 f7 f7 div %rdi <-- trapping instruction
> > 2e: 41 8b 94 24 74 02 00 mov 0x274(%r12),%edx
> >
> > Right?
>
> Right, that's why I said "at least". As for x86, I'm as clueless as you..

Right, both oopses are on x86, where the trapping instruction is a full
64-bit DIV, so I don't think the bitness of the division is the problem.
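
To spell out the difference, here is a minimal userspace sketch (not
kernel code). The helper names are made up for illustration, and it
assumes RDX was cleared before the div, which the disassembly snippets
above don't show. A divisor whose low 32 bits are zero only blows up if
the division helper truncates it to 32 bits; a 64-bit "div %rdi" with
RDX == 0 traps only when the whole register is zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the divisor as seen by a helper that takes a 32-bit divisor. */
static uint32_t divisor_low32(uint64_t den)
{
	return (uint32_t)den;	/* upper 32 bits are silently dropped */
}

/* True if a division using only the low 32 bits would divide by zero. */
static bool traps_with_32bit_divisor(uint64_t den)
{
	return divisor_low32(den) == 0;
}

/* True if a full 64-bit division (dividend's upper half zero) would divide by zero. */
static bool traps_with_64bit_divisor(uint64_t den)
{
	return den == 0;
}

/* Full 64-bit unsigned division; the caller must pass a nonzero divisor. */
static uint64_t div_full64(uint64_t num, uint64_t den)
{
	assert(den != 0);
	return num / den;
}
```

With den = 1ULL << 32, the first check fires and the second does not,
which is why the WARN_ON() above is a reasonable test for the generic
do_div() but says little about the 64-bit div in these two oopses.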

Another thing those two have in common is that both happen when a CPU
comes online. Srivatsa's is when CPU9 comes online (oops is detected on
CPU9) and in our case CPU4 comes online but the oops says CPU0.

So it has to be hotplug-related.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551