Re: [PANIC, hyperv] BUG: unable to handle kernel paging request at ffff880077800004 (hv_ringbuffer_write)

From: Sitsofe Wheeler
Date: Mon Aug 25 2014 - 13:41:45 EST


Hi Dexuan,

On Mon, Aug 25, 2014 at 02:02:21PM +0000, Dexuan Cui wrote:
> > -----Original Message-----
> > From: Sitsofe Wheeler
> > Sent: Wednesday, August 20, 2014 17:27 PM
> >
> > While booting a Hyper-V 3.17.0-rc1 guest on a 2012 R2 host a BUG was
> > triggered while registering hyperv_fb which in turn caused a panic.
> > Various kernel debugging options (CONFIG_DEBUG_PAGEALLOC,
> > CONFIG_SLUB_DEBUG=y...) were on at the time. This only seems to happen
> > if the guest is being booted with only one CPU allocated to it.
>
> I can reproduce the exact issue with the same commit + your kconfig + UP
> guest (SMP guest seems ok.)

Thanks for getting back - I was wondering if my mails had dropped into a
black hole as I haven't heard anything on any of them for a few days
(and no one had mentioned they had been able to reproduce the issues
reported).

> > [ 7.645526] hv_vmbus: registering driver hyperv_fb
> > [ 7.657553] BUG: unable to handle kernel paging request at
> > ffff880077800004
> > [ 7.658224] IP: [<ffffffff8159a7ac>] hv_ringbuffer_write+0x7c/0x150
> > [ 7.658224] PGD 2da9067 PUD 2dac067 PMD 7fa27067 PTE
> > 8000000077800060
> > [ 7.658224] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> It seems
> hv_ringbuffer_write() ->
> hv_get_ringbuffer_availbytes():
> reading rbi->ring_buffer->read_index causes a page fault.
>
> It looks rbi->ring_buffer was unmapped somehow according to the
> semantics of CONFIG_DEBUG_PAGEALLOC??? Or, was there a memory
> corruption somewhere?
>
> It looks the panic will disappear if the guest isn't configured with a
> "Network Adapter ".

This sounds very fishy as if network setup has left things in a bad
state. What baffles me is the whole UP vs SMP thing - why would UP
make this show up consistently? Perhaps some assertions could be added
to check that rbi->ring_buffer still has sane values in it after
operations on it are finished?
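
To make the assertion idea concrete, here is a rough user-space sketch
of the kind of check I mean. The demo_* structs are simplified stand-ins
I made up for illustration (they are not the real hv_ring_buffer /
hv_ring_buffer_info layouts) and the available-bytes arithmetic is just
the usual ring calculation. Note that a check like this only narrows
down where the indices went bad; if the header page itself has been
unmapped then reading it in the check would fault just the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the shared ring header (hypothetical layout). */
struct demo_ring_buffer {
	uint32_t write_index;
	uint32_t read_index;
};

/* Simplified stand-in for the per-ring bookkeeping (hypothetical layout). */
struct demo_ring_info {
	struct demo_ring_buffer *ring_buffer;
	uint32_t ring_datasize;	/* bytes of payload space in the ring */
};

/* True if the pointer is set, the size is non-zero and both indices
 * fall inside the data area. */
static bool demo_ring_is_sane(const struct demo_ring_info *rbi)
{
	if (rbi == NULL || rbi->ring_buffer == NULL)
		return false;
	if (rbi->ring_datasize == 0)
		return false;
	return rbi->ring_buffer->read_index < rbi->ring_datasize &&
	       rbi->ring_buffer->write_index < rbi->ring_datasize;
}

/* Bytes available to write; asserts the ring is sane before using it. */
static uint32_t demo_ring_avail_to_write(const struct demo_ring_info *rbi)
{
	uint32_t r, w;

	assert(demo_ring_is_sane(rbi));
	r = rbi->ring_buffer->read_index;
	w = rbi->ring_buffer->write_index;
	return w >= r ? rbi->ring_datasize - (w - r) : r - w;
}
```

In the driver the equivalent would presumably be a WARN_ON/BUG_ON style
check after each operation on the ring rather than a plain assert().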

I guess you could try switching things around and using
kmemcheck (https://www.kernel.org/doc/Documentation/kmemcheck.txt ). If
the whole area close to rbi->ring_buffer->read_index is being stomped on
it should show up. If it's just being set to a duff value or freed, that's
going to be harder to track down, although poisoning before freeing
should allow us to distinguish that case...
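
As a sketch of what I mean by poisoning (user-space only, with made-up
demo_* names): fill the object with a known byte pattern just before it
is released, so that an index read back later can be recognised as
"came from freed memory" rather than "random corruption". The 0x6b byte
is borrowed from the value slab poisoning uses for freed objects; using
it here is my assumption, and the ring buffer pages may well be poisoned
with something different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative poison byte (assumption: same value as slab free poison). */
#define DEMO_POISON_FREE 0x6b

/* Overwrite an object with the poison pattern just before releasing it. */
static void demo_poison(void *obj, size_t len)
{
	memset(obj, DEMO_POISON_FREE, len);
}

/* True if a 32-bit field consists entirely of poison bytes. */
static bool demo_u32_is_poison(uint32_t v)
{
	return v == 0x6b6b6b6bu;
}
```

A read_index that comes back as 0x6b6b6b6b would then point at a
use-after-free, while any other out-of-range value points at a stray
write.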

From your analysis this doesn't sound framebuffer related - perhaps we
could drop the linuxfb CC's on these mails going forward?

--
Sitsofe | http://sucs.org/~sits/