Re: [Xen-devel] Re: 2.6.39 crashes BUG: unable to handle kernelNULL pointer dereference at 000000000000042 .. cmos_checkintr+0x4d/0x55under Xen as PV guest.

From: John Stultz
Date: Thu Mar 24 2011 - 15:05:51 EST


On Thu, 2011-03-24 at 08:27 -0400, Konrad Rzeszutek Wilk wrote:
> The problem is this:
>
> cmos_do_probe does:
>
> cmos_rtc.dev = dev;
> dev_set_drvdata(dev, &cmos_rtc);
>
> which means that dev->p->private_data contains cmos_rtc. And
> dev->p->private_data->rtc is a NULL pointer. The next function:
>
> cmos_rtc.rtc = rtc_device_register(driver_name, dev,
> &cmos_rtc_ops, THIS_MODULE);
>
> 'rtc_device_register' creates an 'rtc' structure and sets
> its parent to be:
> rtc->dev.parent = dev;

And rtc_device_register() also calls:
INIT_WORK(&rtc->irqwork, rtc_timer_do_work);


> and later on it does:
> if (!err && !rtc_valid_tm(&alrm.time))
> rtc_set_alarmtrtc, &alrm);
>
> rtc_set_alarm calls rtc_timer_enqueue which calls __rtc_set_alarm.
> __rtc_set_alarms calls 'cmos_set_alarm' via:
> err = rtc->ops->set_alarm(rtc->dev.parent, alarm);
>
> which is basically passing in 'dev' to 'cmos_set_alarm', and
> 'cmos_set_alarm' uses the dev to:
> struct cmos_rtc *cmos = dev_get_drvdata(dev);
>
> (so get the from dev->p->private_data the cmos_rtc).
> get the 'cmos' (which is what 'cmos_rtc').


Ok. So this all seems like it is working as it should, although you
stated something to the effect that Xen doesn't support the cmos? And
here it seems like it is being probed and registered without issue.


> Great... except
> then it ends up trying to dereference cmos->rtc.irqwork (via
> cmos_irq_disable(cmos, .. and somehere in its chain calls
> schedule_work(cmos->rtc) whcih ends up blowing up b/c
> cmos_rtc.rtc has not been set yet.
>
> The cmos_rtc.rtc is set when the when 'rtc_device_register'
> finish, which it hadn't yet done.

Ok, so you're saying we've not completed rtc_device_register, when
something calls rtc_update_irq(), which then tries to schedule the work,
which hasn't been initialized.

> git gui blame tells me to look at
> f44f7f96a20af16f6f12e1c995576d6becf5f57b

Hmm. So this seems even more odd at first glance, as we're really just
adding an extra call in rtc_device_register(), which supposedly isn't
being run yet.

But... maybe the extra call in rtc_device_register is actually slowing
it down enough, and if an irq lands in the cmos code before the
device_register code is complete, that might cause trouble.

Or... more likely, the added rtc_set_alarm is enabling interrupts,
allowing an irq to land before the rtc_device_register completes.

Oof.. ok. so that is a little gross.


Let me see if I can't work something out here.

Thanks for the helpful clues here!
-john




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/