Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

From: Lyude Paul
Date: Wed Feb 14 2018 - 14:11:13 EST


Actually, this was already brought up to me. There's a fix for this from NVIDIA
on the mailing list, which I reviewed a little while ago, that we should pull in:

https://patchwork.freedesktop.org/patch/203205/

Would you guys mind confirming that this patch fixes your issues?

On Wed, 2018-02-14 at 18:41 +0100, Pierre Moreau wrote:
> On 2018-02-14 at 09:36, Ilia Mirkin wrote:
> > On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
> > > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos <mroos@xxxxxxxx> wrote:
> > > > > This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in
> > > > > 4.15:
> > > >
> > > > NV5 in another PC (secondary card in x86-64) made the system crash on
> > > > boot, in nvkm_therm_clkgate_fini.
> > >
> > > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > > more exactly which thing is dying. If you have a cross-compile/distcc
> > > setup handy, a bisect may be even more useful.
> >
> > Erm, sorry, never mind. You even said it -- nvkm_therm_clkgate_fini is
> > somehow mis-hooked up for NV5 now. A bisect result would still make
> > the culprit a lot more obvious.
>
> CC'ing Lyude Paul as she hooked up the clockgating support.
>
> Looking at the code, only NV40+ have a therm engine. Therefore, shouldn't
> nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
> nvkm_therm_clkgate_oneinit() all check for therm being not NULL, on top of
> their check for the clkgate_* hooks being there? Or instead, maybe have the
> check in nvkm_device_init()?
>
> Pierre
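
To make the suggestion above concrete, here is a minimal standalone sketch of
the guard Pierre describes: bail out when therm itself is NULL, and also when
the clkgate hook is missing. It is not the nouveau code and not necessarily
what the linked patch does. The struct layouts, the therm_clkgate_fini() name
and the mock hook are all simplified assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the driver types (assumed, not the real layout). */
struct therm;

struct therm_func {
	/* Optional hook; NULL on chipsets without clockgating support. */
	void (*clkgate_fini)(struct therm *therm, bool suspend);
};

struct therm {
	const struct therm_func *func;
};

/* Counts hook invocations so the guard's behaviour can be observed. */
static int fini_calls;

/* Hypothetical hook implementation used only by this sketch. */
static void mock_clkgate_fini(struct therm *therm, bool suspend)
{
	(void)therm;
	(void)suspend;
	fini_calls++;
}

static const struct therm_func with_hook = { .clkgate_fini = mock_clkgate_fini };
static const struct therm_func without_hook = { .clkgate_fini = NULL };

/*
 * Guarded wrapper: checks therm for NULL first (the case of a device with
 * no therm subdev), then checks that the hook exists, and only then calls it.
 */
static void therm_clkgate_fini(struct therm *therm, bool suspend)
{
	if (!therm || !therm->func->clkgate_fini)
		return;

	therm->func->clkgate_fini(therm, suspend);
}
```

With this shape, calling therm_clkgate_fini(NULL, false) is a no-op instead of
a NULL dereference, and a therm whose func table lacks the hook is skipped as
before. The same two-step check would apply to the enable and oneinit paths.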
--
Cheers,
Lyude Paul