Re: camss NULL-deref on power on with 6.12-rc2

From: Johan Hovold
Date: Fri Oct 11 2024 - 05:56:56 EST


On Fri, Oct 11, 2024 at 10:41:30AM +0100, Bryan O'Donoghue wrote:
> On 11/10/2024 10:33, Johan Hovold wrote:

> > This morning I hit the below NULL-deref in camss when booting a 6.12-rc2
> > kernel on the Lenovo ThinkPad X13s.
> >
> > I booted the same kernel another 50 times without hitting it again, so
> > it may not be a regression, but simply an older, hard-to-hit bug.
> >
> > Hopefully you can figure out what went wrong from just staring at the
> > oops and code.

> > [ 5.657860] ov5675 24-0010: failed to get HW configuration: -517
>
> So this caused it, I guess the sensor failed to power up.

The probe deferral may be involved, but we see this deferral all the
time without things blowing up (and the driver should be able to handle
that).

> You've booted 50 times in a row and hit a corner case where the sensor
> didn't power up, leading to a NULL dereference.
>
> So, two bugs I'd say.
>
> - What is the circumstance where the sensor doesn't power up

Not sure what is causing it, but I have seen boots where this message
shows up 5-6 times, which may indeed indicate that something is off. If
this was just a provider not having probed yet, driver core should
generally prevent the sensor from probing until the resources (e.g.
clocks) are available.

> - What's the NULL? Either entity * or entity->pad, I'd say.
>
> <snip>
> > [ 6.594915] Call trace:
> > [ 6.594915] camss_find_sensor+0x20/0x74 [qcom_camss]
> Hmm, not sure looking at what we have.
>
> 	pad = &entity->pads[0];
> 	if (!(pad->flags & MEDIA_PAD_FL_SINK))
> 		return NULL;
>
> Is pad guaranteed after entity->pads[0]?
> We dereference it like it's guaranteed.
>
> Anyway, thanks for the report, should be enough to start digging.
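
For illustration, a defensive version of that walk could look something
like the sketch below. It is plain userspace C with stand-in types:
struct entity, struct pad, PAD_FL_SINK and find_sensor_checked() are all
made up here and are not the media controller API. It only shows the
kind of guard that would avoid dereferencing a missing pads array, and
says nothing about which pointer was actually NULL in this oops.

```c
#include <stddef.h>

/* Stand-in types for illustration only; not the kernel's definitions. */
#define PAD_FL_SINK (1UL << 0)

struct entity;

struct pad {
	unsigned long flags;
	struct entity *entity;	/* entity reached through this pad */
	struct pad *remote;	/* pad at the other end of the link, or NULL */
};

struct entity {
	int is_sensor;
	unsigned int num_pads;
	struct pad *pads;	/* NULL until the pads have been set up */
};

/* Walk upstream from @entity via pad 0 until a sensor is found. */
static struct entity *find_sensor_checked(struct entity *entity)
{
	while (entity) {
		struct pad *pad;

		/* Bail out instead of dereferencing a missing pad array. */
		if (!entity->pads || entity->num_pads == 0)
			return NULL;

		pad = &entity->pads[0];
		if (!(pad->flags & PAD_FL_SINK))
			return NULL;

		/* No link on the sink pad means nothing upstream. */
		pad = pad->remote;
		if (!pad)
			return NULL;

		entity = pad->entity;
		if (entity && entity->is_sensor)
			return entity;
	}

	return NULL;
}
```

With these guards, a caller that passes an entity whose pads were never
initialised gets NULL back rather than an oops. Whether that would merely
paper over an ordering problem elsewhere is a separate question.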

Thanks.

Johan