Re: [lkp-robot] [platform/x86] b925ff7dcd: BUG:unable_to_handle_kernel

From: MichaÅ KÄpieÅ
Date: Mon Feb 13 2017 - 03:15:02 EST

> Michael
> On Mon, Feb 13, 2017 at 10:40:15AM +0800, kernel test robot wrote:
> > FYI, we noticed the following commit:
> >
> > commit: b925ff7dcd1fc45b86baaebd3442f8b484123716 ("platform/x86: fujitsu-laptop: only register backlight device if FUJ02B1 is present")
> > url:
> > base: git:// for-next
> >
> > in testcase: boot
> >
> > on test machine: qemu-system-i386 -enable-kvm -cpu Haswell,+smep,+smap -m 360M
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > :
> > [ 4.656202] fujitsu_laptop: call_fext_func: FUNC interface is not present
> > [ 4.657478] BUG: unable to handle kernel NULL pointer dereference at 00000008
> > [ 4.658433] IP: fujitsu_init+0x137/0x1b7
> > [ 4.659208] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
> > :
> I'm away from my Fujitsu hardware right now, and in any case this does not
> get triggered on it.
> I note that prior to the bug we get "FUNC interface is not present".
> Therefore something called call_fext_func() some time before the NULL
> pointer dereference. As far as I can tell the only such call which could be
> made prior to or within fujitsu_init() is from fujitsu_init(), but this is
> conditional on acpi_video_get_backlight_type() returning
> acpi_backlight_vendor (line 1269). Obviously if this conditional were
> passed but fujitsu_bl->bl_device were NULL then we would get the NULL
> dereference.

Yes, this is exactly what is happening.


I think you meant ACPI_FUJITSU_BL_HID.

> is not present then presumedly the
> acpi_bus_register_driver(&acpi_fujitsu_bl_driver)
> call in fujitsu_init() will fail and nothing further would happen.
> Therefore this HID must be in the system.

Not really. acpi_bus_register_driver() is just a wrapper around
driver_register(). In other words, whether or not a given HID is
present in the firmware does not have any influence on the return code
of that function.

In fact, the bug only happens when ACPI_FUJITSU_BL_HID is _not_ present.
Or, putting it differently, on all Skylake machines (when
acpi_backlight=vendor). Once again I am really glad the kernel test
robot is there to counter my carelessness...

> However, the acpi_fujitsu_bl_add() callback wouldn't necessarily get run by
> acpi_bus_register_driver(), would it? I'm not too familiar with the lower
> level ACPI functions but a quick trip through the source suggested that the
> add callback isn't called via acpi_bus_register_driver(). This would mean
> that that fujitsu_bl->bl_device would not yet be initialised when referenced
> within fujitsu_init() at line 1271 or 1273. If this were the case then I
> see two options:
> 1) Don't move the backlight registration out of fujitsu_init().
> 2) Move the remaining backlight code (lines 1268-1274) into
> acpi_fujitsu_bl_add().
> Item 1 effectively amounts to dropping this commit. I'm not sure option 2
> is workable because of the code's reliance on FUJ02E3; is that guaranteed to
> be useable by the time acpi_fujitsu_bl_add() is called?

To keep things simple, I think we should drop this particular patch for
now. Darren, Andy, could you skip it when applying this series?
Patches 9 and 10 do not rely on this one being applied. Thanks and
sorry for the trouble. v2 of my fujitsu_init() cleanup series will fix
this properly.

> The only problem with the above theory is that it doesn't explain why the
> NULL pointer dereference wasn't triggered on my Fujitsu hardware when this
> code was tested on it.

Because the bug is not triggered as long as FUJ02B1 is present.

> If the ACPI bus probed/added asynchronously I guess
> there could be a race whereby acpi_fujitsu_bl_add() may or may not have
> completed by the time fujitsu_init() referenced fujitsu_bl->bl_device. That
> doesn't seem right to me though.

When acpi_bus_register_driver() is called, the .add callback is
"synchronously" called for all ACPI devices handled by the registered
driver that are yet unbound to any driver. So if FUJ02B1 is present,
acpi_fujitsu_bl_add() is called and bl_device is allocated. However, if
that ACPI device is not present (like on Skylakes) and
acpi_backlight=vendor, we get a NULL dereference.

Best regards,
MichaÅ KÄpieÅ