Re: Oops in current tree in i2c

From: Hans de Goede
Date: Sun Oct 28 2018 - 06:48:01 EST


Hi Linus,

On 27-10-18 18:08, Linus Torvalds wrote:
Julian, Jiri,
On my laptop I'm getting a kernel page fault with the current git
tree, and I'm tentatively blaming commit

9ee3e06610fd ("HID: i2c-hid: override HID descriptors for certain devices")

but that's simply because it's the only thing that seems to touch this
particular area in this merge window.

The oops looks like this:

BUG: unable to handle kernel paging request at 000000007a25d598
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 888 Comm: systemd-udevd Not tainted 4.19.0-07715-g345671ea0f92 #4
Hardware name: Dell Inc. XPS 13 9350/09JHRY, BIOS 1.7.0 01/16/2018
RIP: 0010:strstr+0x19/0x70

where the code disassembly (and the register contents) shows that the
wild pointer is the first argument to "strstr()", which just has a
bogus value that is not a valid kernel pointer (RDI: 000000007a25d598
- which is obviously also the address of the page fault)

The call trace is:

dmi_matches+0x55/0xc0
dmi_first_match+0x26/0x40
i2c_hid_get_dmi_i2c_hid_desc_override+0x16/0x40 [i2c_hid]
i2c_hid_probe+0x28c/0x760 [i2c_hid]
i2c_device_probe+0x1e7/0x260
really_probe+0xf8/0x3e0
driver_probe_device+0x10f/0x120
bus_for_each_drv+0x66/0xb0
__device_attach+0xd9/0x150
bus_probe_device+0x8a/0xa0
device_add+0x48e/0x660
i2c_new_device+0x162/0x350

which is why I suspect that new i2c_hid_get_dmi_hid_report_desc_override() code.

I *think* the problem is that the i2c_hid_dmi_desc_override_table[]
isn't terminated by a NULL entry, and I will test that next.

Yes that likely is the problem. I already had a bug report from one of the
Manjaro maintainers who was cherry picking this into the Manjaro kernel.

So I ran some tests on a laptop of mine which does use i2c-hid but I
failed to reproduce the issue, so we both (me and the Manjaro maintainer)
both assumed something went wrong with the backport.

Both of us seem to have overlooked the missing terminating entry,
as well as other people involved in the patch.

What makes me *very* unhappy about this is that if I'm right, I think
it means that code was literally not tested at all by anybody who
didn't have one of the entries in that list.

That is not true, I've hit one of these unterminated dmi lists
issues before and it depends on what get put in mem directly
after the list by the linker, bugs caused by this do not always
reproduce unfortunately.

And as mentioned I have tested the patch on a machine with an i2c-hid
touchpad, which is not on the list and I did not hit this problem.

Anyways this is fixed now, thank you for catching this.

Regards,

Hans