Re: [PATCH] drm: fix pci_dev root device is NULL without check.
From: Alex Deucher
Date: Wed Jun 20 2018 - 10:39:48 EST
On Wed, Jun 20, 2018 at 4:39 AM, Maarten Lankhorst
<maarten.lankhorst@xxxxxxxxxxxxxxx> wrote:
> Op 19-06-18 om 09:47 schreef Caicai:
>> On my laptop with an ATI Radeon R7 350 graphics card, I found that root was NULL, so we
>> should check the device before getting/setting the PCIe speed cap mask. Here are my kernel messages.
>> [ 8.996093] [drm] radeon kernel modesetting enabled.
>> [ 8.996093] [drm] initializing kernel modesetting (OLAND 0x1002:0x6610 0x174B:0xA00B).
>> [ 8.996093] [drm] register mmio base: 0x10000000
>> [ 8.996093] [drm] register mmio size: 262144
>> [ 9.121093] ATOM BIOS: C55001
>> [ 9.121093] [drm] GPU not posted. posting now...
>> [ 9.125000] radeon 0001:20:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
>> [ 9.125000] radeon 0001:20:00.0: GTT: 2048M 0x0000000080000000 - 0x00000000FFFFFFFF
>> [ 9.125000] [drm] Detected VRAM RAM=2048M, BAR=256M
>> [ 9.125000] [drm] RAM width 128bits DDR
>> [ 9.125000] [drm] radeon: 2048M of VRAM memory ready
>> [ 9.125000] [drm] radeon: 2048M of GTT memory ready.
>> [ 9.125000] [drm] Loading oland Microcode
>> [ 9.128906] [drm] Internal thermal controller with fan control
>> [ 9.128906] Unable to handle kernel paging request at virtual address 000000000000003c
>> [ 9.128906] CPU 3 systemd-udevd(148): Oops 0
>> [ 9.128906] pc = [<ffffffff80e63824>] ra = [<fffffffc03faf914>] ps = 0000 Not tainted
>> [ 9.128906] pc is at drm_pcie_get_speed_cap_mask+0x3c/0x12c
>> [ 9.128906] ra is at si_dpm_init+0x64/0x1398 [radeon]
>> [ 9.128906] v0 = ffffffffffffffea t0 = fffffc07fcc3a400 t1 = 0000000000001106
>> [ 9.128906] t2 = 0000000000001166 t3 = 0000000000000000 t4 = fffffc018eafc000
>> [ 9.128906] t5 = ffffffffffffff80 t6 = 000000000000003f t7 = fffffc07f6a90000
>> [ 9.128906] s0 = fffffc07f6a9390c s1 = 0000000000000000 s2 = fffffc07f59119b0
>> [ 9.128906] s3 = 0000000000000001 s4 = fffffffffffffff4 s5 = fffffc07f5910000
>> [ 9.128906] s6 = 0000000000000000
>> [ 9.128906] a0 = fffffc07f706c800 a1 = fffffc07f6a9390c a2 = fffffffffffffffc
>> [ 9.128906] a3 = ffffffff815fb510 a4 = ffffffff815fb4c8 a5 = 0000000000000000
>> [ 9.128906] t8 = 000000000000007f t9 = ffffffff80d86c14 t10= 0000000000000001
>> [ 9.128906] t11= 00000000000003c0 pv = ffffffff80e637e8 at = 0000000000000000
>> [ 9.128906] gp = ffffffff815e9230 sp = fffffc07f6a93868
>> [ 9.128906] Disabling lock debugging due to kernel taint
>> [ 9.128906] Trace:
>> [ 9.128906] [<ffffffff80e61260>] drm_dev_register+0xb0/0x138
>> [ 9.128906] [<ffffffff80e64368>] drm_get_pci_dev+0x120/0x208
>> [ 9.128906] [<ffffffff80dcced4>] local_pci_probe+0x38/0x8c
>> [ 9.128906] [<ffffffff80dcdac0>] pci_device_probe+0x170/0x1d0
>> [ 9.128906] [<ffffffff80e9d654>] driver_probe_device+0x168/0x2fc
>> [ 9.128906] [<ffffffff80e9d87c>] __driver_attach+0x94/0xe8
>> [ 9.128906] [<ffffffff80e9acf4>] bus_for_each_dev+0x94/0xd4
>> [ 9.128906] [<ffffffff80e9d7e8>] __driver_attach+0x0/0xe8
>> [ 9.128906] [<ffffffff80e9ce98>] driver_attach+0x2c/0x40
>> [ 9.128906] [<ffffffff80e9c870>] bus_add_driver+0x140/0x2d4
>> [ 9.128906] [<ffffffff80e9e28c>] driver_register+0x100/0x180
>> [ 9.128906] [<ffffffff80dccda0>] __pci_register_driver+0x48/0x5c
>> [ 9.128906] [<ffffffff80e644cc>] drm_pci_init+0x7c/0x168
>> [ 9.128906] [<ffffffff809102fc>] do_one_initcall+0x188/0x25c
>> [ 9.128906] [<ffffffff809f5f00>] do_init_module+0x8c/0x2c8
>> [ 9.128906] [<ffffffff80a5b698>] kmem_cache_alloc+0x1f8/0x22c
>> [ 9.128906] [<ffffffff809f5eb4>] do_init_module+0x40/0x2c8
>> [ 9.128906] [<ffffffff809b36c8>] load_module+0x1ea8/0x263c
>> [ 9.128906] [<ffffffff809af9fc>] unknown_module_param_cb+0x0/0xc8
>> [ 9.128906] [<ffffffff809b40a4>] SYSC_finit_module+0x94/0xb4
>> [ 9.128906] [<ffffffff809aeb68>] module_notes_read+0x0/0x4c
>> [ 9.128906] [<ffffffff80911040>] entSys+0xa0/0xc0
> Grepping for entSys, the only mention is on alpha.
>
> Is dev->pdev->bus->parent NULL by any chance?
>> [ 9.128906] Code: 8c300188 c020003b 8c210010 f85f1106 f87f1166 8d410038 <842a003c> 40220502
>> [ 9.128906] systemd-udevd[136]: worker [148] terminated by signal 11 (Segmentation fault)
>>
>> Signed-off-by: Caicai <caizhaopeng@xxxxxxxxxx>
>> ---
>> drivers/gpu/drm/drm_pci.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_pci.c b/drivers/gpu/drm/drm_pci.c
>> index 4db9c515b74f..3d1cd760b79c 100644
>> --- a/drivers/gpu/drm/drm_pci.c
>> +++ b/drivers/gpu/drm/drm_pci.c
>> @@ -332,10 +332,12 @@ int drm_pcie_get_speed_cap_mask(struct drm_device *dev, u32 *mask)
>> u32 lnkcap, lnkcap2;
>>
>> *mask = 0;
>> - if (!dev->pdev)
>> + if (!dev->pdev || !dev->pdev->bus)
>> return -EINVAL;
> I think we can assume our device is on a bus.
>> root = dev->pdev->bus->self;
>> + if (!root)
>> + return -EINVAL;
> I'm not a PCI expert, but it seems like a bad design to report that none of the speeds are supported just because we lack a bridge.
>
> btw, drm_pcie_get_max_link_width is similarly affected.
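For reference, drm_pcie_get_max_link_width() in the same file has the same
unchecked dereference; a minimal sketch of how the proposed guard could look
there (the function body is approximated from the drm_pci.c of this era, not
taken from the submitted patch):

int drm_pcie_get_max_link_width(struct drm_device *dev, u32 *mlw)
{
	struct pci_dev *root;
	u32 lnkcap;

	*mlw = 0;
	if (!dev->pdev)
		return -EINVAL;

	/* upstream bridge; NULL on the reporter's system */
	root = dev->pdev->bus->self;
	if (!root)
		return -EINVAL;

	/* Maximum Link Width lives in the bridge's Link Capabilities */
	pcie_capability_read_dword(root, PCI_EXP_LNKCAP, &lnkcap);
	*mlw = (lnkcap & PCI_EXP_LNKCAP_MLW) >> 4;

	return 0;
}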
Is this VM passthrough by any chance? These functions need access to
the bridge and PCI config space to determine what speeds the link
supports, and those are not always available in VMs.
Alex
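For context, a rough sketch of drm_pcie_get_speed_cap_mask() as it stands
(approximated from the drm_pci.c of this era; register and flag names as in
the mainline headers). The supported speeds are read from the upstream
bridge's PCIe Link Capabilities registers, which is why a missing bridge
(as with some VM passthrough setups) leaves root NULL and triggers the
oops above:

int drm_pcie_get_speed_cap_mask(struct drm_device *dev, u32 *mask)
{
	struct pci_dev *root;
	u32 lnkcap, lnkcap2;

	*mask = 0;
	if (!dev->pdev)
		return -EINVAL;

	/* upstream bridge; can be NULL, e.g. with passthrough */
	root = dev->pdev->bus->self;

	/* read Link Capabilities from the bridge's config space */
	pcie_capability_read_dword(root, PCI_EXP_LNKCAP, &lnkcap);
	pcie_capability_read_dword(root, PCI_EXP_LNKCAP2, &lnkcap2);

	if (lnkcap2) {	/* PCIe r3.0 and later */
		if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_2_5GB)
			*mask |= DRM_PCIE_SPEED_25;
		if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_5_0GB)
			*mask |= DRM_PCIE_SPEED_50;
		if (lnkcap2 & PCI_EXP_LNKCAP2_SLS_8_0GB)
			*mask |= DRM_PCIE_SPEED_80;
	} else {	/* pre-r3.0: only 2.5 and 5.0 GT/s are encoded */
		if (lnkcap & PCI_EXP_LNKCAP_SLS_2_5GB)
			*mask |= DRM_PCIE_SPEED_25;
		if (lnkcap & PCI_EXP_LNKCAP_SLS_5_0GB)
			*mask |= DRM_PCIE_SPEED_50;
	}

	return 0;
}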
>
> ~Maarten
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/dri-devel