Re: next/master boot bisection: Oops in nouveau driver on jetson-tk1

From: Guillaume Tucker
Date: Mon Dec 10 2018 - 05:00:16 EST


On 08/12/2018 00:08, Lyude Paul wrote:
> uhhhhhhhhhhhhh
> didn't we fix this weeks ago? with "drm/nouveau: tegra: Call
> nouveau_drm_device_init()"

Yes here's the fix from Thierry:

https://patchwork.freedesktop.org/patch/263587/


and I can confirm that it does fix the Oops when applied on top
of next-20181206 (what I used for the bisection last week):

http://lava.baylibre.com:10080/scheduler/job/71109


However the fix doesn't appear to have been applied in any
upstream tree yet.

Guillaume

> On Fri, 2018-12-07 at 23:31 +0000, Guillaume Tucker wrote:
>> Please find below an automated bisection report for a kernel Oops
>> seen during the initialisation of the nouveau GPU driver on
>> jetson-tk1.
>>
>>
>> All the LAVA test jobs for this bisection can be found here:
>>
>>
>> http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=lava-bisect-staging-7366#table
>>
>>
>> Here's the beginning of the Oops stack trace:
>>
>> [ 7.485361] [00000064] *pgd=f9e7b835
>> [ 7.485372] Internal error: Oops: 17 [#1] SMP ARM
>> [ 7.485376] Modules linked in: snd_soc_tegra_rt5640(+)
>> snd_soc_tegra_utils snd_soc_rt5640(+) snd_soc_rl6231 snd_soc_tegra30_ahub
>> snd_hda_tegra snd_soc_core snd_hda_codec snd_hda_core ac97_bus
>> snd_pcm_dmaengine snd_pcm xhci_tegra(+) snd_timer snd soundcore nouveau(+)
>> ttm tegra_devfreq tegra_wdt
>> [ 7.542227] CPU: 1 PID: 128 Comm: udevd Not tainted 4.20.0-rc5-next-
>> 20181206 #44
>> [ 7.549603] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [ 7.555859] PC is at drm_plane_register_all+0x18/0x50
>> [ 7.560899] LR is at drm_modeset_register_all+0xc/0x6c
>>
>>
>> Full log:
>>
>> http://lava.baylibre.com:10080/scheduler/job/68628#L816
>>
>>
>> The bisection was run from next-20181206 as this is where the
>> issue was discovered on kernelci.org but the patch it found has
>> already been merged in mainline.
>>
>> Hope this helps!
>>
>> Guillaume
>>
>>
>> -----------8<------------------------8<-----------
>>
>>
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis *
>> * that you may be involved with the breaking commit it has *
>> * found. No manual investigation has been done to verify it, *
>> * and the root cause of the problem may be somewhere else. *
>> * Hope this helps! *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> Bisection result for next/master (next-20181206) on jetson-tk1
>>
>> Good: 84df9525b0c2 Linux 4.19
>> Bad: 4c92b7b3080d Add linux-next specific files for 20181206
>> Found: cfea88a4d866 drm/nouveau: Start using new drm_dev
>> initialization helpers
>>
>> Checks:
>> revert: PASS
>> verify: PASS
>>
>> Parameters:
>> Tree: next
>> URL: None
>> Branch: master
>> Target: jetson-tk1
>> Lab: lab-baylibre
>> Config: multi_v7_defconfig
>> Plan: dmesg-nouveau
>>
>> Breaking commit found:
>>
>> ----------------------------------------------------------------------------
>> ---
>> commit cfea88a4d86632f28cf80be97079f131645b7869
>> Author: Lyude Paul <lyude@xxxxxxxxxx>
>> Date: Wed Aug 22 21:40:07 2018 -0400
>>
>> drm/nouveau: Start using new drm_dev initialization helpers
>>
>> Per the documentation in drm_get_pci_dev(), this function is deprecated
>> and shouldn't be used anymore. As it turns out, we're going to need to
>> stop using drm_get_pci_dev() anyway in order to allow us to turn off the
>> card before full system shutdowns, otherwise we'll hit race conditions
>> with userspace while trying to tear down the card on shutdown.
>>
>> So, start using drm_dev_get() and drm_dev_put(), and just turn our
>> load/unload callbacks into open coded init/fini() functions.
>>
>> Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
>> Cc: Karol Herbst <kherbst@xxxxxxxxxx>
>> Signed-off-by: Ben Skeggs <bskeggs@xxxxxxxxxx>
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> index 905956809d21..2b2baf6e0e0d 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> @@ -458,75 +458,8 @@ nouveau_accel_init(struct nouveau_drm *drm)
>> nouveau_bo_move_init(drm);
>> }
>>
>> -static int nouveau_drm_probe(struct pci_dev *pdev,
>> - const struct pci_device_id *pent)
>> -{
>> - struct nvkm_device *device;
>> - struct apertures_struct *aper;
>> - bool boot = false;
>> - int ret;
>> -
>> - if (vga_switcheroo_client_probe_defer(pdev))
>> - return -EPROBE_DEFER;
>> -
>> - /* We need to check that the chipset is supported before booting
>> - * fbdev off the hardware, as there's no way to put it back.
>> - */
>> - ret = nvkm_device_pci_new(pdev, NULL, "error", true, false, 0,
>> &device);
>> - if (ret)
>> - return ret;
>> -
>> - nvkm_device_del(&device);
>> -
>> - /* Remove conflicting drivers (vesafb, efifb etc). */
>> - aper = alloc_apertures(3);
>> - if (!aper)
>> - return -ENOMEM;
>> -
>> - aper->ranges[0].base = pci_resource_start(pdev, 1);
>> - aper->ranges[0].size = pci_resource_len(pdev, 1);
>> - aper->count = 1;
>> -
>> - if (pci_resource_len(pdev, 2)) {
>> - aper->ranges[aper->count].base = pci_resource_start(pdev, 2);
>> - aper->ranges[aper->count].size = pci_resource_len(pdev, 2);
>> - aper->count++;
>> - }
>> -
>> - if (pci_resource_len(pdev, 3)) {
>> - aper->ranges[aper->count].base = pci_resource_start(pdev, 3);
>> - aper->ranges[aper->count].size = pci_resource_len(pdev, 3);
>> - aper->count++;
>> - }
>> -
>> -#ifdef CONFIG_X86
>> - boot = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
>> -#endif
>> - if (nouveau_modeset != 2)
>> - drm_fb_helper_remove_conflicting_framebuffers(aper,
>> "nouveaufb", boot);
>> - kfree(aper);
>> -
>> - ret = nvkm_device_pci_new(pdev, nouveau_config, nouveau_debug,
>> - true, true, ~0ULL, &device);
>> - if (ret)
>> - return ret;
>> -
>> - pci_set_master(pdev);
>> -
>> - if (nouveau_atomic)
>> - driver_pci.driver_features |= DRIVER_ATOMIC;
>> -
>> - ret = drm_get_pci_dev(pdev, pent, &driver_pci);
>> - if (ret) {
>> - nvkm_device_del(&device);
>> - return ret;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> static int
>> -nouveau_drm_load(struct drm_device *dev, unsigned long flags)
>> +nouveau_drm_device_init(struct drm_device *dev)
>> {
>> struct nouveau_drm *drm;
>> int ret;
>> @@ -613,7 +546,7 @@ nouveau_drm_load(struct drm_device *dev, unsigned long
>> flags)
>> }
>>
>> static void
>> -nouveau_drm_unload(struct drm_device *dev)
>> +nouveau_drm_device_fini(struct drm_device *dev)
>> {
>> struct nouveau_drm *drm = nouveau_drm(dev);
>>
>> @@ -642,18 +575,116 @@ nouveau_drm_unload(struct drm_device *dev)
>> kfree(drm);
>> }
>>
>> +static int nouveau_drm_probe(struct pci_dev *pdev,
>> + const struct pci_device_id *pent)
>> +{
>> + struct nvkm_device *device;
>> + struct drm_device *drm_dev;
>> + struct apertures_struct *aper;
>> + bool boot = false;
>> + int ret;
>> +
>> + if (vga_switcheroo_client_probe_defer(pdev))
>> + return -EPROBE_DEFER;
>> +
>> + /* We need to check that the chipset is supported before booting
>> + * fbdev off the hardware, as there's no way to put it back.
>> + */
>> + ret = nvkm_device_pci_new(pdev, NULL, "error", true, false, 0,
>> &device);
>> + if (ret)
>> + return ret;
>> +
>> + nvkm_device_del(&device);
>> +
>> + /* Remove conflicting drivers (vesafb, efifb etc). */
>> + aper = alloc_apertures(3);
>> + if (!aper)
>> + return -ENOMEM;
>> +
>> + aper->ranges[0].base = pci_resource_start(pdev, 1);
>> + aper->ranges[0].size = pci_resource_len(pdev, 1);
>> + aper->count = 1;
>> +
>> + if (pci_resource_len(pdev, 2)) {
>> + aper->ranges[aper->count].base = pci_resource_start(pdev, 2);
>> + aper->ranges[aper->count].size = pci_resource_len(pdev, 2);
>> + aper->count++;
>> + }
>> +
>> + if (pci_resource_len(pdev, 3)) {
>> + aper->ranges[aper->count].base = pci_resource_start(pdev, 3);
>> + aper->ranges[aper->count].size = pci_resource_len(pdev, 3);
>> + aper->count++;
>> + }
>> +
>> +#ifdef CONFIG_X86
>> + boot = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
>> +#endif
>> + if (nouveau_modeset != 2)
>> + drm_fb_helper_remove_conflicting_framebuffers(aper,
>> "nouveaufb", boot);
>> + kfree(aper);
>> +
>> + ret = nvkm_device_pci_new(pdev, nouveau_config, nouveau_debug,
>> + true, true, ~0ULL, &device);
>> + if (ret)
>> + return ret;
>> +
>> + pci_set_master(pdev);
>> +
>> + if (nouveau_atomic)
>> + driver_pci.driver_features |= DRIVER_ATOMIC;
>> +
>> + drm_dev = drm_dev_alloc(&driver_pci, &pdev->dev);
>> + if (IS_ERR(drm_dev)) {
>> + ret = PTR_ERR(drm_dev);
>> + goto fail_nvkm;
>> + }
>> +
>> + ret = pci_enable_device(pdev);
>> + if (ret)
>> + goto fail_drm;
>> +
>> + drm_dev->pdev = pdev;
>> + pci_set_drvdata(pdev, drm_dev);
>> +
>> + ret = nouveau_drm_device_init(drm_dev);
>> + if (ret)
>> + goto fail_pci;
>> +
>> + ret = drm_dev_register(drm_dev, pent->driver_data);
>> + if (ret)
>> + goto fail_drm_dev_init;
>> +
>> + return 0;
>> +
>> +fail_drm_dev_init:
>> + nouveau_drm_device_fini(drm_dev);
>> +fail_pci:
>> + pci_disable_device(pdev);
>> +fail_drm:
>> + drm_dev_put(drm_dev);
>> +fail_nvkm:
>> + nvkm_device_del(&device);
>> + return ret;
>> +}
>> +
>> void
>> nouveau_drm_device_remove(struct drm_device *dev)
>> {
>> + struct pci_dev *pdev = dev->pdev;
>> struct nouveau_drm *drm = nouveau_drm(dev);
>> struct nvkm_client *client;
>> struct nvkm_device *device;
>>
>> + drm_dev_unregister(dev);
>> +
>> dev->irq_enabled = false;
>> client = nvxx_client(&drm->client.base);
>> device = nvkm_device_find(client->device);
>> - drm_put_dev(dev);
>>
>> + nouveau_drm_device_fini(dev);
>> + pci_disable_device(pdev);
>> + drm_dev_put(dev);
>> nvkm_device_del(&device);
>> }
>>
>> @@ -1020,8 +1051,6 @@ driver_stub = {
>> DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME | DRIVER_RENDER |
>> DRIVER_KMS_LEGACY_CONTEXT,
>>
>> - .load = nouveau_drm_load,
>> - .unload = nouveau_drm_unload,
>> .open = nouveau_drm_open,
>> .postclose = nouveau_drm_postclose,
>> .lastclose = nouveau_vga_lastclose,
>> ----------------------------------------------------------------------------
>> ---
>>
>>
>> Git bisection log:
>>
>> ----------------------------------------------------------------------------
>> ---
>> git bisect start
>> # good: [84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d] Linux 4.19
>> git bisect good 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d
>> # bad: [4c92b7b3080d8281941ae81c51cd62bb49bdc3d4] Add linux-next specific
>> files for 20181206
>> git bisect bad 4c92b7b3080d8281941ae81c51cd62bb49bdc3d4
>> # bad: [c38239b4be1ac7e4bcf5bbd971353bae51525b8f] Merge branch 'parisc-4.20-
>> 2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
>> git bisect bad c38239b4be1ac7e4bcf5bbd971353bae51525b8f
>> # good: [d49f8a52b15bf35db778035340d8a673149f9f93] Merge tag 'scsi-misc' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> git bisect good d49f8a52b15bf35db778035340d8a673149f9f93
>> # good: [ac747c0715f29c2be3848b719a1b7e65b07f7b21] Merge tag 'kbuild-v4.20'
>> of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>> git bisect good ac747c0715f29c2be3848b719a1b7e65b07f7b21
>> # bad: [46972c03ab667dc298cad0c9db517fb9b1521b5f] Merge tag 'drm-misc-next-
>> fixes-2018-10-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-
>> next
>> git bisect bad 46972c03ab667dc298cad0c9db517fb9b1521b5f
>> # good: [6ac99a328ee16d3f8cc253f1df62623cee3e9ea5] drm/exynos: mixer: Make
>> plane alpha configurable
>> git bisect good 6ac99a328ee16d3f8cc253f1df62623cee3e9ea5
>> # good: [0957dc7097a3f462f6cedb45cf9b9785cc29e5bb] drm/amdgpu: revert "stop
>> using gart_start as offset for the GTT domain"
>> git bisect good 0957dc7097a3f462f6cedb45cf9b9785cc29e5bb
>> # good: [2de0b0a158bf423208c3898522c8fa1c1078df48] Merge tag 'drm/tegra/for-
>> 4.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
>> git bisect good 2de0b0a158bf423208c3898522c8fa1c1078df48
>> # good: [04b96b63c5640a305e30611def7a9c5fcd7a72cf] drm/msm/dpu: Remove
>> unneeded checks in dpu_crtc.c
>> git bisect good 04b96b63c5640a305e30611def7a9c5fcd7a72cf
>> # good: [6952e3a1dffcb931cf8625aa01642b9afac2af61] Merge branch 'for-
>> upstream/mali-dp' of git://linux-arm.org/linux-ld into drm-next
>> git bisect good 6952e3a1dffcb931cf8625aa01642b9afac2af61
>> # good: [62e681f7dcab746412dce22d4b75b32c5ea38cdb] Merge tag 'drm-msm-fixes-
>> 2018-10-09' of git://people.freedesktop.org/~robclark/linux into drm-next
>> git bisect good 62e681f7dcab746412dce22d4b75b32c5ea38cdb
>> # bad: [7e6191d4360a2df6cf2a2613dcb79680cb943df8] Merge branch 'linux-4.20'
>> of git://github.com/skeggsb/linux into drm-next
>> git bisect bad 7e6191d4360a2df6cf2a2613dcb79680cb943df8
>> # good: [c4cee69a4497d9c6ad8868d63568b30e50cac9e9] drm/nouveau: Fix
>> potential memory leak in nouveau_drm_load()
>> git bisect good c4cee69a4497d9c6ad8868d63568b30e50cac9e9
>> # bad: [a971558c298755d2c07bc5508c65d689471763c8] drm/nouveau/disp: keep
>> track of high-speed state, program into clock
>> git bisect bad a971558c298755d2c07bc5508c65d689471763c8
>> # bad: [4126b99e744b7a29746e201e2be6644d2edf3c56] drm/nouveau/disp: add a
>> way to configure scrambling/tmds for hdmi 2.0
>> git bisect bad 4126b99e744b7a29746e201e2be6644d2edf3c56
>> # bad: [cfea88a4d86632f28cf80be97079f131645b7869] drm/nouveau: Start using
>> new drm_dev initialization helpers
>> git bisect bad cfea88a4d86632f28cf80be97079f131645b7869
>> # first bad commit: [cfea88a4d86632f28cf80be97079f131645b7869] drm/nouveau:
>> Start using new drm_dev initialization helpers
>> ----------------------------------------------------------------------------
>> ---