Re: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

From: Heikki Krogerus
Date: Thu Apr 02 2020 - 06:34:52 EST


Hi,

On Thu, Apr 02, 2020 at 06:22:14PM +0800, Jian-Hong Pan wrote:
> Hi,
>
> We got some machines like Acer desktop equipped with NVIDIA GTX 1660
> card, Acer Predator PH315-52 equipped with NVIDIA GeForce RTX 2060
> Mobile and ASUS UX581LV equipped with NNVIDIA GeForce RTX 2060.
> We found them take long time (more than 50 seconds) to resume after
> suspend. During the resuming time, the screen is blank. And check
> the dmesg, found the error during resume:
>
> [ 28.060831] PM: suspend entry (deep)
> [ 28.144260] Filesystems sync: 0.083 seconds
> [ 28.150219] Freezing user space processes ...
> [ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 48.153447] systemd-udevd D13440 382 330 0x80004124
> [ 48.153457] Call Trace:
> [ 48.153504] ? __schedule+0x272/0x5a0
> [ 48.153558] ? hrtimer_start_range_ns+0x18c/0x2c0
> [ 48.153622] schedule+0x45/0xb0
> [ 48.153668] schedule_hrtimeout_range_clock+0x8f/0x100
> [ 48.153738] ? hrtimer_init_sleeper+0x80/0x80
> [ 48.153798] usleep_range+0x5a/0x80
> [ 48.153850] gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
> [ 48.153933] gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
> [ 48.154012] __i2c_transfer+0x163/0x4c0
> [ 48.154067] i2c_transfer+0x6e/0xc0
> [ 48.154120] ccg_read+0x11f/0x170 [ucsi_ccg]
> [ 48.154182] get_fw_info+0x17/0x50 [ucsi_ccg]
> [ 48.154242] ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
> [ 48.154312] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
> [ 48.154377] i2c_device_probe+0x113/0x210
> [ 48.154435] really_probe+0xdf/0x280
> [ 48.154487] driver_probe_device+0x4b/0xc0
> [ 48.154545] device_driver_attach+0x4e/0x60
> [ 48.154604] __driver_attach+0x44/0xb0
> [ 48.154657] ? device_driver_attach+0x60/0x60
> [ 48.154717] bus_for_each_dev+0x6c/0xb0
> [ 48.154772] bus_add_driver+0x172/0x1c0
> [ 48.154824] driver_register+0x67/0xb0
> [ 48.154877] i2c_register_driver+0x39/0x70
> [ 48.154932] ? 0xffffffffc00ac000
> [ 48.154978] do_one_initcall+0x3e/0x1d0
> [ 48.155032] ? free_vmap_area_noflush+0x8d/0xe0
> [ 48.155093] ? _cond_resched+0x10/0x20
> [ 48.155145] ? kmem_cache_alloc_trace+0x3a/0x1b0
> [ 48.155208] do_init_module+0x56/0x200
> [ 48.155260] load_module+0x21fe/0x24e0
> [ 48.155322] ? __do_sys_finit_module+0xbf/0xe0
> [ 48.155381] __do_sys_finit_module+0xbf/0xe0
> [ 48.155441] do_syscall_64+0x3d/0x130
> [ 48.156841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 48.158074] RIP: 0033:0x7fba3b4bc2a9
> [ 48.158707] Code: Bad RIP value.
> [ 48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [ 48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
> [ 48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
> [ 48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
> [ 48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> [ 48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470
>
> I have filed this to bugzilla and more detail:
> https://bugzilla.kernel.org/show_bug.cgi?id=206653
>
> Any comment will be appreciated.

You are using an outdated kernel, 5.4.0. Please make sure that you can
reproduce the issue with mainline, or at least with the longterm
5.4.x.

Ajay, based on the backtrace, the issue seems to be starting from your
I2C driver. Please take a look at this.


thanks,

--
heikki