Re: [RFC 7/7] gpu: nova-core: load the scrubber ucode when vGPU support is enabled
From: Joel Fernandes
Date: Wed Dec 10 2025 - 20:34:11 EST
> On Dec 9, 2025, at 11:05 PM, Zhi Wang <zhiw@xxxxxxxxxx> wrote:
> [..]
>>> +
>>> + dev_dbg!(
>>> + pdev.as_ref(),
>>> + "SEC2 MBOX0: {:#x}, MBOX1: {:#x}\n",
>>> + mbox0,
>>> + mbox1
>>> + );
>>> +
>>> + if !regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed() {
>>> + return Err(ETIMEDOUT);
>>
>> So under which circumstances do we reach this point
>> (!scrubber_completed)? Basically, I am not sure ETIMEDOUT is the
>> right error to return here, because boot() already returns ETIMEDOUT
>> if waiting for the halt times out.
>>
>> If you still want to return ETIMEDOUT here, then it sounds like you're
>> waiting for scrubbing beyond the waiting already done by boot(). If
>> so, shouldn't you use read_poll_timeout() here?
>>
>> Perhaps something like:
>>
>> read_poll_timeout(
>>     || Ok(regs::NV_PGC6_BSI_SECURE_SCRATCH_15::read(bar).scrubber_completed()),
>>     |val: &bool| *val,
>>     Delta::from_millis(10),
>>     Delta::from_secs(5),
>> )?;
>>
>
> This is identical to the OpenRM implementation [1]. Going by that
> code, I think the scrubber runs during the binary's boot process.
> Once it signals that the firmware has booted successfully, scrubbing
> should be done. Let me change to another errno.
>
> [1] https://github.com/NVIDIA/open-gpu-kernel-modules/blob/a5bfb10e75a4046c5d991c65f49b5d29151e68cf/src/nvidia/src/kernel/gpu/gsp/arch/ada/kernel_gsp_ad102.c#L49
Sure, it was just misleading that the patch returns a timeout error when the failure is something else (like the scrubber failing). Thanks for correcting it.
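To make the distinction concrete, here is a minimal standalone sketch in plain userspace Rust (not kernel code). The register read is mocked with a bool or a closure, and `ScrubError`, `check_scrubber()` and `poll_scrubber()` are hypothetical names. A single check after boot() reports a non-timeout failure, while a polling loop in the spirit of read_poll_timeout() is the case where a timeout error fits.

```rust
use std::time::{Duration, Instant};

/// Hypothetical error type standing in for kernel errnos in this sketch.
#[derive(Debug, PartialEq)]
enum ScrubError {
    /// Boot finished but the completion bit is clear (a non-timeout errno).
    NotCompleted,
    /// Polling gave up waiting (the ETIMEDOUT case).
    TimedOut,
}

/// Single post-boot check: boot() has already waited for the halt, so a
/// clear completion bit is a failure, not a timeout.
fn check_scrubber(scrubber_completed: bool) -> Result<(), ScrubError> {
    if scrubber_completed {
        Ok(())
    } else {
        Err(ScrubError::NotCompleted)
    }
}

/// Calls `read` every `sleep` until it returns true or `timeout` elapses,
/// loosely mirroring the shape of the kernel's read_poll_timeout().
fn poll_scrubber<F: FnMut() -> bool>(
    mut read: F,
    sleep: Duration,
    timeout: Duration,
) -> Result<(), ScrubError> {
    let start = Instant::now();
    loop {
        if read() {
            return Ok(());
        }
        if start.elapsed() >= timeout {
            return Err(ScrubError::TimedOut);
        }
        std::thread::sleep(sleep);
    }
}

fn main() {
    // Bit already set after boot: success.
    assert_eq!(check_scrubber(true), Ok(()));
    // Bit clear after boot: report a failure rather than a timeout.
    assert_eq!(check_scrubber(false), Err(ScrubError::NotCompleted));

    // Mock register that reports completion on the third read.
    let mut reads = 0;
    let r = poll_scrubber(
        || {
            reads += 1;
            reads >= 3
        },
        Duration::from_millis(1),
        Duration::from_secs(1),
    );
    assert_eq!(r, Ok(()));

    // Mock register that never completes: polling times out.
    let r = poll_scrubber(|| false, Duration::from_millis(1), Duration::from_millis(5));
    assert_eq!(r, Err(ScrubError::TimedOut));
    println!("ok");
}
```

Which errno the NotCompleted case maps to in the respin is left to the patch author.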
- Joel