Re: [PATCH v3 2/9] gpu: nova-core: gsp: add NvStatus enum for RM control errors
From: John Hubbard
Date: Mon Apr 06 2026 - 15:02:38 EST
On 4/6/26 5:09 AM, Eliot Courtney wrote:
> On Mon Apr 6, 2026 at 5:05 AM JST, John Hubbard wrote:
>> On 3/25/26 5:13 AM, Eliot Courtney wrote:
...
>>> +impl From<NvStatus> for Result {
>>> + fn from(status: NvStatus) -> Self {
>>
>> This discards the NvStatus variant. The caller sees only a kernel errno.
>> We need to retain the rich RM error codes for debugging. For example,
>> a caller that gets Err(ENODEV) cannot tell whether RM returned
>> CardNotPresent, InvalidDevice, FabricManagerNotPresent, or
>> ReductionManagerNotAvailable. Something like:
>>
>> fn send(self, ...) -> Result<T::Reply, NvStatus>
>>
>> would let the caller log or match on the original status, and convert
>> to errno only when propagating upward.
>
> SGTM, let me push NvStatus handling a bit up the caller chain then.
> I was thinking it might be nice to not carry the openrm naming, and
> instead call it e.g. GspMsgRmStatus.
Yes.
>
> But, I don't think we should use it as is as the error variant of
> Result, because it contains 'Ok'. A better shape might be:
>
> GspMsgRmStatus {Ok, Warning(GspMsgRmWarning), Error(GspMsgRmError)}
>
> Another issue is that RmControl::send needs to also propagate the
> kernel::error::Error from calling the cmdq. So we would need a wrapping
> enum, e.g. enum GspRpcError {Transport(Error), Rm(GspMsgRmError)}. The
> (currently unchecked) rpc_result field that comes back from GSP messages
> can contain NV_STATUS values (as well as some other ones), so
> probably Cmdq should be returning this too.
>
> So if we are fine to just log warnings instead of surfacing them to the
> caller, we could do:
I think we are fine to "at most" log the warnings. (RM is too verbose
in many cases, so I'm hesitant to categorically log anything at all.
Case-by-case evaluation is probably called for there.)
>
> fn send(...) -> Result<T::Reply, GspRpcError>
>
> WDYT?
That all sounds good.
>
> If we go this direction, I might create another patch series for
> defining GspRpcError and adding support to Cmdq for returning that,
> based on the RPC header's `rpc_result`, since we need to pass that
> through here for RM control.
>
>>
>>> + match status {
>>> + NvStatus::Ok => Ok(()),
>> ...
>>> +
>>> + NvStatus::Timeout => Err(ETIMEDOUT),
>>> +
>>> + _ => Err(EIO),
>>
>> This will cause failures for NV_WARN_* items, which is a problem.
>> RM warnings should not be converted into errors.
>
> Do you reckon we will want to do anything other than just log the
> warning?
Tentatively, no, I don't see a need to return warnings to the caller,
unless we notice a warning that is somehow actionable (I don't recall
ever seeing such a thing, though, and it seems wrong anyway).
If we decide to send warnings back up the stack, that doesn't seem
like a big change to nova in the future anyway.
thanks,
--
John Hubbard