Re: [PATCH net-next RFC v4 01/15] devlink: Add reload action option to devlink reload command

From: Jiri Pirko
Date: Mon Sep 14 2020 - 08:02:40 EST


Mon, Sep 14, 2020 at 11:54:55AM CEST, vasundhara-v.volam@xxxxxxxxxxxx wrote:
>On Mon, Sep 14, 2020 at 3:02 PM Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
>>
>> Mon, Sep 14, 2020 at 09:08:58AM CEST, vasundhara-v.volam@xxxxxxxxxxxx wrote:
>> >On Mon, Sep 14, 2020 at 11:39 AM Moshe Shemesh <moshe@xxxxxxxxxxxx> wrote:
>>
>> [...]
>>
>>
>> >> @@ -1126,15 +1126,24 @@ mlxsw_devlink_core_bus_device_reload_down(struct devlink *devlink,
>> >> }
>> >>
>> >> static int
>> >> -mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink,
>> >> - struct netlink_ext_ack *extack)
>> >> +mlxsw_devlink_core_bus_device_reload_up(struct devlink *devlink, enum devlink_reload_action action,
>> >> + struct netlink_ext_ack *extack,
>> >> + unsigned long *actions_performed)
>> >Sorry for repeating again, for fw_activate action on our device, all
>> >the driver entities undergo reset asynchronously once user initiates
>> >"devlink dev reload action fw_activate" and reload_up does not have
>> >much to do except reporting actions that will be/being performed.
>> >
>> >Once reset is complete, the health reporter will be notified using
>>
>> Hmm, how is the fw reset related to health reporter recovery? Recovery
>> happens after some error event. I don't believe it is wise to mix it.
>Our device has a fw_reset health reporter, which is updated on reset
>events and firmware activation is one among them. All non-fatal
>firmware reset events are reported on fw_reset health reporter.

Hmm, interesting. In that case, assuming this is fine, should we have
some standard in this. I mean, if the driver supports reset, should it
also define the "fw_reset" reporter to report such events?

Jakub, what is your take here?


>
>>
>> Instead, why don't you block in reload_up() until the reset is complete?
>
>Though user initiate "devlink dev reload" event on a single interface,
>all driver entities undergo reset and all entities recover
>independently. I don't think we can block the reload_up() on the
>interface(that user initiated the command), until whole reset is
>complete.

Why not? mlxsw reset takes up to like 10 seconds for example.


>>
>>
>> >devlink_health_reporter_recovery_done(). Returning from reload_up does
>> >not guarantee successful activation of firmware. Status of reset will
>> >be notified to the health reporter via
>> >devlink_health_reporter_state_update().
>> >
>> >I am just repeating this, so I want to know if I am on the same page.
>> >
>> >Thanks.
>>
>> [...]