Re: [PATCH net-next RFC 00/13] Add devlink reload level option

From: Moshe Shemesh
Date: Wed Aug 05 2020 - 02:32:18 EST



On 8/4/2020 1:13 PM, Vasundhara Volam wrote:
On Mon, Aug 3, 2020 at 7:23 PM Moshe Shemesh <moshe@xxxxxxxxxxxx> wrote:

On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <moshe@xxxxxxxxxxxx> wrote:
On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <jacob.e.keller@xxxxxxxxx> wrote:
On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <moshe@xxxxxxxxxxxx> wrote:
Introduce new option on devlink reload API to enable the user to select the
reload level required. Complete support for all levels in mlx5.
The following reload levels are supported:
driver: Driver entities re-instantiation only.
fw_reset: Firmware reset and driver entities re-instantiation.
The name is a little confusing. I think it should be renamed to
fw_live_reset (in which both firmware and driver entities are
re-instantiated). For only fw_reset, the driver should not undergo
reset (it requires a driver reload for firmware to undergo reset).

So, I think the differentiation here is that "live_patch" doesn't reset
anything.
This seems similar to flashing the firmware and does not reset anything.
The live patch activates a fw change without a reset.

It is not suitable for every fw change, only for fw gaps which don't require a reset.

I can query the fw to check whether the pending image change is suitable or
requires a fw reset.
Okay.
fw_live_patch: Firmware live patching only.
This level is not clear. Is this similar to flashing?

Also I have a basic query. The reload command is split into
reload_up/reload_down handlers (Please correct me if this behaviour is
changed with this patchset). What if the vendor specific driver does
not support up/down and needs only a single handler to fire a firmware
reset or firmware live reset command?
In the "reload_down" handler, they would trigger the appropriate reset,
and quiesce anything that needs to be done. Then on reload up, it would
restore and bring up anything quiesced in the first stage.
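To make the down/up split described above concrete, here is a minimal userspace sketch in C. It is not the devlink API: the names toy_dev, toy_reload_down(), toy_reload_up() and the two-value level enum are hypothetical and only model the idea that the first stage triggers the reset and quiesces, and the second stage brings things back up.

```c
#include <stdbool.h>

/* Hypothetical levels, named after two of the levels in the cover letter. */
enum reload_level { LEVEL_DRIVER, LEVEL_FW_RESET };

/* Toy device state; not a real kernel structure. */
struct toy_dev {
    bool driver_up;   /* are the driver entities instantiated? */
    int fw_resets;    /* how many firmware resets were triggered */
};

/* Stage 1: trigger the reset if requested, then quiesce the driver. */
static int toy_reload_down(struct toy_dev *dev, enum reload_level level)
{
    if (level == LEVEL_FW_RESET)
        dev->fw_resets++;
    dev->driver_up = false;
    return 0;
}

/* Stage 2: restore whatever was quiesced in stage 1. */
static int toy_reload_up(struct toy_dev *dev, enum reload_level level)
{
    (void)level;  /* this toy brings the driver up the same way for both levels */
    dev->driver_up = true;
    return 0;
}

/* What the caller of the two handlers would do, in this model. */
static int toy_reload(struct toy_dev *dev, enum reload_level level)
{
    int err = toy_reload_down(dev, level);

    if (err)
        return err;
    return toy_reload_up(dev, level);
}
```

In this model a driver that only needs "a single handler" would put all of its work in the down stage and leave the up stage empty, which is the case the question above is asking about.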
Yes, I got the "reload_down" and "reload_up". Similar to the device
"remove" and "re-probe" respectively.

But our requirement is similar to the "ethtool reset" command, where
ethtool calls a single callback in the driver and the driver just sends a
firmware command for doing the reset. Once the firmware receives the
command, it will initiate the reset of driver and firmware entities
asynchronously.
It is similar to the mlx5 case here for fw_reset. The driver triggers the fw
command to reset and all PF drivers get events to handle and do
re-initialization. To fit this to the devlink reload_down and reload_up,
I wait for the event handler to complete, and the handler stops at driver
unload so that the driver is brought up by devlink reload_up. See patch 8 in
this patchset.

Yes, I see reload_down is triggering the reset. In our driver, after
triggering the reset through a firmware command, reset is done in
another context as the driver initiates the reset only after receiving
an ASYNC event from the firmware.

Same here.

Probably, we have to use reload_down() to send the firmware command to
trigger the reset and do nothing in reload_up.
I had that in a previous version, but it's wrong to use devlink reload this
way, so I added a wait with timeout for the event handling to complete
before the reload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also,
the event handler stops before loading the driver back, so that the load is
done by devlink reload_up.
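A rough illustration of the "wait with timeout" idea, as a userspace C sketch. This is not the code of mlx5_fw_wait_fw_reset_done(); toy_fw, toy_fw_reset_done() and toy_wait_fw_reset_done() are hypothetical names, and the timeout is modelled as a poll budget instead of wall-clock time. A real driver would sleep or block between checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy firmware state: the simulated event handler finishes after N polls. */
struct toy_fw {
    int polls_until_done;
};

/* Returns true once the simulated event handling has completed. */
static bool toy_fw_reset_done(struct toy_fw *fw)
{
    if (fw->polls_until_done > 0) {
        fw->polls_until_done--;
        return false;
    }
    return true;
}

/*
 * Bounded wait: check for completion at most max_polls times.
 * Returns 0 if the reset completed in time, -ETIMEDOUT otherwise.
 */
static int toy_wait_fw_reset_done(struct toy_fw *fw, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (toy_fw_reset_done(fw))
            return 0;
        /* a real driver would sleep or block here instead of spinning */
    }
    return -ETIMEDOUT;
}
```

The point of the bound is that the down stage only returns success once the event handling has really finished, and otherwise fails instead of hanging.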
But "devlink dev reload" will be invoked by the user only on a single
dev handler and all function drivers will be re-instantiated upon the
ASYNC event. reload_down and reload_up are invoked only on the function
on which the user ran the command.

Take an example of a 2-port (PF0 and PF1) adapter on a single host,
with some VFs loaded on the device. The user invokes "devlink dev reload"
on PF0, and an ASYNC event for reset is received on both PFs and the VFs. All
the function drivers will be re-instantiated, including PF0.

If we wait for some time in reload_down() of PF0 and then call load in
reload_up(), this code will be different from other function drivers.
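The asymmetry in the PF0/PF1 example can be sketched as a small userspace C model. All names here (toy_func, reinit_path, toy_fw_reset()) are hypothetical. The model assumes the scheme discussed in this thread, where the function that received the reload command is brought back through the devlink reload path and every other function through its ASYNC event handler.

```c
#include <stddef.h>

/* Which code path re-instantiated a function's driver. */
enum reinit_path { PATH_NONE, PATH_DEVLINK_RELOAD, PATH_ASYNC_EVENT };

/* Toy PCI function (PF or VF); not a real kernel structure. */
struct toy_func {
    const char *name;
    enum reinit_path last_path;
    int reinit_count;
};

static void toy_reinit(struct toy_func *f, enum reinit_path path)
{
    f->last_path = path;
    f->reinit_count++;
}

/*
 * The user runs the reload on funcs[initiator]. The firmware reset then
 * affects every function: the initiator is re-instantiated through the
 * reload path, all others through their event handler.
 */
static void toy_fw_reset(struct toy_func *funcs, size_t n, size_t initiator)
{
    for (size_t i = 0; i < n; i++)
        toy_reinit(&funcs[i], i == initiator ? PATH_DEVLINK_RELOAD
                                             : PATH_ASYNC_EVENT);
}
```

Every function is re-instantiated exactly once, but PF0 takes a different path from PF1 and the VFs, which is the difference being pointed out above.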


I see your point here, but the user runs the devlink reload command on one PF. In the fw-reset case it will influence the other PFs, but that is a result of the fw-reset. If the user asked for a params change or a namespace change, that was for this PF only.

And returning from reload
does not mean that the reset is complete, as it is done in another context
and the driver notifies the health reporter once the reset is
complete. The devlink framework may have to allow drivers to implement
reload_down only, to look cleaner, or call reload_up only once the
driver notifies devlink that the reset has completed in the other
context. Please suggest.