RE: [EXT] Re: [PATCH net v1] octeon_ep: initialise control mbox tasks before using APIs

From: Shinas Rasheed
Date: Tue Dec 05 2023 - 01:50:29 EST




> -----Original Message-----
> > On Sat, Dec 2, 2023 at 4:08 PM Shinas Rasheed <srasheed@xxxxxxxxxxx>
> wrote:
> > > Do INIT_WORK for the various workqueue tasks before the first
> > > invocation of any control net APIs. Since octep_ctrl_net_get_info
> > > was called before the control net receive work task was even
> > > initialised, the function call wasn't returning actual firmware
> > > info queried from Octeon.
> >
> > It might be more accurate to say that octep_ctrl_net_get_info depends
> > on the processing of OEI events. This happens in intr_poll_task.
> > That's why intr_poll_task needs to be queued earlier.


Intr_poll_task is queued only when the interface is down and the PF cannot catch IRQs as they have been torn down.
Elsewise, OEI events will trigger the OEI IRQ and consequently its handler. Nevertheless, your point is correct in that it
needs to be queued earlier, but I think subsequently since it calls the control mbox task, that is more relevant and necessary as if it
is not initialized, it cannot be scheduled even if OEI interrupts have been caught.

> > Did octep_send_mbox_req previously always fail with EAGAIN after
> ^^^^^^^^^^^^^^^^^^^^^
> I meant octep_ctrl_net_get_info here.
>
> > running into the 500 ms timeout in octep_send_mbox_req?

Yes it did, but as it was silent (note that we're not checking any error value), it didn't stop operation. I think I might have to update this patch
to catch the error values as well (This was a relic from the original code which spawned an extra thread to setup device and hence couldn't give back
an error value. That implementation was discouraged and we setup things at probe itself in the upstreamed code and can check error values)

> > Apropos octep_send_mbox_req... I think it has a race. "d" is put on
> > the ctrl_req_wait_list after sending the request to the hardware. If
> > the response arrives quickly, "d" might not yet be on the list when
> > process_mbox_resp looks for it.
> > Also, what protects ctrl_req_wait_list from concurrent access?

Such a race condition is, I also think, valid, but is not currently occurring as response, after due processing from Octeon application,
wouldn't arrive that quickly. Regarding concurrent access, there is currently no protection for ctrl_req_wait_list. Concurrent access here,
can only happen if either two requests manage to get hold of the ctrl_req_wait_list or a request and a response manages to get hold of the
ctrl_req_wait_list (the case you stated above).

In the first case, since locks are implemented atop the control mbox itself, requests would have to in effect wait for their chance to
queue their wait data "d" to ctrl_req_wait_list, avoiding concurrent access.

The second case is valid, but as I stated, wouldn't happen practically. But I suppose we do have to handle all theoretical cases and perhaps
locking can be done. I suppose a separate patch for it might be better.