Re: [PATCH net-next 1/6] net: ipa: don't suspend/resume modem if not up

From: Jakub Kicinski
Date: Fri Aug 06 2021 - 09:00:07 EST


On Fri, 6 Aug 2021 06:39:46 -0500 Alex Elder wrote:
> On 8/5/21 8:26 PM, Jakub Kicinski wrote:
> > On Wed, 4 Aug 2021 10:36:21 -0500 Alex Elder wrote:
> >> The modem network device is set up by ipa_modem_start(). But its
> >> TX queue is not actually started and endpoints enabled until it is
> >> opened.
> >>
> >> So avoid stopping the modem network device TX queue and disabling
> >> endpoints on suspend or stop unless the netdev is marked UP. And
> >> skip attempting to resume unless it is UP.
> >>
> >> Signed-off-by: Alex Elder <elder@xxxxxxxxxx>
> >
> > You said in the cover letter that in practice this fix doesn't matter.
>
> I don't think we've seen this problem with system suspend, but
> with runtime suspend we could get a forced suspend request at
> any time (and frequently), so if there is a problem, it will be
> much more likely to occur.
>
> For suspend, I don't think it's actually a "problem". Disabling
> the TX queue if it wasn't open is harmless--it just sets the
> DRV_XOFF bit in the TX queue state field. And we have a
> separate "enabled endpoints" mask that prevents stopping or
> suspending the endpoint if it wasn't opened.
>
> But for resume, waking the queue schedules it. I'm not sure
> what exactly ensues in that case, but it's not correct if the
> network device hasn't been opened. For endpoints, again, they
> won't be resumed if they weren't enabled, so that part's OK.
>
> > It seems trivial to test so perhaps it doesn't and we should leave the
> > code be? Looking at dev->flags without holding rtnl_lock() seems
> > suspicious, drivers commonly put the relevant portion of suspend/resume
> > routines under rtnl_lock()/rtnl_unlock() (although to be completely
>
> I don't use rtnl_lock()/rtnl_unlock() *anywhere* in the driver.
> It has no netlink interface (yet), and therefore I didn't even
> think about using rtnl_lock(). Do I need it?

Runtime PM interactions with rtnl_lock get really tricky: if there are
callers which wake the device up while holding rtnl, then taking rtnl
in .resume will cause an obvious deadlock, right?
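
To make the deadlock concrete, here is a minimal userspace sketch. It is
a model only, not kernel code: model_rtnl stands in for the rtnl mutex,
and every function name is hypothetical. A trylock is used so the model
reports the problem instead of hanging; a blocking lock at the same
point would never return.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the rtnl mutex: non-recursive, one holder at a time. */
static atomic_flag model_rtnl = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool model_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&model_rtnl);
}

static void model_rtnl_unlock(void)
{
	atomic_flag_clear(&model_rtnl);
}

/* Hypothetical .resume callback that wants rtnl for itself. */
static bool model_resume(void)
{
	if (!model_rtnl_trylock())
		return false;	/* a blocking lock would deadlock here */
	model_rtnl_unlock();
	return true;
}

/* Hypothetical caller that already holds rtnl and wakes the device,
 * which runs the resume callback synchronously in the same context. */
static bool model_wake_under_rtnl(void)
{
	bool resumed;

	if (!model_rtnl_trylock())
		return false;
	resumed = model_resume();
	model_rtnl_unlock();
	return resumed;
}
```

Called on its own, model_resume() succeeds; called via
model_wake_under_rtnl(), it fails because the lock is already held by
its own caller.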

I'm starting to feel like a driver's RPM-related code has to be under its
own lock, and interrogating a higher layer's (e.g. the network stack's)
state from RPM code should be avoided...
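
One way to picture that idea, as a userspace sketch under stated
assumptions: the driver records "opened" itself, under a private lock,
and its suspend/resume paths consult only that flag rather than
dev->flags. The struct, the lock and all function names below are
hypothetical and are not taken from the IPA driver; the pthread mutex
merely stands in for a driver-private kernel mutex. Whether this is
enough to settle the rtnl question is left open here.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical driver-private state, protected by pm_lock. */
struct model_modem {
	pthread_mutex_t pm_lock;	/* private lock, never rtnl */
	bool opened;			/* set by open, cleared by stop */
	bool queue_running;		/* models the TX queue state */
};

static void model_open(struct model_modem *m)
{
	pthread_mutex_lock(&m->pm_lock);
	m->opened = true;
	m->queue_running = true;
	pthread_mutex_unlock(&m->pm_lock);
}

static void model_stop(struct model_modem *m)
{
	pthread_mutex_lock(&m->pm_lock);
	m->opened = false;
	m->queue_running = false;
	pthread_mutex_unlock(&m->pm_lock);
}

/* Suspend: only touch the queue if the driver itself saw an open. */
static void model_suspend(struct model_modem *m)
{
	pthread_mutex_lock(&m->pm_lock);
	if (m->opened)
		m->queue_running = false;
	pthread_mutex_unlock(&m->pm_lock);
}

/* Resume: never wake a queue that was not opened. */
static void model_resume(struct model_modem *m)
{
	pthread_mutex_lock(&m->pm_lock);
	if (m->opened)
		m->queue_running = true;
	pthread_mutex_unlock(&m->pm_lock);
}
```

With this shape, a resume on a device that was never opened leaves the
queue stopped, and neither PM path needs to look at network stack state.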

Long story short, I don't think we have a good handle on this (I
certainly don't), so maybe let's leave your code be, for now.

> > frank IDK if it's actually possible for concurrent suspend +
> > open/close to happen).
>
> I think it isn't possible, but I'm less than 100% sure. I've
> been thinking a lot about exactly this sort of question lately...
>
> > Are there any callers of ipa_modem_stop() which don't hold rtnl_lock()?
>
> None of them take that lock. It is called in the driver ->remove
> callback, and is called during cleanup if the modem crashes.
>
> I think this fix is good, but as I said in the cover letter I'm
> not aware of ever having hit it to date.
>
> Thank you very much for your review and comments.