Re: [Intel-wired-lan] [RFC iwl-net] e1000: Hold RTNL when e1000_down can be called
From: Jacob Keller
Date: Tue Oct 22 2024 - 17:15:52 EST
On 10/22/2024 2:12 PM, Joe Damato wrote:
> On Tue, Oct 22, 2024 at 01:00:47PM -0700, Joe Damato wrote:
>> On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
>>> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
>>>
>>> There are a few paths for e1000_down to be called in e1000 where RTNL is
>>> not currently being held:
>>> - e1000_shutdown (pci shutdown)
>>> - e1000_suspend (power management)
>>> - e1000_reinit_locked (via e1000_reset_task delayed work)
>>>
>>> Hold RTNL in two places to fix this issue:
>>> - e1000_reset_task
>>> - __e1000_shutdown (which is called from both e1000_shutdown and
>>> e1000_suspend).
>>
>> It looks like there's one other spot I missed:
>>
>> e1000_io_error_detected (pci error handler) which should also hold
>> rtnl_lock:
>>
>> + if (netif_running(netdev)) {
>> + rtnl_lock();
>> e1000_down(adapter);
>> + rtnl_unlock();
>> + }
>>
>> I can send that update in the v2, but I'll wait to see if Intel has suggestions
>> on the below.
>>
>>> The other paths which call e1000_down seemingly hold RTNL and are OK:
>>> - e1000_close (ndo_stop)
>>> - e1000_change_mtu (ndo_change_mtu)
>>>
>>> I'm submitting this as an RFC because:
>>> - the e1000_reinit_locked issue appears very similar to commit
>>> 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
>>> fixes a similar issue in e1000e
>>>
>>> however
>>>
>>> - adding rtnl to e1000_reinit_locked seemingly conflicts with an
>>> earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
>>> e1000_reset_task").
>>>
>>> Hopefully Intel can weigh in and shed some light on the correct way to
>>> go.
>
> Regarding the above locations where rtnl_lock may need to be held,
> comparing to other intel drivers:
>
> - e1000_reset_task: it appears that igc, igb, and e1000e all hold
> rtnl_lock in their reset_task functions, so I think adding an
> rtnl_lock / rtnl_unlock to e1000_reset_task should be OK,
> despite the existence of commit b2f963bfaeba ("e1000: fix
> lockdep warning in e1000_reset_task").
>
> - e1000_io_error_detected:
> - e1000e temporarily obtains and drops rtnl in
> e1000e_pm_freeze
> - ixgbe holds rtnl in the same path (toward the bottom of
> ixgbe_io_error_detected)
> - igb does NOT hold rtnl in this path (as far as I can tell)
> - it was suggested in another thread to hold rtnl in this path
> for igc [1].
>
> Given that it will be added to igc and is held in this same
> path in e1000e and ixgbe, I think it is safe to add it for
> e1000, as well.
>
> - e1000_shutdown:
> - igb holds rtnl in the same path,
> - e1000e temporarily holds it in this path (via
> e1000e_pm_freeze)
> - ixgbe holds rtnl in the same path
>
> So based on the recommendation for igc [1], and the precedent set in
> the other Intel drivers in most cases (except igb and the io_error
> path), I think adding rtnl to all 3 locations described above is
> correct.
>
> Please let me know if you all agree. Thanks for reviewing this.
>
>
> [1]: https://lore.kernel.org/netdev/40242f59-139a-4b45-8949-1210039f881b@xxxxxxxxx/

I agree with this assessment.
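
For anyone skimming the thread, the locking pattern under discussion can be modelled in user space roughly as below. This is only a sketch, not driver code: every model_* name is hypothetical, the "lock" is a plain flag rather than the real RTNL mutex, and the stand-in for e1000_down counts violations instead of tripping an assertion the way the kernel would.

```c
/* User-space model only; none of these names exist in the e1000 driver. */
#include <assert.h>
#include <stdbool.h>

static bool model_rtnl_held;   /* stands in for "RTNL is held" */
static int model_violations;   /* times the down path ran without the lock */

/* Hypothetical stand-ins for rtnl_lock() / rtnl_unlock(). A flag is enough
 * to show the call ordering; it provides no real mutual exclusion. */
static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);  /* not recursive, like the real lock */
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

/* Stand-in for e1000_down(): its callee assumes the lock is held, so an
 * unlocked call is recorded as a violation. */
static void model_e1000_down(void)
{
	if (!model_rtnl_held)
		model_violations++;
}

/* Reset work as described in the report: down is reached without the lock. */
static void model_reset_task_unfixed(void)
{
	model_e1000_down();
}

/* Reset work with the proposed change: take the lock around the down call. */
static void model_reset_task_fixed(void)
{
	model_rtnl_lock();
	model_e1000_down();
	model_rtnl_unlock();
}

/* PCI error path with the proposed change: lock only when the device is
 * running, mirroring the quoted hunk for e1000_io_error_detected. */
static void model_io_error_detected(bool running)
{
	if (running) {
		model_rtnl_lock();
		model_e1000_down();
		model_rtnl_unlock();
	}
}
```

In this model only model_reset_task_unfixed() increments model_violations; the two fixed paths leave the counter unchanged and release the flag before returning. Whether taking RTNL in the real reset task reintroduces the lockdep problem addressed by b2f963bfaeba is outside what this sketch can show.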