Re: [PATCH net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns()
From: Roger Quadros
Date: Thu Jan 16 2025 - 06:48:23 EST
On 16/01/2025 07:15, Siddharth Vadapalli wrote:
> On Wed, Jan 15, 2025 at 06:38:57PM +0200, Roger Quadros wrote:
>> Siddharth,
>>
>> On 15/01/2025 17:49, Roger Quadros wrote:
>>> Hi Siddharth,
>>>
>>> On 15/01/2025 12:38, Siddharth Vadapalli wrote:
>>>> On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote:
>>>>> Hi Siddharth,
>>>>>
>>>>> On 15/01/2025 07:18, Siddharth Vadapalli wrote:
>>>>>> On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote:
>>>>>>
>>>>>> Hello Roger,
>>>>>>
>>>>>>> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns
>>>>>>
>>>>>> You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to
>>>>>> assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows:
>>>>>
>>>>> Yes I meant tx instead of rx.
>>>>>
>>>>>>
>>>>>> tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn);
>>>>>>
>>>>>> Additionally, following the above section we have:
>>>>>>
>>>>>> if (tx_chn->irq < 0) {
>>>>>> dev_err(dev, "Failed to get tx dma irq %d\n",
>>>>>> tx_chn->irq);
>>>>>> ret = tx_chn->irq;
>>>>>> goto err;
>>>>>> }
>>>>>>
>>>>>> Could you please provide details on the code-path which will lead to a
>>>>>> negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"?
>>>>>>
>>>>>> There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely:
>>>>>> 1. am65_cpsw_nuss_update_tx_rx_chns()
>>>>>> 2. am65_cpsw_nuss_suspend()
>>>>>> Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only
>>>>>> in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it
>>>>>> appears to me that "tx_chn->irq" will never be negative within
>>>>>> am65_cpsw_nuss_remove_tx_chns()
>>>>>>
>>>>>> Please let me know if I have overlooked something.
>>>>>
>>>>> The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called
>>>>> repeatedly (by user changing number of TX queues) even if previous call
>>>>> to am65_cpsw_nuss_init_tx_chns() failed.
>>>>
>>>> Thank you for clarifying. So the issue/bug was discovered since the
>>>> implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag
>>>> misled me. Maybe the "Fixes" tag should be updated? Though we should
>>>> code to future-proof it as done in this patch, the "Fixes" tag pointing
>>>> to the very first commit of the driver might not be accurate as the
>>>> code-path associated with the bug cannot be exercised at that commit.
>>>
>>> Fair enough. I'll change the Fixes commit.
>>
>> Now that I check the code again, am65_cpsw_nuss_remove_tx_chns(),
>> am65_cpsw_nuss_update_tx_chns() and am65_cpsw_nuss_init_tx_chns()
>> were all introduced in the Fixes commit I stated.
>>
>> Could you please share why you thought it is not accurate?
>
> Though the functions were introduced in the Fixes commit that you have
> mentioned in the commit message, the check for "tx_chn->irq" being
> strictly positive as implemented in this patch, is not required until
> the commit which added am65_cpsw_nuss_update_tx_rx_chns(). The reason
> I say so is that a negative value for "tx_chn->irq" within
> am65_cpsw_nuss_remove_tx_chns() requires am65_cpsw_nuss_init_tx_chns()
> to partially fail *followed by* invocation of
> am65_cpsw_nuss_remove_tx_chns(). That isn't possible in the blamed
> commit which introduced them, since the driver probe fails when
> am65_cpsw_nuss_init_tx_chns() fails. The code path:
>
> am65_cpsw_nuss_init_tx_chns() => Partially fails / Fails
> am65_cpsw_nuss_remove_tx_chns() => Invoked later on
>
> isn't possible in the blamed commit.
But, am65_cpsw_nuss_update_tx_chns() and am65_cpsw_set_channels() was
introduced in the blamed commit and the test case I shared to
test .set_channels with different channel counts can still
fail with warning if am65_cpsw_nuss_init_tx_chns() partially fails.
>
> Regards,
> Siddharth.
--
cheers,
-roger