RE: Regression v5.12-rc3: net: stmmac: re-init rx buffers when mac resume back

From: Joakim Zhang
Date: Wed Mar 31 2021 - 07:41:54 EST



> -----Original Message-----
> From: Jon Hunter <jonathanh@xxxxxxxxxx>
> Sent: 31 March 2021 19:29
> To: Joakim Zhang <qiangqing.zhang@xxxxxxx>; Giuseppe Cavallaro
> <peppe.cavallaro@xxxxxx>; Alexandre Torgue <alexandre.torgue@xxxxxx>;
> Jose Abreu <joabreu@xxxxxxxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>; linux-tegra <linux-tegra@xxxxxxxxxxxxxxx>;
> Jakub Kicinski <kuba@xxxxxxxxxx>
> Subject: Re: Regression v5.12-rc3: net: stmmac: re-init rx buffers when mac
> resume back
>
>
> On 31/03/2021 12:10, Joakim Zhang wrote:
>
> ...
>
> >>>>>>>> You mean one of your boards? Does other boards with STMMAC can
> >>>>>>>> work fine?
> >>>>>>>
> >>>>>>> We have two devices with the STMMAC and one works OK and the
> >>>>>>> other fails. They are different generation of device and so
> >>>>>>> there could be some architectural differences which is causing
> >>>>>>> this to only be seen on one device.
> >>>>>> It's really strange, but I also don't know what architectural
> >>>>>> differences could affect this. Sorry.
> >>>
> >>>
> >>> I realised that for the board which fails after this change is made,
> >>> it has the IOMMU enabled. The other board does not at the moment
> >>> (although work is in progress to enable). If I add
> >>> 'iommu.passthrough=1' to cmdline for the failing board, then it
> >>> works again. So in my case, the problem is linked to the IOMMU being
> >>> enabled.
> >>>
> >>> Does you platform enable the IOMMU?
> >>
> >> Hi Jon,
> >>
> >> There is no IOMMU hardware available on our boards. But why IOMMU
> >> would affect it during suspend/resume, and no problem in normal mode?
> >
> > One more add, I saw drivers/iommu/tegra-gart.c(not sure if is this) support
> > suspend/resume, is it possible iommu resume back after stmmac?
>
>
> This board is the tegra186-p2771-0000 (Jetson TX2) and uses the
> arm,mmu-500 and not the above driver.

OK.

> In answer to your question, resuming from suspend does work on this board
> without your change. We have been testing suspend/resume now on this board
> since Linux v5.8 and so we have the ability to bisect such regressions. So it is
> clear to me that this is the change that caused this, but I am not sure why.

Yes, I know this issue is a regression caused by my patch. I just want to
analyze the potential causes. The code change only touches how the RX pages
are recycled and reallocated on resume, so my guess is that these page
operations need a working IOMMU when the IOMMU is enabled. Could you help
check whether the IOMMU driver resumes before STMMAC? Our common goal is to
find the root cause, right?
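
One rough way to check the ordering from a saved kernel log (for example one
captured with /sys/power/pm_print_times enabled, if your config has it) is
sketched below. It is only a small userspace helper, not driver code. The
function names first_line_with() and resume_order() are made up for this
example, and the device name substrings you pass in are your own choice,
since the exact log format depends on the kernel version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first `len` bytes of `line` contain `needle`. */
int line_contains(const char *line, size_t len, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0)
        return 1;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(line + i, needle, n) == 0)
            return 1;
    return 0;
}

/*
 * Return the 1-based number of the first line in `log` that contains
 * both `dev` and `event`, or 0 if no such line exists.
 */
size_t first_line_with(const char *log, const char *dev, const char *event)
{
    assert(log && dev && event);

    size_t lineno = 1;
    const char *line = log;

    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (line_contains(line, len, dev) &&
            line_contains(line, len, event))
            return lineno;
        if (!end)
            break;
        line = end + 1;
        lineno++;
    }
    return 0;
}

/*
 * Compare where the first "resume" line of each device appears.
 * Returns <0 if dev_a is logged first, >0 if dev_b is logged first,
 * and 0 if either device is missing or both match the same line.
 */
int resume_order(const char *log, const char *dev_a, const char *dev_b)
{
    size_t a = first_line_with(log, dev_a, "resume");
    size_t b = first_line_with(log, dev_b, "resume");

    if (a == 0 || b == 0 || a == b)
        return 0;
    return a < b ? -1 : 1;
}
```

Calling resume_order(log, "<iommu name>", "<ethernet name>") on the captured
text and getting a positive value would suggest the ethernet resume callback
was logged before the IOMMU one. Log order is only a hint, so it would still
need to be confirmed in the drivers themselves.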

Best Regards,
Joakim Zhang
> Thanks
> Jon
>
> --
> nvpublic