Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

From: Guillaume Tucker
Date: Thu Feb 18 2021 - 15:41:49 EST


On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
>
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +0000, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>>>> and the commit-reverted version (passing), each applying this patch.
>>>>
>>>> Sure, I've run 3 jobs:
>>>>
>>>> * v5.11-rc6 as a reference, to see the original issue:
>>>> https://lava.collabora.co.uk/scheduler/job/3187848
>>>>
>>>> * + your debug patch:
>>>> https://lava.collabora.co.uk/scheduler/job/3187849
>>>>
>>>> * + the "breaking" commit reverted, passing the tests:
>>>> https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>> https://lava.collabora.co.uk/scheduler/job/3210192
>
> I checked this passing log, however, found that the regression is
> still there though test passed, as the prints below aren't normal:
> tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> EMEM address decode error (SMMU translation error [--S])
> tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but that wasn't enabled in the bisection so I didn't notice them.

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- guessing
> likely a recent change in iommu core, I feel it safer to revert my
> previous change, not necessarily being a complete revert though.
>
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
>
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently. We'll keep
that in mind for the future.

Best wishes,
Guillaume