Re: Kirkwood PCI Express and bridges

From: Chris Packham
Date: Mon Jun 24 2019 - 00:08:26 EST


Hi Thomas,

On 21/06/19 6:17 PM, Thomas Petazzoni wrote:
> Hello Chris,
>
> On Fri, 21 Jun 2019 04:03:27 +0000
> Chris Packham <Chris.Packham@xxxxxxxxxxxxxxxxxxx> wrote:
>
>> I'm in the process of updating the kernel version used on our products
>> from 4.4 -> 5.1.
>>
>> We have one product that uses a Kirkwood CPU, IDT PCI bridge and Marvell
>> Switch ASIC. The Switch ASIC presents as multiple PCI devices.
>>
>> The hardware setup looks like this
>>                                       __________
>> [ Kirkwood ] --- [ IDT 5T5 ] ---+--- |          |
>>                                 +--- |  Switch  |
>>                                 +--- |          |
>>                                 +--- |__________|
>>
>> On the 4.4 based kernel things are fine
>>
>> [root@awplus flash]# lspci -t
>> -[0000:00]---01.0-[01-06]----00.0-[02-06]--+-02.0-[03]----00.0
>>                                            +-03.0-[04]----00.0
>>                                            +-04.0-[05]----00.0
>>                                            \-05.0-[06]----00.0
>>
>> But on the 5.1 based kernel things get a little weird
>>
>> [root@awplus flash]# lspci -t
>> -[0000:00]---01.0-[01-06]--+-00.0-[02-06]--
>>                            +-01.0
>>                            +-02.0-[02-06]--
>>                            +-03.0-[02-06]--
>>                            +-04.0-[02-06]--
>>                            +-05.0-[02-06]--
>>                            +-06.0-[02-06]--
>>                            +-07.0-[02-06]--
>>                            +-08.0-[02-06]--
>>                            +-09.0-[02-06]--
>>                            +-0a.0-[02-06]--
>>                            +-0b.0-[02-06]--
>>                            +-0c.0-[02-06]--
>>                            +-0d.0-[02-06]--
>>                            +-0e.0-[02-06]--
>>                            +-0f.0-[02-06]--
>>                            +-10.0-[02-06]--
>>                            +-11.0-[02-06]--
>>                            +-12.0-[02-06]--
>>                            +-13.0-[02-06]--
>>                            +-14.0-[02-06]--
>>                            +-15.0-[02-06]--
>>                            +-16.0-[02-06]--
>>                            +-17.0-[02-06]--
>>                            +-18.0-[02-06]--
>>                            +-19.0-[02-06]--
>>                            +-1a.0-[02-06]--
>>                            +-1b.0-[02-06]--
>>                            +-1c.0-[02-06]--
>>                            +-1d.0-[02-06]--
>>                            +-1e.0-[02-06]--
>>                            \-1f.0-[02-06]--+-02.0-[03]----00.0
>>                                            +-03.0-[04]----00.0
>>                                            +-04.0-[05]----00.0
>>                                            \-05.0-[06]----00.0
>>
>>
>> I'll start bisecting to see where things started going wrong. I just
>> wondered if this rings any bells for anyone.
>
> I am almost sure that the culprit is
> 1f08673eef1236f7d02d93fcf596bb8531ef0d12 ("PCI: mvebu: Convert to PCI
> emulated bridge config space").

The problem seems to pre-date this commit. I've gone back as far as 4.18
and the problem still exists (in fact there are more duplicate devices).
I'll keep going back (unfortunately, because our platform is out of
tree, it's not a simple bisect).
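
To avoid eyeballing the trees at every step, something like the
following could flag the breakage. This is only a rough sketch: the
name duplicate_bus_ranges and the regex are made up for illustration,
and it assumes the `lspci -t` text is captured from the target and
checked on a host.

```python
import re
from collections import Counter

# Matches a bridge entry such as "01.0-[01-06]" or "02.0-[03]" in `lspci -t` text.
BRIDGE_RE = re.compile(r"[0-9a-f]{2}\.[0-7]-\[([0-9a-f]{2}(?:-[0-9a-f]{2})?)\]")


def duplicate_bus_ranges(tree_text):
    """Return {bus_range: count} for ranges claimed by more than one bridge."""
    counts = Counter(m.group(1) for m in BRIDGE_RE.finditer(tree_text))
    return {rng: n for rng, n in counts.items() if n > 1}
```

On the 4.4 output above this should return an empty dict, since every
bridge has its own bus range. On the 5.1 output it should report
"02-06" as claimed many times over.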

> I still think it makes sense to share the bridge emulation code between
> the mvebu and aardvark drivers, but this sharing has required making
> the code very different, with lots of subtle differences in behavior in
> how registers are emulated.

Agreed. Bugs love to hide in duplicated code.

I will admit to being ignorant about the need for an emulated bridge. I
know it has something to do with the type of transaction used for the
downstream devices. I also know that these systems won't work without an
emulated bridge.
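
For reference, my rough understanding of the generic PCI-to-PCI bridge
rule (nothing mvebu-specific, and config_cycle_type is just a made-up
name for illustration) is that a bridge compares the target bus of a
configuration access against its secondary and subordinate bus numbers:

```python
def config_cycle_type(target_bus, secondary, subordinate):
    """Classify a config access arriving at a PCI-to-PCI bridge.

    Generic rule only; this does not model what the mvebu driver does.
    """
    if target_bus == secondary:
        # Target sits directly on the bridge's secondary bus.
        return "type0"
    if secondary < target_bus <= subordinate:
        # Target is behind a further bridge: forward the access as Type 1.
        return "type1"
    # Bus is not behind this bridge, so the access is not forwarded.
    return None
```

With the [01-06] range from the 4.4 tree, an access to bus 01 would
end up as Type 0, while an access to bus 03 would be passed down as
Type 1.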

> Unfortunately, I don't have access to one of these complicated PCI
> setup with a HW switch on the way, so I couldn't test this kind of
> setups.
>
> Do you mind helping with figuring out what the issues are ? That would
> be really nice.

No problem. As I said I'll keep going to find a point where behaviour
turns bad for me. I suspect we might find other problems along the way.