Re: Regression from dcadfd7f7c74ef9ee415e072a19bdf6c085159eb

From: Takashi Sakamoto
Date: Sun Dec 03 2023 - 08:00:17 EST


Hi Mario,

Thanks for the advices.

I note that In my experiments I use Ubuntu 23.04 amd64 (v6.2 kernel) with
backported FireWire stack[1]. Except for the stack, the kernel and software
packages can be retrieved from repositories of Ubuntu project.

On Tue, Nov 28, 2023 at 12:09:41AM -0600, Mario Limonciello wrote:
> On 11/27/2023 23:24, Takashi Sakamoto wrote:
> > Hi Mario
> >
> > Following up on our last conversation, I purchase some hardware to
> > attempt to retrieve outputs from serial port. Finally, I bought another
> > mother board in used market which provides serial port from Super I/O
> > chip (ASUS TUF Gaming X570-Plus). However, I have retrieved no helpful
> > outputs yet when encountering the system reboot.
>
> Did you up the loglevel to 8 to make sure you'll get all kernel output on
> the serial port, not just errors?

Even if giving either 'debug' cmdline option or incrementing console
loglevel via syctl, I receive no useful output from console when loading
the module at or after booting up.

```
$ sysctl kernel.printk
kernel.printk = 7 7 1 7
```

I tried at several difference cases; enabling/disabling IOMMU,
enabling/disabling SVM in motherboard level. But nothing effective.

> > As you mentioned, I check whether PCIe AER is enabled or not in the
> > running kernel (Ubuntu 23.04 linux-image-6.2.0-37-generic). It is
> > certainly enabled, however I can see nothing in the output as I noted.
> >
> > I experienced extra troubles relevant to AMD Ryzen machine and the issued
> > PCIe device:
> >
> > * ASRock X570 Phantom Gaming 4 with AMD Ryzen 5 3600X does not detect
> > the card. We can see no corresponding entry in lspci.
> > * After associating the card to vfio-pci, lspci command can reboot the
> > system even if firewire-ohci driver is not loaded. I can regenerate it
> > in both Gigabyte AX370-Gaming 5/ASUS TUF Gaming X570-plus with AMD
> > Ryzen 2400G.
>
> Rather than lspci, is it specifically config space access from sysfs? Does
> the output from the serial port change with IOMMU enabled vs disabled?

In lspci case, I can work with debugger and figure out that 'pread(2)' to
file descriptor for 'config' node in sysfs causes the unexpected system
reboot. Additionally I can regenerate it by hexdump(1) to the node:

```
$ lspci
...
04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge [1b21:1080] (rev 03)
05:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller [1106:3044] (rev 80)
...
$ hexdump -C /sys/bus/pci/devices/0000\:05\:00.0/config
00000000 06 11 44 30 80 00 10 02 80 10 00 0c 10 20 00 00 |..D0......... ..|
00000010 00 00 90 fc 01 d0 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 06 11 44 30 |..............D0|
00000030 00 00 00 00 50 00 00 00 00 00 00 00 ff 01 00 20 |....P.......... |
00000040

$ lsmod | grep firewire
(no output)

$ sudo -i
# modprobe vfio-pci
# echo 1106 3044 > /sys/bus/pci/drivers/vfio-pci/new_id
# exit

$ hexdump -C /sys/bus/pci/devices/0000\:05\:00.0/config
(reboot)
```

I can suppress it when disabling IOMMU in motherboard. In this point, the
issue of lspci is a bit different from the issue of driver issue.

> > I'm plreased to see if you have extra ideas to get helpful output from
> > the system. But I guess that I should start finding some workaround to
> > avoid the issued access to register instead of investigating the reboot
> > mechanism, sigh...
> >
> > Anyway, thanks for your help. >
>
> Can you check FCH::PM::S5_RESET_STATUS on next boot after failure has
> occurred? It is available at MMIO FED80300 or through indirect IO access at
> 0xC0.
>
> If MMIO doesn't work, double check FCH::PM_ISACONTROL bit 1 (described on
> page 296) to confirm if your system enables it.
>
> The meanings of the different bits can be found in a recent PPR:
> https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/55901_B1_pub_053.zip
>
> Indirect IO is described on PDF page 294.
>
> This will at least give us a hint what's going on in this case.

I'll try the above in this week. Thanks.


[1] https://github.com/takaswie/linux-firewire-dkms/

Takashi Sakamoto