Re: [PATCH] linux/export: fix reference to exported functions for parisc64

From: Damien Le Moal
Date: Wed Sep 13 2023 - 19:46:00 EST


On 9/14/23 06:22, John David Anglin wrote:
> On 2023-09-13 1:58 p.m., John David Anglin wrote:
>> On 2023-09-12 5:53 p.m., John David Anglin wrote:
>>> On 2023-09-10 5:30 p.m., John David Anglin wrote:
>>>> Hi Masahiro,
>>>>
>>>> The attached change fixed boot at ddb5cdbafaaa 😁
>>>>
>>>> However, v6.5.x boot is still broken:
>>>>
>>>> Run /init as init process
>>>> process '/usr/bin/sh' started with executable stack
>>>> Loading, please wait...
>>>> Starting systemd-udevd version 254.1-3
>>>> e1000 alternatives: applied 0 out of 569 patches
>>>> e1000: Intel(R) PRO/1000 Network Driver
>>>> e1000: Copyright (c) 1999-2006 Intel Corporation.
>>>> scsi_mod alternatives: applied 0 out of 7 patches
>>>> SCSI subsystem initialized
>>>> usbcore alternatives: applied 0 out of 18 patches
>>>> usbcore: registered new interface driver usbfs
>>>> libata alternatives: applied 0 out of 3 patches
>>>> usbcore: registered new interface driver hub
>>>> usbcore: registered new device driver usb
>>>> mptbase alternatives: applied 0 out of 73 patches
>>>> ehci_hcd alternatives: applied 0 out of 114 patches
>>>> sata_sil24 alternatives: applied 0 out of 56 patches
>>>> Fusion MPT base driver 3.04.20
>>>> Copyright (c) 1999-2008 LSI Corporation
>>>> sata_sil24 0000:00:01.0: Applying completion IRQ loss on PCI-X errata fix
>>>> scsi host0: sata_sil24
>>>> scsi host1: sata_sil24
>>>> pata_sil680 0000:60:02.0: sil680: 133MHz clock.
>>>> scsi host2: sata_sil24
>>>> ehci_pci alternatives: applied 0 out of 2 patches
>>>> ohci_hcd alternatives: applied 0 out of 144 patches
>>>> ehci-pci 0000:60:01.2: EHCI Host Controller
>>>> scsi host3: pata_sil680
>>>> ehci-pci 0000:60:01.2: new USB bus registered, assigned bus number 1
>>>> scsi host4: sata_sil24
>>>> ata1: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80080000 ir6
>>>> ata2: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80082000 ir6
>>>> ata3: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80084000 ir6
>>>> ata4: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80086000 ir6
>>>> e1000 0000:60:03.0 eth0: (PCI:33MHz:32-bit) 00:11:0a:31:8a:77
>>>> ehci-pci 0000:60:01.2: irq 71, io mem 0xffffffffb00a1000
>>>> scsi host5: pata_sil680
>>>> ata5: PATA max UDMA/133 cmd 0x26058 ctl 0x26064 bmdma 0x26040 irq 72
>>>> ata6: PATA max UDMA/133 cmd 0x26050 ctl 0x26060 bmdma 0x26048 irq 72
>>>> e1000 0000:60:03.0 eth0: Intel(R) PRO/1000 Network Connection
>>>> ehci-pci 0000:60:01.2: USB 2.0 started, EHCI 0.95
>>>> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
>>>> usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
>>>> usb usb1: Product: EHCI Host Controller
>>>> usb usb1: Manufacturer: Linux 6.5.2-dirty ehci_hcd
>>>> usb usb1: SerialNumber: 0000:60:01.2
>>>> hub 1-0:1.0: USB hub found
>>>> hub 1-0:1.0: 5 ports detected
>>>> ata1: SATA link down (SStatus 0 SControl 0)
>>>> ata2: SATA link down (SStatus 0 SControl 0)
>>>> ata3: SATA link down (SStatus 0 SControl 0)
>>>> ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>> ata4.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>>> ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>>> ata4.00: configured for UDMA/100
>>>> scsi 4:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>>>> ata6.00: ATAPI: HL-DT-STDVD+-RW GSA-H21L, 1.04, max UDMA/44
>>>> scsi 5:0:0:0: CD-ROM            HL-DT-ST DVD+-RW GSA-H21L 1.04 PQ: 0 ANSI: 5
>>>> random: crng init done
>>>> Timed out for waiting the udev queue being empty.
>>>> Begin: Loading essential drivers ... done.
>>>> Begin: Running /scripts/init-premount ... done.
>>>> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
>>>> Begin: Running /scripts/local-premount ... done.
>>>> Timed out for waiting the udev queue being empty.
>>>> Begin: Waiting for root file system ... Begin: Running /scripts/local-block ....
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> done.
>>>> Gave up waiting for root file system device.  Common problems:
>>>>  - Boot args (cat /proc/cmdline)
>>>>    - Check rootdelay= (did the system wait long enough?)
>>>>  - Missing modules (cat /proc/modules; ls /dev)
>>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>>> Rebooting automatically due to panic= boot argument
>>>>
>>>> I'll see if I can find the commit that breaks 6.5.
>>> I've traced this to the following merge commit:
>>>
>>> dave@atlas:~/linux/linux$ git bisect good
>>> ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7 is the first bad commit
>>> commit ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7
>>> Merge: 1546cd4bfda4 af92c02fb209
>>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> Date:   Fri Jun 30 11:57:07 2023 -0700
>>>
>>>     Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>
>>>     Pull SCSI updates from James Bottomley:
>>>      "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
>>>       lpfc, qla2xxx).
>>>
>>>       We have a couple of major core changes impacting other systems:
>>>
>>>        - Command Duration Limits, which spills into block and ATA
>>>
>>>        - block level Persistent Reservation Operations, which touches block,
>>>          nvme, target and dm
>>>
>>>       Both of these are added with merge commits containing a cover letter
>>>       explaining what's going on"
>>>
>>>     * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
>>>       scsi: core: Improve warning message in scsi_device_block()
>>>       scsi: core: Replace scsi_target_block() with scsi_block_targets()
>>>       scsi: core: Don't wait for quiesce in scsi_device_block()
>>>       scsi: core: Don't wait for quiesce in scsi_stop_queue()
>>>       scsi: core: Merge scsi_internal_device_block() and device_block()
>>>       scsi: sg: Increase number of devices
>>>       scsi: bsg: Increase number of devices
>>>       scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
>>>       scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
>>>       scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
>>>       scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
>>>       scsi: ufs: ufs-qcom: Switch to the new ICE API
>>>       scsi: ufs: dt-bindings: qcom: Add ICE phandle
>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
>>>       scsi: ufs: core: Remove dedicated hwq for dev command
>>>       scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
>>>       scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
>>>       ...
>>>
>>> dave@atlas:~/linux/linux$ lspci
>>> 00:01.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
>>> 40:01.0 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>> 40:01.1 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>> 60:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>> 60:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>> 60:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 02)
>>> 60:02.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
>>> 60:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
>> This was introduced by the following commit:
>>
>> dave@atlas:~/linux/linux$ git bisect good
>> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
>> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
>> Author: Damien Le Moal <dlemoal@xxxxxxxxxx>
>> Date:   Thu May 11 03:13:41 2023 +0200
>>
>>     scsi: core: Detect support for command duration limits
>>
>>     Introduce the function scsi_cdl_check() to detect if a device supports
>>     command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
>>     and WRITE 32 commands are checked using the function scsi_report_opcode()
>>     to probe the rwcdlp and cdlp bits as they indicate the mode page defining
>>     the command duration limits descriptors that apply to the command being
>>     tested.
>>
>>     If any of these commands support CDL, the field cdl_supported of struct
>>     scsi_device is set to 1 to indicate that the device supports CDL.
>>
>>     Support for CDL for a device is advertizes through sysfs using the new
>>     cdl_supported device attribute. This attribute value is 1 for a device
>>     supporting CDL and 0 otherwise.
>>
>>     Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
>>     Reviewed-by: Hannes Reinecke <hare@xxxxxxx>
>>     Co-developed-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>>     Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>>     Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@xxxxxxxxxxx
>>     Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
>>
>>  Documentation/ABI/testing/sysfs-block-device |  9 ++++
>>  drivers/scsi/scsi.c                          | 81 ++++++++++++++++++++++++++++
>>  drivers/scsi/scsi_scan.c                     |  3 ++
>>  drivers/scsi/scsi_sysfs.c                    |  2 +
>>  include/scsi/scsi_device.h                   |  3 ++
>>  5 files changed, 98 insertions(+)
>>
>> Sometimes I see when booting a bad commit:
>> [...]
>> Begin: Running /scripts/local-block ... done.
>> Begin: Running /scripts/local-block ... done.
>> Begin: Running /scripts/local-block ... done.
>> done.
>> Gave up waiting for root file system device.  Common problems:
>>  - Boot args (cat /proc/cmdline)
>>    - Check rootdelay= (did the system wait long enough?)
>>  - Missing modules (cat /proc/modules; ls /dev)
>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>> Rebooting automatically due to panic= boot argument
>> ata4: SATA link down (SStatus 0 SControl 0)
>> ata5: SATA link down (SStatus 0 SControl 0)
>> ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>> ata6.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>> ata6.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> ata6.00: configured for UDMA/100
>> scsi 5:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>
> System boots master at e56b2b605799 if I disable CDL:
>
> dave@atlas:~/linux/linux$ git diff drivers/scsi/scsi.c
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index d0911bc28663..dc3a283ebd75 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -578,6 +578,8 @@ static bool scsi_cdl_check_cmd(struct scsi_device *sdev, u8 opcode, u16 sa,
>         int ret;
>         u8 cdlp;
>
> +       return false;
> +
>         /* Check operation code */
>         ret = scsi_report_opcode(sdev, buf, SCSI_CDL_CHECK_BUF_LEN, opcode, sa);
>         if (ret <= 0)

It is strange that this changes anything. The MAINTENANCE_IN command issued by
scsi_report_opcode() is emulated in libata by ata_scsiop_maint_in(). No command
is actually sent to the drive, so nothing should be able to fail or cause
issues. By the time this command is issued, the ATA drive is also fully probed.
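
For reference, here is a minimal userspace sketch of the request being
discussed: a 12-byte MAINTENANCE IN CDB with the REPORT SUPPORTED OPERATION
CODES service action, laid out as I understand it from the SPC specification.
This is not the kernel code. The helper name and the SKETCH_* macros are
hypothetical, and scsi_report_opcode() may differ in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as defined by SPC; the macro names are illustrative only. */
#define SKETCH_MAINTENANCE_IN    0xa3 /* MAINTENANCE IN operation code */
#define SKETCH_MI_REPORT_OPCODES 0x0c /* REPORT SUPPORTED OPERATION CODES */
#define SKETCH_CDB_LEN           12

/*
 * Hypothetical helper: fill a 12-byte CDB asking the target whether
 * "opcode" (optionally qualified by service action "sa") is supported.
 * "alloc_len" is the size of the buffer that receives the reply.
 */
static void sketch_build_report_opcode_cdb(uint8_t *cdb, uint8_t opcode,
					   uint16_t sa, uint32_t alloc_len)
{
	assert(cdb != NULL);
	memset(cdb, 0, SKETCH_CDB_LEN);

	cdb[0] = SKETCH_MAINTENANCE_IN;
	cdb[1] = SKETCH_MI_REPORT_OPCODES;
	/* Reporting options: 1 = one opcode, 3 = opcode + service action */
	cdb[2] = sa ? 3 : 1;
	cdb[3] = opcode;
	/* Requested service action, big-endian (zero when unused) */
	cdb[4] = (uint8_t)(sa >> 8);
	cdb[5] = (uint8_t)(sa & 0xff);
	/* Allocation length, big-endian */
	cdb[6] = (uint8_t)(alloc_len >> 24);
	cdb[7] = (uint8_t)(alloc_len >> 16);
	cdb[8] = (uint8_t)(alloc_len >> 8);
	cdb[9] = (uint8_t)(alloc_len & 0xff);
	/* Bytes 10 (reserved) and 11 (control) stay zero */
}
```

Calling the helper with opcode 0x88 (READ 16) and sa 0 gives a one-opcode
query. Calling it with opcode 0x7f and sa 0x0009 (READ 32) gives the
opcode-plus-service-action form. With libata, a CDB like this is answered by
the emulation layer rather than being forwarded to the drive.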

Or is the drive connected to the Broadcom HBA you have? In that case, libata is
not used and the HBA firmware SAT (SCSI-ATA Translation) is likely to blame.

Could you send the full dmesg output for a clean boot and for a failed one so
that I can compare?

--
Damien Le Moal
Western Digital Research