Re: [PATCH] linux/export: fix reference to exported functions for parisc64

From: Damien Le Moal
Date: Wed Sep 13 2023 - 21:15:50 EST


On 9/14/23 09:29, John David Anglin wrote:
> On 2023-09-13 7:45 p.m., Damien Le Moal wrote:
>> On 9/14/23 06:22, John David Anglin wrote:
>>> On 2023-09-13 1:58 p.m., John David Anglin wrote:
>>>> On 2023-09-12 5:53 p.m., John David Anglin wrote:
>>>>> On 2023-09-10 5:30 p.m., John David Anglin wrote:
>>>>>> Hi Masahiro,
>>>>>>
>>>>>> The attached change fixed boot at ddb5cdbafaaa 😁
>>>>>>
>>>>>> However, v6.5.x boot is still broken:
>>>>>>
>>>>>> Run /init as init process
>>>>>> process '/usr/bin/sh' started with executable stack
>>>>>> Loading, please wait...
>>>>>> Starting systemd-udevd version 254.1-3
>>>>>> e1000 alternatives: applied 0 out of 569 patches
>>>>>> e1000: Intel(R) PRO/1000 Network Driver
>>>>>> e1000: Copyright (c) 1999-2006 Intel Corporation.
>>>>>> scsi_mod alternatives: applied 0 out of 7 patches
>>>>>> SCSI subsystem initialized
>>>>>> usbcore alternatives: applied 0 out of 18 patches
>>>>>> usbcore: registered new interface driver usbfs
>>>>>> libata alternatives: applied 0 out of 3 patches
>>>>>> usbcore: registered new interface driver hub
>>>>>> usbcore: registered new device driver usb
>>>>>> mptbase alternatives: applied 0 out of 73 patches
>>>>>> ehci_hcd alternatives: applied 0 out of 114 patches
>>>>>> sata_sil24 alternatives: applied 0 out of 56 patches
>>>>>> Fusion MPT base driver 3.04.20
>>>>>> Copyright (c) 1999-2008 LSI Corporation
>>>>>> sata_sil24 0000:00:01.0: Applying completion IRQ loss on PCI-X errata fix
>>>>>> scsi host0: sata_sil24
>>>>>> scsi host1: sata_sil24
>>>>>> pata_sil680 0000:60:02.0: sil680: 133MHz clock.
>>>>>> scsi host2: sata_sil24
>>>>>> ehci_pci alternatives: applied 0 out of 2 patches
>>>>>> ohci_hcd alternatives: applied 0 out of 144 patches
>>>>>> ehci-pci 0000:60:01.2: EHCI Host Controller
>>>>>> scsi host3: pata_sil680
>>>>>> ehci-pci 0000:60:01.2: new USB bus registered, assigned bus number 1
>>>>>> scsi host4: sata_sil24
>>>>>> ata1: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80080000 ir6
>>>>>> ata2: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80082000 ir6
>>>>>> ata3: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80084000 ir6
>>>>>> ata4: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80086000 ir6
>>>>>> e1000 0000:60:03.0 eth0: (PCI:33MHz:32-bit) 00:11:0a:31:8a:77
>>>>>> ehci-pci 0000:60:01.2: irq 71, io mem 0xffffffffb00a1000
>>>>>> scsi host5: pata_sil680
>>>>>> ata5: PATA max UDMA/133 cmd 0x26058 ctl 0x26064 bmdma 0x26040 irq 72
>>>>>> ata6: PATA max UDMA/133 cmd 0x26050 ctl 0x26060 bmdma 0x26048 irq 72
>>>>>> e1000 0000:60:03.0 eth0: Intel(R) PRO/1000 Network Connection
>>>>>> ehci-pci 0000:60:01.2: USB 2.0 started, EHCI 0.95
>>>>>> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
>>>>>> usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
>>>>>> usb usb1: Product: EHCI Host Controller
>>>>>> usb usb1: Manufacturer: Linux 6.5.2-dirty ehci_hcd
>>>>>> usb usb1: SerialNumber: 0000:60:01.2
>>>>>> hub 1-0:1.0: USB hub found
>>>>>> hub 1-0:1.0: 5 ports detected
>>>>>> ata1: SATA link down (SStatus 0 SControl 0)
>>>>>> ata2: SATA link down (SStatus 0 SControl 0)
>>>>>> ata3: SATA link down (SStatus 0 SControl 0)
>>>>>> ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>>>> ata4.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>>>>> ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>>>>> ata4.00: configured for UDMA/100
>>>>>> scsi 4:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>>>>>> ata6.00: ATAPI: HL-DT-STDVD+-RW GSA-H21L, 1.04, max UDMA/44
>>>>>> scsi 5:0:0:0: CD-ROM            HL-DT-ST DVD+-RW GSA-H21L 1.04 PQ: 0 ANSI: 5
>>>>>> random: crng init done
>>>>>> Timed out for waiting the udev queue being empty.
>>>>>> Begin: Loading essential drivers ... done.
>>>>>> Begin: Running /scripts/init-premount ... done.
>>>>>> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
>>>>>> Begin: Running /scripts/local-premount ... done.
>>>>>> Timed out for waiting the udev queue being empty.
>>>>>> Begin: Waiting for root file system ... Begin: Running /scripts/local-block ....
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> done.
>>>>>> Gave up waiting for root file system device.  Common problems:
>>>>>>  - Boot args (cat /proc/cmdline)
>>>>>>    - Check rootdelay= (did the system wait long enough?)
>>>>>>  - Missing modules (cat /proc/modules; ls /dev)
>>>>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>>>>> Rebooting automatically due to panic= boot argument
>>>>>>
>>>>>> I'll see if I can find the commit that breaks 6.5.
>>>>> I've traced this to the following merge commit:
>>>>>
>>>>> dave@atlas:~/linux/linux$ git bisect good
>>>>> ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7 is the first bad commit
>>>>> commit ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7
>>>>> Merge: 1546cd4bfda4 af92c02fb209
>>>>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>>>> Date:   Fri Jun 30 11:57:07 2023 -0700
>>>>>
>>>>>     Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>>>
>>>>>     Pull SCSI updates from James Bottomley:
>>>>>      "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
>>>>>       lpfc, qla2xxx).
>>>>>
>>>>>       We have a couple of major core changes impacting other systems:
>>>>>
>>>>>        - Command Duration Limits, which spills into block and ATA
>>>>>
>>>>>        - block level Persistent Reservation Operations, which touches block,
>>>>>          nvme, target and dm
>>>>>
>>>>>       Both of these are added with merge commits containing a cover letter
>>>>>       explaining what's going on"
>>>>>
>>>>>     * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
>>>>>       scsi: core: Improve warning message in scsi_device_block()
>>>>>       scsi: core: Replace scsi_target_block() with scsi_block_targets()
>>>>>       scsi: core: Don't wait for quiesce in scsi_device_block()
>>>>>       scsi: core: Don't wait for quiesce in scsi_stop_queue()
>>>>>       scsi: core: Merge scsi_internal_device_block() and device_block()
>>>>>       scsi: sg: Increase number of devices
>>>>>       scsi: bsg: Increase number of devices
>>>>>       scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
>>>>>       scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
>>>>>       scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
>>>>>       scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
>>>>>       scsi: ufs: ufs-qcom: Switch to the new ICE API
>>>>>       scsi: ufs: dt-bindings: qcom: Add ICE phandle
>>>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
>>>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
>>>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
>>>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
>>>>>       scsi: ufs: core: Remove dedicated hwq for dev command
>>>>>       scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
>>>>>       scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
>>>>>       ...
>>>>>
>>>>> dave@atlas:~/linux/linux$ lspci
>>>>> 00:01.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
>>>>> 40:01.0 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>>>> 40:01.1 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>>>> 60:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>>>> 60:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>>>> 60:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 02)
>>>>> 60:02.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
>>>>> 60:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
>>>> This was introduced by the following commit:
>>>>
>>>> dave@atlas:~/linux/linux$ git bisect good
>>>> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
>>>> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
>>>> Author: Damien Le Moal <dlemoal@xxxxxxxxxx>
>>>> Date:   Thu May 11 03:13:41 2023 +0200
>>>>
>>>>     scsi: core: Detect support for command duration limits
>>>>
>>>>     Introduce the function scsi_cdl_check() to detect if a device supports
>>>>     command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
>>>>     and WRITE 32 commands are checked using the function scsi_report_opcode()
>>>>     to probe the rwcdlp and cdlp bits as they indicate the mode page defining
>>>>     the command duration limits descriptors that apply to the command being
>>>>     tested.
>>>>
>>>>     If any of these commands support CDL, the field cdl_supported of struct
>>>>     scsi_device is set to 1 to indicate that the device supports CDL.
>>>>
>>>>     Support for CDL for a device is advertizes through sysfs using the new
>>>>     cdl_supported device attribute. This attribute value is 1 for a device
>>>>     supporting CDL and 0 otherwise.
>>>>
>>>>     Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
>>>>     Reviewed-by: Hannes Reinecke <hare@xxxxxxx>
>>>>     Co-developed-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>>>>     Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>>>>     Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@xxxxxxxxxxx
>>>>     Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
>>>>
>>>>  Documentation/ABI/testing/sysfs-block-device |  9 ++++
>>>>  drivers/scsi/scsi.c                          | 81 ++++++++++++++++++++++++++++
>>>>  drivers/scsi/scsi_scan.c                     |  3 ++
>>>>  drivers/scsi/scsi_sysfs.c                    |  2 +
>>>>  include/scsi/scsi_device.h                   |  3 ++
>>>>  5 files changed, 98 insertions(+)
>>>>
>>>> When booting a bad commit, I sometimes see:
>>>> [...]
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> done.
>>>> Gave up waiting for root file system device.  Common problems:
>>>>  - Boot args (cat /proc/cmdline)
>>>>    - Check rootdelay= (did the system wait long enough?)
>>>>  - Missing modules (cat /proc/modules; ls /dev)
>>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>>> Rebooting automatically due to panic= boot argument
>>>> ata4: SATA link down (SStatus 0 SControl 0)
>>>> ata5: SATA link down (SStatus 0 SControl 0)
>>>> ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>> ata6.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>>> ata6.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>>> ata6.00: configured for UDMA/100
>>>> scsi 5:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>>> System boots master at e56b2b605799 if I disable CDL:
>>>
>>> dave@atlas:~/linux/linux$ git diff drivers/scsi/scsi.c
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index d0911bc28663..dc3a283ebd75 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -578,6 +578,8 @@ static bool scsi_cdl_check_cmd(struct scsi_device *sdev, u8 opcode, u16 sa,
>>>         int ret;
>>>         u8 cdlp;
>>>
>>> +       return false;
>>> +
>>>         /* Check operation code */
>>>         ret = scsi_report_opcode(sdev, buf, SCSI_CDL_CHECK_BUF_LEN, opcode, sa);
>>>         if (ret <= 0)
>> It is weird that this solves anything... the MAINTENANCE_IN command issued by
>> scsi_report_opcode() ends up being emulated in libata with
>> ata_scsiop_maint_in(). There are no actual commands issued to the drive, so
>> nothing that could actually fail/cause issues. By the time this is issued, the
>> ATA drive is also fully probed...
>>
>> Or is the drive connected to the Broadcom HBA you have? In that case, libata is
>> not used and the HBA FW SAT (SCSI-ATA translation) is likely to blame.
> /boot, / and swap partitions reside on a ST373207LW drive connected to a Broadcom HBA.  A
> ST4000VN008-2DR1 drive is connected to the Silicon Image, Inc. SiI 3124 PCI-X Serial
> ATA Controller.  It mounts on /home.  There's also a cdrom connected to the Silicon
> Image, Inc. PCI0680 Ultra ATA-133 Host Controller and another ST4000VN008-2DR1 drive
> connected to a Broadcom HBA.  There are two Broadcom HBAs.
>
> I think the issue is with the root ST373207LW drive.  The console output indicates that the
> ROOT drive doesn't exist when the boot fails.
>
> Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.
>>
>> Could you send a full dmesg output for a clean boot and for a failed one so that
>> I can compare?
> I'll try to get this together tomorrow.

Please also tell me the scsi_level reported for that drive (the output of
cat /sys/block/sdX/device/scsi_level and of sg_inq /dev/sdX).
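
If it is easier to grab this for all disks in one go, something along these
lines should work. This is only a rough sketch: the function name
report_scsi_levels and its optional sysfs-root argument are made up for
illustration, and it only reads the scsi_level attribute mentioned above.

```shell
#!/bin/sh
# Print "<disk> scsi_level=<value>" for every sd* block device.
# The optional first argument is a hypothetical override of the sysfs
# mount point (defaults to /sys); it is not something the kernel defines.
report_scsi_levels() {
    root="${1:-/sys}"
    for dev in "$root"/block/sd*; do
        attr="$dev/device/scsi_level"
        # Skip entries without a readable scsi_level attribute. This also
        # covers the case where the glob matched nothing and stayed literal.
        [ -r "$attr" ] || continue
        printf '%s scsi_level=%s\n' "${dev##*/}" "$(cat "$attr")"
    done
}
```

Calling report_scsi_levels with no argument on the booted system lists each
sdX with its level. The sg_inq /dev/sdX output for the root drive still needs
to be captured separately (sg_inq comes from sg3_utils).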

Thanks!

>
> Dave
>

--
Damien Le Moal
Western Digital Research