Re: [regression] Bug 217074 - upgrading to kernel 6.1.12 from 5.15.x can no longer assemble software raid0
From: Song Liu
Date: Fri Mar 03 2023 - 17:43:31 EST
+ Jes.
It appeared to me that we can assemble the array if we have any of the
following:
1. Enable CONFIG_BLOCK_LEGACY_AUTOLOAD;
2. Have a valid /etc/mdadm.conf;
3. Update mdadm to handle this case. (I tried some ugly hacks, which worked but
weren't clean).
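
A quick way to tell whether option 1 applies on a given machine is to look for the symbol in the kernel's build config. The sketch below only parses config text; the helper name `kconfig_enabled` is hypothetical, and where the config lives (for example /boot/config-<release>, or /proc/config.gz when the kernel exposes it) is an assumption that varies by distribution.

```python
import re


def kconfig_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y or m in kernel .config text.

    A disabled symbol normally appears as "# CONFIG_FOO is not set",
    which this pattern deliberately does not match.
    """
    pattern = re.compile(rf"^{re.escape(option)}=(y|m)\s*$", re.MULTILINE)
    return pattern.search(config_text) is not None


if __name__ == "__main__":
    import platform
    from pathlib import Path

    # Assumed location; many distributions install the config here.
    path = Path(f"/boot/config-{platform.release()}")
    if path.exists():
        text = path.read_text()
        print(kconfig_enabled(text, "CONFIG_BLOCK_LEGACY_AUTOLOAD"))
    else:
        print(f"{path} not found; locate the kernel config manually")
```

If this reports the symbol as disabled, opening a freshly created /dev/md<NR> node will not create the array, which matches the ENXIO seen in the strace output quoted below.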
Since we would eventually like to get rid of CONFIG_BLOCK_LEGACY_AUTOLOAD, I
think we need mdadm to handle this properly. But the logistics might be
complicated, as mdadm is shipped separately.
Jes, what do you think about this? AFAICT, we need to update the logic in
mdopen.c:create_mddev().
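
For illustration, one direction such an update could take is to ask the md module to create the device explicitly, instead of relying on the open of a freshly made node to autoload it. The kernel's md documentation describes a write-only `new_array` module parameter that accepts a name such as `md127` or `md_<name>`. The sketch below is an assumption about the approach, not mdadm's actual code; `request_md_device` is a hypothetical name, and the path is a parameter so the function can be exercised without root.

```python
NEW_ARRAY = "/sys/module/md_mod/parameters/new_array"


def request_md_device(devnm: str, new_array_path: str = NEW_ARRAY) -> bool:
    """Ask the md module to instantiate `devnm` (e.g. "md127").

    Returns True if the write succeeded, False if the kernel (or the
    filesystem) rejected it, e.g. md_mod not loaded or no permission.
    """
    if not devnm.startswith("md") or len(devnm) <= 2:
        raise ValueError(f"not an md device name: {devnm!r}")
    try:
        # The real parameter is write-only; a single write of the name
        # is all that is needed.
        with open(new_array_path, "w") as f:
            f.write(devnm)
    except OSError:
        return False
    return True
```

A caller would try this first and only fall back to the mknod-and-open path if it returns False, so that kernels without the parameter keep working.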
Thanks,
Song
On Thu, Feb 23, 2023 at 8:06 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
>
> Hi, this is your Linux kernel regression tracker.
>
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=217074 :
>
> > Hello,
> > Installing a new kernel 6.1.12 does not allow assembly of raid0 device.
> >
> > Going back to previous working kernels: 5.15.65, 5.15.75 assembles the raid0 without any problems.
> >
> > Kernel command line parameters:
> > ... ro kvm_amd.nested=0 kvm_amd.avic=1 kvm_amd.npt=1 raid0.default_layout=2
> >
> > mdadm assembly attempt fails with:
> > 'mdadm: unexpected failure opening /dev/md<NR>'
> >
> > Tried with mdadm-4.1 and mdadm-4.2, but as it works with either version of mdadm, I rule out the mdadm software.
> >
> > strace -f output, last few lines:
> >
> > mkdir("/run/mdadm", 0755) = -1 EEXIST (File exists)
> > openat(AT_FDCWD, "/run/mdadm/map.lock", O_RDWR|O_CREAT|O_TRUNC, 0600) = 3
> > fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
> > flock(3, LOCK_EX) = 0
> > newfstatat(3, "", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
> > openat(AT_FDCWD, "/run/mdadm/map", O_RDONLY) = 4
> > fcntl(4, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
> > newfstatat(4, "", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
> > read(4, "", 4096) = 0
> > close(4) = 0
> > openat(AT_FDCWD, "/run/mdadm/map", O_RDONLY) = 4
> > fcntl(4, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
> > newfstatat(4, "", {st_mode=S_IFREG|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
> > read(4, "", 4096) = 0
> > close(4) = 0
> > newfstatat(AT_FDCWD, "/dev/.udev", 0x7ffcd8243c90, 0) = -1 ENOENT (No such file or directory)
> > newfstatat(AT_FDCWD, "/run/udev", {st_mode=S_IFDIR|0755, st_size=160, ...}, 0) = 0
> > openat(AT_FDCWD, "/proc/mdstat", O_RDONLY) = 4
> > fcntl(4, F_SETFD, FD_CLOEXEC) = 0
> > newfstatat(4, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
> > read(4, "Personalities : [raid1] [raid0] "..., 1024) = 56
> > read(4, "", 1024) = 0
> > close(4) = 0
> > openat(AT_FDCWD, "/sys/block/md127/dev", O_RDONLY) = -1 ENOENT (No such file or directory)
> > getpid() = 18351
> > mknodat(AT_FDCWD, "/dev/.tmp.md.18351:9:127", S_IFBLK|0600, makedev(0x9, 0x7f)) = 0
> > openat(AT_FDCWD, "/dev/.tmp.md.18351:9:127", O_RDWR|O_EXCL|O_DIRECT) = -1 ENXIO (No such device or address)
> > unlink("/dev/.tmp.md.18351:9:127") = 0
> > getpid() = 18351
> > mknodat(AT_FDCWD, "/tmp/.tmp.md.18351:9:127", S_IFBLK|0600, makedev(0x9, 0x7f)) = 0
> > openat(AT_FDCWD, "/tmp/.tmp.md.18351:9:127", O_RDWR|O_EXCL|O_DIRECT) = -1 ENXIO (No such device or address)
> > unlink("/tmp/.tmp.md.18351:9:127") = 0
> > write(2, "mdadm: unexpected failure openin"..., 45mdadm: unexpected failure opening /dev/md127
> > ) = 45
> > unlink("/run/mdadm/map.lock") = 0
> > close(3) = 0
> > exit_group(1) = ?
> > +++ exited with 1 +++
> >
> >
> > Tried with kernel compiled with either CONFIG_DEVTMPFS_SAFE=y or CONFIG_DEVTMPFS_SAFE=n, fails the same way.
> >
> > The raid consists of 4 devices, here is mdstat contents:
> >
> > Personalities : [raid0]
> > md127 : active raid0 sda[0] sdc[2] sdd[3] sdb[1]
> > 2929769472 blocks super 1.2 512k chunks
> >
> > unused devices: <none>
> >
> >
> > Examining the 4 block devices:
> >
> > gnusystem /var/log # mdadm --misc -E /dev/sda
> > /dev/sda:
> > Magic : a92b4efc
> > Version : 1.2
> > Feature Map : 0x0
> > Array UUID : bb710ce6:edd5d68d:a0a0a405:edd99547
> > Name : gnusystem:md0-store (local to host gnusystem)
> > Creation Time : Wed Sep 29 22:28:09 2021
> > Raid Level : raid0
> > Raid Devices : 4
> >
> > Avail Dev Size : 976508976 sectors (465.64 GiB 499.97 GB)
> > Data Offset : 264192 sectors
> > Super Offset : 8 sectors
> > Unused Space : before=264112 sectors, after=0 sectors
> > State : clean
> > Device UUID : 7f226c1c:23632b9d:e3d6c656:74522906
> >
> > Update Time : Wed Sep 29 22:28:09 2021
> > Bad Block Log : 512 entries available at offset 8 sectors
> > Checksum : 51e99fb5 - correct
> > Events : 0
> >
> > Chunk Size : 512K
> >
> > Device Role : Active device 0
> > Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> > gnusystem /var/log # mdadm --misc -E /dev/sdb
> > /dev/sdb:
> > Magic : a92b4efc
> > Version : 1.2
> > Feature Map : 0x0
> > Array UUID : bb710ce6:edd5d68d:a0a0a405:edd99547
> > Name : gnusystem:md0-store (local to host gnusystem)
> > Creation Time : Wed Sep 29 22:28:09 2021
> > Raid Level : raid0
> > Raid Devices : 4
> >
> > Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
> > Data Offset : 264192 sectors
> > Super Offset : 8 sectors
> > Unused Space : before=264112 sectors, after=0 sectors
> > State : clean
> > Device UUID : ed8795fe:c7e6719a:165db37e:32ec0894
> >
> > Update Time : Wed Sep 29 22:28:09 2021
> > Bad Block Log : 512 entries available at offset 8 sectors
> > Checksum : 215db63b - correct
> > Events : 0
> >
> > Chunk Size : 512K
> >
> > Device Role : Active device 1
> > Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> > gnusystem /var/log # mdadm --misc -E /dev/sdc
> > /dev/sdc:
> > Magic : a92b4efc
> > Version : 1.2
> > Feature Map : 0x0
> > Array UUID : bb710ce6:edd5d68d:a0a0a405:edd99547
> > Name : gnusystem:md0-store (local to host gnusystem)
> > Creation Time : Wed Sep 29 22:28:09 2021
> > Raid Level : raid0
> > Raid Devices : 4
> >
> > Avail Dev Size : 976508976 sectors (465.64 GiB 499.97 GB)
> > Data Offset : 264192 sectors
> > Super Offset : 8 sectors
> > Unused Space : before=264112 sectors, after=0 sectors
> > State : clean
> > Device UUID : 3713dfff:d2e29aaf:3275039d:08b317bb
> >
> > Update Time : Wed Sep 29 22:28:09 2021
> > Bad Block Log : 512 entries available at offset 8 sectors
> > Checksum : 42f70f03 - correct
> > Events : 0
> >
> > Chunk Size : 512K
> >
> > Device Role : Active device 2
> > Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> > gnusystem /var/log # mdadm --misc -E /dev/sdd
> > /dev/sdd:
> > Magic : a92b4efc
> > Version : 1.2
> > Feature Map : 0x0
> > Array UUID : bb710ce6:edd5d68d:a0a0a405:edd99547
> > Name : gnusystem:md0-store (local to host gnusystem)
> > Creation Time : Wed Sep 29 22:28:09 2021
> > Raid Level : raid0
> > Raid Devices : 4
> >
> > Avail Dev Size : 1953260976 sectors (931.39 GiB 1000.07 GB)
> > Data Offset : 264192 sectors
> > Super Offset : 8 sectors
> > Unused Space : before=264112 sectors, after=0 sectors
> > State : clean
> > Device UUID : 7da858ae:c0d6ca51:0ecaaaf0:280367cc
> >
> > Update Time : Wed Sep 29 22:28:09 2021
> > Bad Block Log : 512 entries available at offset 8 sectors
> > Checksum : 32cf4ab4 - correct
> > Events : 0
> >
> > Chunk Size : 512K
> >
> > Device Role : Active device 3
> > Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
> >
> > If any more information is needed, let me know.
>
> See the ticket for details.
>
>
> [TLDR for the rest of this mail: I'm adding this report to the list of
> tracked Linux kernel regressions; the text you find below is based on a
> few template paragraphs you might have encountered already in similar
> form.]
>
> BTW, let me use this mail to also add the report to the list of tracked
> regressions to ensure it doesn't fall through the cracks:
>
> #regzbot introduced: v5.15..v6.1.12
> https://bugzilla.kernel.org/show_bug.cgi?id=217074
> #regzbot title: block: md: raid0 no longer assembled
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (e.g. the bugzilla ticket and maybe this mail as well, if
> this thread sees some discussion). See page linked in footer for details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.