Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.
Shreeya Patel, Masahiro Yamada: what's the status of this? Was any
progress made to address this? Or is this maybe (accidentally?) fixed
with 6.5-rc1?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
#regzbot poke
On 20.06.23 06:19, Masahiro Yamada wrote:
On Mon, Jun 12, 2023 at 7:10 PM Shreeya Patel
<shreeya.patel@xxxxxxxxxxxxx> wrote:
On 24/05/23 02:57, Nick Desaulniers wrote:
On Tue, May 23, 2023 at 3:27 AM Shreeya Patel
<shreeya.patel@xxxxxxxxxxxxx> wrote:
Hi Nick and Masahiro,
On 23/05/23 01:22, Nick Desaulniers wrote:
On Mon, May 22, 2023 at 9:52 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
On Mon, May 22, 2023 at 12:09:34PM +0200, Ricardo Cañuelo wrote:
On Fri, May 19 2023 at 08:57:24, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
It could be; if the link order was changed, it's possible that this
target may be hitting something along the lines of:
https://isocpp.org/wiki/faq/ctors#static-init-order i.e. the "static
initialization order fiasco"
I'm struggling to think of how this appears in C codebases, but I
swear years ago I had a discussion with GKH (maybe?) about this. I
think I was playing with converting Kbuild to use Ninja rather than
Make; the resulting kernel image wouldn't boot because I had modified
the order the object files were linked in. If you were to randomly
shuffle the object files in the kernel, I recall some hazard that may
prevent boot.
I thought that was specifically a C++ problem? But then again, the
kernel docs explicitly say that the ordering of obj-y goals in kbuild is
significant in some instances [1]:
Yes, it matters, you cannot change it. If you do, systems will break.
It is the only way we have of properly ordering our init calls within
the same "level".
Ah, right, it was the initcall ordering. Thanks for the reminder.
(There's a joke in there similar to the use of regexes to solve a
problem resulting in two new problems; initcalls have levels for
ordering, but we still have (unexpressed) dependencies between calls
of the same level; brittle!).
+Maksim, since that might be relevant info for the BOLT+Kernel work.
Ricardo,
https://elinux.org/images/e/e8/2020_ELCE_initcalls_myjosserand.pdf
mentions that there's a kernel command line param `initcall_debug`.
Perhaps that can be used to see if
5750121ae7382ebac8d47ce6d68012d6cd1d7926 somehow changed initcall
ordering, resulting in a config that cannot boot?
Here are the links to LAVA jobs run with initcall_debug added to the
kernel command line.
1. Where regression happens (5750121ae7382ebac8d47ce6d68012d6cd1d7926)
https://lava.collabora.dev/scheduler/job/10417706
2. With a revert of the commit 5750121ae7382ebac8d47ce6d68012d6cd1d7926
https://lava.collabora.dev/scheduler/job/10418012
Thanks!
Yeah, I can see a diff in the initcall ordering as a result of
commit 5750121ae738 ("kbuild: list sub-directories in ./Kbuild")
https://gist.github.com/nickdesaulniers/c09db256e42ad06b90842a4bb85cc0f4
Not just different orderings, but some initcalls seem unique to the
before vs. after, which is troubling (for example, init_events and
init_fs_sysctls, respectively).
That isn't conclusive evidence that changes to initcall ordering are
to blame, but I suspect confirming that precisely would be very
time-consuming.
Masahiro, what are your thoughts on reverting 5750121ae738? There are
conflicts in Kbuild and Makefile when reverting 5750121ae738 on
mainline.
I'm not sure if you followed the conversation but we are still seeing
this regression with the latest kernel builds and would like to know if
you plan to revert 5750121ae738?
Reverting 5750121ae738 does not solve the issue
because the issue happens even before 5750121ae738.
multi_v7_defconfig + debug.config + CONFIG_MODULES=n
fails to boot in the same way.
The revert would hide the issue on a particular build setup.
I submitted a patch to pinpoint the issue more precisely.
Let's see how it goes.
https://lore.kernel.org/lkml/ZJEni98knMMkU%2Fcl@xxxxxxxxxxxxxxxxxx/T/#t
(BTW, the initcall order is unrelated)
--
Thanks,
Shreeya Patel
Thanks,
Shreeya Patel
Best Regards
Masahiro Yamada
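
For anyone who wants to repeat the initcall_debug comparison discussed
in the thread, here is a minimal Python sketch of one way to do it. It
assumes the boot logs contain the usual "calling  <symbol>+0x.../0x... @ <pid>"
lines that initcall_debug prints. The helper names (initcall_order,
compare_orders) are hypothetical and not part of any existing kernel
tooling. As noted in the thread, a difference in order is not by itself
evidence of the cause.

```python
import re

# Matches the "calling  <symbol>+0x<off>/0x<size> @ <pid>" lines printed
# when the kernel is booted with initcall_debug (assumed log format).
CALLING_RE = re.compile(r"\bcalling\s+([A-Za-z_][\w.]*)\+0x")


def initcall_order(log_text):
    """Return initcall symbol names in the order they were called."""
    names = []
    for line in log_text.splitlines():
        match = CALLING_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


def compare_orders(before, after):
    """Summarise how two initcall sequences differ.

    Reports the calls present on only one side and, among the calls
    common to both, the first position where the relative order differs.
    """
    before_set, after_set = set(before), set(after)
    common_before = [name for name in before if name in after_set]
    common_after = [name for name in after if name in before_set]
    first_divergence = None
    for a, b in zip(common_before, common_after):
        if a != b:
            first_divergence = (a, b)
            break
    return {
        "only_before": sorted(before_set - after_set),
        "only_after": sorted(after_set - before_set),
        "first_divergence": first_divergence,
    }
```

Typical use would be to save the serial console output of the two boots
to text files, pass each file's contents through initcall_order(), and
hand both lists to compare_orders().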