Intermittent inability to type in graphical Plymouth on UEFI VMs since kernel 6.9
From: Adam Williamson
Date: Fri May 24 2024 - 12:08:40 EST
Hi, folks. Please CC me on replies, I'm not subscribed to the list. The
downstream bug report for this is
https://bugzilla.redhat.com/show_bug.cgi?id=2274770 .
I maintain Fedora's openQA instance - https://openqa.fedoraproject.org/
(openQA is an automated testing system which runs jobs on qemu VMs,
inputting keyboard and mouse events via VNC, and monitoring results via
screenshots and the serial console).
We have several tests that involve doing an install of Fedora with root
storage encrypted and then booting it. Some of these install enough
packages for us to hit the 'graphical' mode of plymouth (the bootsplash
manager thingy), so we see a graphical passphrase prompt like
https://openqa.fedoraproject.org/tests/2642868#step/_graphical_wait_login/3
; some are minimal installs, so we see a text prompt like
https://openqa.fedoraproject.org/tests/2642845#step/disk_guided_encrypted_postinstall/1
.
Recently I switched up our configuration so most of these tests run on
UEFI VMs (previously they mostly ran on BIOS VMs). When I did that, the
tests that hit the graphical prompt started failing frequently on Fedora
Rawhide. The tests that hit the text prompt do not seem to be affected.
At first I figured this was caused by a plymouth change, but some
testing indicates it's actually related to kernel version: it seems to
have been introduced in kernel 6.9. Fedora 40 uses kernel 6.8, so tests
on F40 are not usually affected by this, but I engineered some runs of
an affected test on an F40 install with kernel 6.9, and they hit the bug.
So to summarize, we hit the bug when all the following conditions are met:
* Running on UEFI qemu-kvm VM
* Graphical passphrase prompt encountered on boot
* Running kernel 6.9
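For anyone trying to confirm a guest matches the first and third conditions, here is a minimal sketch in Python. It assumes that a UEFI boot can be detected by the presence of /sys/firmware/efi, and it reads "kernel 6.9" as "6.9 or later", which is an interpretation rather than something stated above. The helper names are hypothetical, and the sketch does not try to detect whether plymouth shows the graphical prompt.

```python
import os
import platform
import re


def kernel_at_least(release, major, minor):
    """Return True if a uname-style release string is >= major.minor.

    Only the leading "X.Y" of the string is compared, so suffixes such
    as "-0.rc0...fc41.x86_64" are ignored.
    """
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        return False
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


def booted_via_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the EFI sysfs directory exists (assumed UEFI boot)."""
    return os.path.isdir(efi_dir)


if __name__ == "__main__":
    release = platform.release()
    print("UEFI boot:", booted_via_uefi())
    print("kernel", release, ">= 6.9:", kernel_at_least(release, 6, 9))
```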
When it sees the passphrase prompt, the test system types the correct
password. When the bug happens, this input seems to simply be ignored -
plymouth does not echo dots back to the screen representing the typed
characters, and on hitting enter the system does not attempt to proceed
with decryption. (Unfortunately this also means we don't get any logs
from the failure, as the test system needs a booted system to be able to
upload any logs).
Looking at results from the last month and a half, the bug happens in
about 30% of test runs.
I have reproduced this manually in a similar VM, but have not yet managed
to reproduce it on hardware (which is unfortunate, as it would make it
somewhat easier to attempt some kind of bisect).
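Because the failure is intermittent, a bisect would need several boots per candidate kernel before calling it "good". The sketch below estimates how many, assuming each boot fails independently with the roughly 30% rate quoted above. Independence is an assumption, not something the test results establish, and the function name is hypothetical.

```python
import math


def boots_needed(failure_rate, max_false_good):
    """Smallest n with (1 - failure_rate) ** n <= max_false_good.

    failure_rate:   chance that one boot of a bad kernel shows the bug.
    max_false_good: acceptable chance of n clean boots on a bad kernel.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    print(boots_needed(0.3, 0.05))  # prints 9
    print(boots_needed(0.3, 0.01))  # prints 13
```

Under those assumptions, 13 clean boots in a row leave under a 1% chance that a bad kernel is mislabelled as good.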
The earliest build I can say for sure the bug happened with is
kernel-6.9.0-0.rc0.20240322git8e938e398669.14.fc41 .
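As a starting point for a bisect, the build name above can be split into its parts. This is a minimal sketch that assumes the name follows the usual Fedora snapshot pattern, where the digits before "git" are the snapshot date and the hex string after it is an abbreviated upstream commit. The regular expression and function name are illustrative, not an official parser.

```python
import re

# Assumed layout: kernel-<version>-0.rc<N>.<YYYYMMDD>git<commit>.<build>.<dist>
SNAPSHOT_RE = re.compile(
    r"^kernel-(?P<version>\d+\.\d+\.\d+)"
    r"-0\.rc(?P<rc>\d+)"
    r"\.(?P<date>\d{8})git(?P<commit>[0-9a-f]+)"
    r"\.(?P<build>\d+)\.(?P<dist>fc\d+)$"
)


def parse_snapshot_build(name):
    """Return the named parts of a snapshot build name, or None if no match."""
    m = SNAPSHOT_RE.match(name.strip())
    return m.groupdict() if m else None


if __name__ == "__main__":
    info = parse_snapshot_build(
        "kernel-6.9.0-0.rc0.20240322git8e938e398669.14.fc41"
    )
    print(info["date"], info["commit"])  # prints 20240322 8e938e398669
```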
--
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @adamw@xxxxxxxxxxxxx
https://www.happyassassin.net