Re: autofs: add autofs_parse_fd()

From: Ian Kent
Date: Mon Oct 23 2023 - 18:24:36 EST


On 23/10/23 21:57, Anders Roxell wrote:
On Mon, 23 Oct 2023 at 09:35, Ian Kent <raven@xxxxxxxxxx> wrote:
On 23/10/23 08:48, Ian Kent wrote:
On 20/10/23 21:09, Ian Kent wrote:
On 20/10/23 19:23, Arnd Bergmann wrote:
On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <arnd@xxxxxxxx> wrote:
On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit
rootfs we call
it as compat mode boot testing. Recently it started to failed to
get login
prompt.

We have not seen any kernel crash logs.

Anders, bisection is pointing to first bad commit,
546694b8f658 autofs: add autofs_parse_fd()

Reported-by: Linux Kernel Functional Testing <lkft@xxxxxxxxxx>
Reported-by: Anders Roxell <anders.roxell@xxxxxxxxxx>
I tried to find something in that commit that would be different
in compat mode, but don't see anything at all -- this appears
to be just a simple refactoring of the code, unlike the commits
that immediately follow it and that do change the mount
interface.

Unfortunately this makes it impossible to just revert the commit
on top of linux-next. Can you double-check your bisection by
testing 546694b8f658 and the commit before it again?
I tried these two patches again:
546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots

One difference that I notice between those two patches is that we no
long call autofs_prepare_pipe(). We just call autofs_check_pipe().
Indeed, so some of the f_flags end up being different. I assumed
this was done intentionally, but it might be worth checking if
the patch below makes any difference when the flags get put
back the way they were. This is probably not the correct fix, but
may help figure out what is going on. It should apply to anything
from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
linux-next:

--- a/fs/autofs/inode.c
+++ b/fs/autofs/inode.c
@@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block
*s, struct fs_context *fc)
pr_debug("pipe fd = %d, pgrp = %u\n",
sbi->pipefd, pid_nr(sbi->oz_pgrp));
+ /* We want a packet pipe */
+ sbi->pipe->f_flags |= O_DIRECT;
+ /* We don't expect -EAGAIN */
+ sbi->pipe->f_flags &= ~O_NONBLOCK;
+

That makes sense, we do want a packet pipe and that does also mean

we don't want a non-blocking pipe, it will be interesting to see

if that makes a difference. It's been a long time since Linus

implemented that packet pipe and I can't remember now what the

case was that lead to it.
After thinking about this over the weekend I'm pretty sure my mistake

is dropping the call to autofs_prepare_pipe() without adding the tail

end of it into autofs_parse_fd().


To explain a bit of history which I'll include in the fix description.

During autofs v5 development I decided to stay with the existing usage

instead of changing to a packed structure for autofs <=> user space

communications which turned out to be a mistake on my part.


Problems arose and they were fixed by allowing for the 64 bit to 32 bit

size difference in the automount(8) code.


Along the way systemd started to use autofs and eventually encountered

this problem too. systemd refused to compensate for the length difference

insisting it be fixed in the kernel. Fortunately Linus implemented the

packetized pipe which resolved the problem in a straight forward and

simple way.


So I pretty sure that the cause of the problem is the inadvertent
dropping

of the flags setting in autofs_fill_super() that Arnd spotted although I

don't think putting it in autofs_fill_super() is the right thing to do.


I'll produce a patch today which includes most of this explanation for

future travelers ...
So I have a patch.


I'm of two minds whether to try and use the instructions to reproduce this

or not because of experiences I have had with other similar testing
automation

systems that claim to provide a reproducer and end up a huge waste of
time and

are significantly frustrating.


Can someone please perform a test for me once I provide the patch?
Just tested it, and it passed our tests. Added a tested by flag in your patch.

Thanks for the prompt fix.

That's great to hear Anders, thanks for doing the testing, ;)


Ian