Re: [PATCH] autofs: don't fail mount for transient error

From: Ian Kent
Date: Fri Nov 03 2017 - 08:45:14 EST


On 03/11/17 09:40, NeilBrown wrote:
>

Hi Neil, and thanks for taking the time to post the patch.

> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole mountpoint as catatonic, and it will stop working.
>
> It is possible that the error is transient. This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.

Indeed it does.

And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity, the daemon can easily fall behind, so this
could happen more readily than we expect.

>
> So change the code to treat -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.

This looks good to me.
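
For anyone following along, the classification being proposed can
be sketched in user space roughly as below. This is only an
illustration, not the kernel code: the enum and the function name
classify_write_result() are made up for the example, and
ERESTARTSYS is defined by hand because it is kernel-internal and
not exported through the user-space <errno.h>.

```c
#include <errno.h>

/* ERESTARTSYS is kernel-internal; 512 is the value the kernel uses
 * in include/linux/errno.h. Defined here only for illustration. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical names, not part of autofs. */
enum notify_action {
	NOTIFY_OK,		/* packet reached the daemon */
	NOTIFY_FAIL_ONE,	/* transient: fail only this request */
	NOTIFY_CATATONIC,	/* fatal: give up on the whole mount */
};

/* Map a 0-or-negative-errno write result to the action to take. */
static enum notify_action classify_write_result(int ret)
{
	switch (ret) {
	case 0:
		return NOTIFY_OK;
	case -ENOMEM:
	case -ERESTARTSYS:
		return NOTIFY_FAIL_ONE;
	default:
		return NOTIFY_CATATONIC;
	}
}
```

The point is that only errors known to be transient get the softer
handling; anything unrecognised (-EPIPE, for instance) still takes
the mount catatonic as before.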

>
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
>
> Do people think this should go to -stable?
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.

Perhaps that's a good idea, given that the CPU usage problem I
refer to above has been around for a while now.

>
> Thanks,
> NeilBrown
>
>
> fs/autofs4/waitq.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>  		spin_unlock_irqrestore(&current->sighand->siglock, flags);
>  	}
>
> -	return (bytes > 0);
> +	/* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> +	return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	} pkt;
>  	struct file *pipe = NULL;
>  	size_t pktsz;
> +	int ret;
>
>  	pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>  		 (unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	mutex_unlock(&sbi->wq_mutex);
>
>  	if (autofs4_write(sbi, pipe, &pkt, pktsz))
> +	switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
> +	case 0:
> +		break;
> +	case -ENOMEM:
> +	case -ERESTARTSYS:
> +		/* Just fail this one */
> +		autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> +		break;
> +	default:
>  		autofs4_catatonic_mode(sbi);
> +		break;
> +	}
>  	fput(pipe);
>  }
>
>
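
To spell out the new return convention of autofs4_write() in the
patch above: the old code returned non-zero whenever bytes were
left unwritten, while the new expression returns 0 on success and
a negative errno otherwise. A stand-alone sketch of that mapping
follows. The helper name write_status() and its parameter names
are invented for the example; they correspond to 'bytes' and 'wr'
in the patch.

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the patched return expression:
 *   bytes == 0 ? 0 : wr < 0 ? wr : -EIO
 * bytes_left: bytes still unwritten when the write loop stopped.
 * last_wr:    result of the last write attempt.
 */
static int write_status(size_t bytes_left, ssize_t last_wr)
{
	if (bytes_left == 0)
		return 0;		/* everything was written */
	if (last_wr < 0)
		return (int)last_wr;	/* pass the real error up */
	return -EIO;			/* short write with no error code */
}
```

Passing the real error code up is what lets the caller tell
-ENOMEM and -ERESTARTSYS apart from everything else.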