Re: [bug report] dead loop in generic_perform_write() //Re: [PATCH v7 07/12] iov_iter: Convert iterate*() to inline funcs

From: Linus Torvalds
Date: Wed Feb 28 2024 - 16:21:44 EST


On Sat, 17 Feb 2024 at 19:13, Tong Tiangen <tongtiangen@xxxxxxxxxx> wrote:
>
> After this patch:
>   copy_page_from_iter_atomic()
>     -> iterate_and_advance2()
>     -> iterate_bvec()
>     -> remain = step()
>
> With CONFIG_ARCH_HAS_COPY_MC, step() is copy_mc_to_kernel(), which
> returns "bytes not copied".
>
> When a memory error occurs during step(), the value of "left" equals
> the value of "part" (not a single byte is copied successfully). In this case,
> iterate_bvec() returns 0, and copy_page_from_iter_atomic() also returns
> 0. The callback shmem_write_end()[2] also returns 0. Finally,
> generic_perform_write() goes to "goto again"[3], and the loop restarts.
> [4][5] cannot enter and exit the loop, then deadloop occurs.

Hmm. If the copy doesn't succeed and make any progress at all, then
the code in generic_perform_write() after the "goto again"

                //[4]
                if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {
                        status = -EFAULT;
                        break;
                }

should break out of the loop.

So either your analysis looks a bit flawed, or I'm missing something.
Likely I'm missing something really obvious.

Why does the copy_mc_to_kernel() fail, but the
fault_in_iov_iter_readable() succeeds?

Linus