Re: [PATCH] virtiofs: limit the length of ITER_KVEC dio by max_nopage_rw

From: Michael S. Tsirkin
Date: Sun Feb 25 2024 - 03:46:35 EST


On Fri, Feb 23, 2024 at 10:42:37AM +0100, Miklos Szeredi wrote:
> On Wed, 3 Jan 2024 at 11:58, Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >
> > From: Hou Tao <houtao1@xxxxxxxxxx>
> >
> > When trying to insert a 10MB kernel module kept in a virtiofs with cache
> > disabled, the following warning was reported:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ......
> > Modules linked in:
> > CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ......
> > RIP: 0010:__alloc_pages+0x2c4/0x360
> > ......
> > Call Trace:
> > <TASK>
> > ? __warn+0x8f/0x150
> > ? __alloc_pages+0x2c4/0x360
> > __kmalloc_large_node+0x86/0x160
> > __kmalloc+0xcd/0x140
> > virtio_fs_enqueue_req+0x240/0x6d0
> > virtio_fs_wake_pending_and_unlock+0x7f/0x190
> > queue_request_and_unlock+0x58/0x70
> > fuse_simple_request+0x18b/0x2e0
> > fuse_direct_io+0x58a/0x850
> > fuse_file_read_iter+0xdb/0x130
> > __kernel_read+0xf3/0x260
> > kernel_read+0x45/0x60
> > kernel_read_file+0x1ad/0x2b0
> > init_module_from_file+0x6a/0xe0
> > idempotent_init_module+0x179/0x230
> > __x64_sys_finit_module+0x5d/0xb0
> > do_syscall_64+0x36/0xb0
> > entry_SYSCALL_64_after_hwframe+0x6e/0x76
> > ......
> > </TASK>
> > ---[ end trace 0000000000000000 ]---
> >
> > The warning happened as follows. In copy_args_to_argbuf(), virtiofs uses
> > kmalloc-ed memory as a bounce buffer for fuse args, but
>
> So this seems to be the special case in fuse_get_user_pages() when the
> read/write requests get a piece of kernel memory.
>
> I don't really understand the comment in virtio_fs_enqueue_req():
>
>   /* Use a bounce buffer since stack args cannot be mapped */
>
> Stefan, can you explain? What's special about the arg being on the stack?

virtio core wants DMA'able addresses.

See Documentation/core-api/dma-api-howto.rst :

..


This rule also means that you may use neither kernel image addresses
(items in data/text/bss segments), nor module image addresses, nor
stack addresses for DMA.
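
To make the bounce buffer idea concrete, here is a rough userspace sketch in
C. It is not the virtiofs code: struct arg and copy_args_to_bounce() are
hypothetical names, and malloc() stands in for kmalloc(). It only shows the
pattern of copying arguments that may live on the stack into one
heap-allocated buffer, so that only the heap address is handed on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one request argument: pointer plus length. */
struct arg {
	const void *value;
	size_t size;
};

/*
 * Copy all arguments back to back into a single heap buffer (the
 * "bounce buffer"). The sources may be stack variables; the returned
 * buffer never is. Returns NULL on allocation failure. The caller
 * frees the buffer with free().
 */
void *copy_args_to_bounce(const struct arg *args, size_t nargs,
			  size_t *total_out)
{
	size_t total = 0, off = 0, i;
	unsigned char *buf;

	for (i = 0; i < nargs; i++)
		total += args[i].size;

	/* One allocation sized to the whole request (at least 1 byte). */
	buf = malloc(total ? total : 1);
	if (!buf)
		return NULL;

	for (i = 0; i < nargs; i++) {
		memcpy(buf + off, args[i].value, args[i].size);
		off += args[i].size;
	}

	*total_out = total;
	return buf;
}
```

Note that the single allocation grows with the total argument size. In the
sketch that is harmless, but it is the same shape as the kmalloc call under
virtio_fs_enqueue_req() in the trace quoted above.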



> What if the arg is not on the stack (as is probably the case for big
> args like this)? Do we need the bounce buffer in that case?
>
> Thanks,
> Miklos