Re: Support for I/O to a bitbucket

From: Dave Chinner
Date: Sun Sep 06 2020 - 20:56:57 EST


On Tue, Aug 18, 2020 at 06:22:31PM +0100, Matthew Wilcox wrote:
> One of the annoying things in the iomap code is how we handle
> block-misaligned I/Os. Consider a write to a file on a 4KiB block size
> filesystem (on a 4KiB page size kernel) which starts at byte offset 5000
> and is 4133 bytes long.
>
> Today, we allocate page 1 and read bytes 4096-8191 of the file into
> it, synchronously. Then we allocate page 2 and read bytes 8192-12287
> into it, again, synchronously. Then we copy the user's data into the
> pagecache and mark it dirty. This is a fairly significant delay: the
> user, who normally sees only the latency of a memcpy(), now has to wait
> for two non-overlapping reads to complete.
>
> What I'd love to be able to do is allocate pages 1 & 2, copy the user
> data into them and submit one read which targets:
>
> 0-903: page 1, offset 0, length 904
> 904-5036: bitbucket, length 4133
> 5037-8191: page 2, offset 941, length 3155
>
> That way, we don't even need to wait for the read to complete.

I'm not sure that offloading the page cache's job of isolating
unaligned IO from the block layer onto the block layer itself is the
right way to do this.

Essentially you are moving the read-modify-write (RMW) down into the
block layer, where it will have to allocate memory to do IO on
sector-based boundaries so it doesn't trash the data you've already
copied into the pages in the bio.

Either way, you need a secondary buffer to do this: one for the
read IO to DMA into with sector alignment, the other to contain the
user data, which is only byte aligned.

This seems to me like it could be done entirely at the iomap level
just by linking the async read IO buffer back to the page cache page
and holding the "data to copy in" state in a struct attached to the
async IO buffer's page->private. It adds a little complexity to the
read IO completion (i.e. iomap_read_finish()), but it's no worse
than anything we do with write IO completions...

And if the two pages are adjacent, as in the example above, it could
be done with a single async read, or even with two separate async
reads that get merged into one IO at the block layer via plugging...

> Anyway, I don't have time to take on this work, but I thought I'd throw
> it out in case anyone's looking for a project. Or if it's a stupid idea,
> someone can point out why.

I think it's pretty straightforward to do it in the iomap layer...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
