Re: read()/readv() only from page cache

From: Milosz Tanski
Date: Fri Sep 05 2014 - 12:45:59 EST


On Fri, Sep 5, 2014 at 12:32 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Fri, Sep 05, 2014 at 12:27:21PM -0400, Milosz Tanski wrote:
>> I would prefer an interface more like recv(), where I can specify a
>> flag for whether I want blocking behavior for this read or not. Let
>> me explain why:
>>
>> In a VLDB-like workload this would enable me to lower the latency of
>> common fast requests. By fast requests I mean ones that do not
>> require much data, where the data is cached, or where there's a
>> predictable read pattern (read-ahead). Obviously it would be at the
>> expense of the latency of large/slow requests (they have to make two
>> read calls, the first one always returning EWOULDBLOCK)... but in
>> that case it doesn't matter, since the time to do actual IO would
>> trump any kind of extra latency.
>
> This is another good suggestion. I've actually heard people asking
> for allowing per-I/O flags for other use cases. The one I can
> remember is applying O_DSYNC only for FUA writes on a SCSI target;
> the other one would be Samba again, as SMB allows per-I/O flags on
> the wire as well.
>
>> Essentially, it's using the kernel facilities (the page cache) to
>> help me perform better (in a more predictable fashion). I would
>> implement this in our application tomorrow. It's frustrating that
>> there is a similar interface (the recv* family) that I cannot use.
>>
>> I know there have been a bunch of attempts at buffered AIO and none
>> of them made it into the kernel. This flag would let me build a
>> buffered AIO implementation in user-space using a threadpool, and
>> cached data would not end up getting blocked behind other non-cached
>> requests sitting in the queue. I know there are other sources of
>> blocking (locking, metadata lookups) but direct AIO already suffers
>> from these, so I'm fine to paper over that for now.
>
> Although I still think providing useful AIO at the kernel level would
> be better than having everyone reimplement it, it still would be
> useful to allow people to sanely reimplement it, if only to avoid
> the discussion about what API to use between the non-standard and
> not-really-that-nice Linux io_submit and the utterly horrible POSIX
> aio_* semantics.

Yeah, I would love for that to happen, but I've been lurking on and
following the non-blocking buffered AIO discussions and attempts on
lkml since about 2008, and the threads go back much further than
that, about 12 years. I would take a much less ambitious read/pread
syscall that gets me 90% of the way there, and I can build the
remainder in user-space. It also has the nice side-effect of
providing a not-horrible fallback for older/non-Linux systems, where
all IO goes into the thread pool (without the option to skip it).
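
To make the shape of the interface concrete: recv() already has
exactly this per-call knob in MSG_DONTWAIT. Here's a minimal sketch
of what the file-side equivalent could look like; the pread2() name
and the RWF_NONBLOCK flag are made up for illustration, nothing like
them exists today:

ssize_t n;

/* Existing precedent on sockets: a per-call non-blocking flag. */
n = recv(sock, buf, len, MSG_DONTWAIT);

/*
 * Hypothetical file-side equivalent: a pread() variant that takes
 * per-call flags and fails with EWOULDBLOCK instead of sleeping
 * when the data is not already in the page cache.
 */
n = pread2(fd, buf, len, off, RWF_NONBLOCK);
if (n < 0 && errno == EWOULDBLOCK) {
        /* slow path: satisfying this read needs actual disk IO */
}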

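And roughly how I'd build the user-space buffered AIO on top of it.
Again a sketch under the same assumptions: struct read_req and
submit_to_pool() stand in for whatever thread-pool plumbing the
application already has. On kernels without the flag the probe
compiles out and every request takes the pool path, which is the
not-horrible fallback I mentioned:

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

struct read_req {
        int     fd;
        void    *buf;
        size_t  len;
        off_t   off;
        void    (*done)(struct read_req *req, ssize_t result);
};

/* Illustrative stand-in: queue a request to a worker-thread pool. */
void submit_to_pool(struct read_req *req);

/*
 * Try the (hypothetical) non-blocking read first.  Cached requests
 * complete inline; only reads that would block on disk IO pay for a
 * trip through the thread pool.
 */
void buffered_aio_read(struct read_req *req)
{
#ifdef RWF_NONBLOCK     /* only on kernels with the new flag */
        ssize_t n = pread2(req->fd, req->buf, req->len, req->off,
                           RWF_NONBLOCK);
        if (n >= 0 || errno != EWOULDBLOCK) {
                req->done(req, n < 0 ? -errno : n);  /* fast path */
                return;
        }
#endif
        submit_to_pool(req);    /* slow path: block in a worker */
}

/* Worker body: a plain blocking pread() that may sleep on IO. */
void pool_worker(struct read_req *req)
{
        ssize_t n = pread(req->fd, req->buf, req->len, req->off);
        req->done(req, n < 0 ? -errno : n);
}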
--
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@xxxxxxxxx