Re: Read starvation by sync writes

From: Jeff Moyer
Date: Wed Dec 12 2012 - 14:41:57 EST


Jeff Moyer <jmoyer@xxxxxxxxxx> writes:

>> I agree. This isn't about scheduling, we haven't even reached that part
>> yet. Back when we split the queues into read vs write, this problem
>> obviously wasn't there. Now we have sync writes and reads, both eating
>> from the same pool. The io scheduler can impact this a bit by forcing
>> reads to must allocate (Jan, which io scheduler are you using?). CFQ
>> does this when it's expecting a request from this process queue.
>>
>> Back in the day, we used to have one list. To avoid a similar problem,
>> we reserved the top of the list for reads. With the batching, it's a bit
>> more complicated. If we make the request allocation (just that, not the
>> scheduling) be read vs write instead of sync vs async, then we have the
>> same issue for sync vs buffered writes.
>>
>> How about something like the below? Due to the nature of sync reads, we
>> should allow a much longer timeout. The batch is really tailored towards
>> writes at the moment. Also shrink the batch count, 32 is pretty large...
>
> Does batching even make sense for dependent reads? I don't think it
> does.

Having just read the batching code in detail, I'd like to amend this
misguided comment. Batching logic kicks in when you happen to be lucky
enough to use up the last request. As such, I'd be surprised if the
patch you posted helped. Jens, don't you think the writer is way more
likely to become the batcher? I do agree with shrinking the batch count
to 16, whether or not the rest of the patch goes in.
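
To make that concrete for anyone reading along without blk-core.c open,
here's a toy user-space sketch of the scheme as I understand it (the
names and constants below are mine, not the kernel's; the real logic is
in get_request()/ioc_batching()): the task that happens to take the last
free request out of the shared sync pool is marked as the batcher, and
only it gets to keep allocating for a short window while everybody else
goes to sleep.

/*
 * Toy model of request-list batching, not kernel code.  NR_REQUESTS is
 * the shared pool that both reads and sync writes allocate from.
 */
#include <stdbool.h>
#include <time.h>

#define NR_REQUESTS 128   /* analogous to q->nr_requests */
#define BATCH_REQ   16    /* batch size under discussion (was 32) */
#define BATCH_TIME  2     /* seconds; the magic timeout in question */

struct toy_ioc {
        int    nr_batch_requests;  /* batch budget left */
        time_t last_waited;        /* when this task became the batcher */
};

static int nr_allocated;           /* requests currently outstanding */

static bool ioc_batching(struct toy_ioc *ioc)
{
        return ioc->nr_batch_requests > 0 &&
               time(NULL) < ioc->last_waited + BATCH_TIME;
}

/* Called for the task that used up the last free request. */
static void ioc_set_batching(struct toy_ioc *ioc)
{
        ioc->nr_batch_requests = BATCH_REQ;
        ioc->last_waited = time(NULL);
}

/* Returns true if this task may allocate a request right now. */
static bool may_allocate(struct toy_ioc *ioc)
{
        if (nr_allocated < NR_REQUESTS) {
                if (++nr_allocated == NR_REQUESTS)
                        ioc_set_batching(ioc);  /* the lucky last allocator */
                return true;
        }
        /* Pool exhausted: only the current batcher gets through. */
        if (ioc_batching(ioc)) {
                ioc->nr_batch_requests--;
                nr_allocated++;
                return true;
        }
        return false;   /* everybody else waits */
}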

> Assuming you disagree, then you'll have to justify that fixed
> time value of 2 seconds. The amount of time between dependent reads
> will vary depending on other I/O sent to the device, the properties of
> the device, the I/O scheduler, and so on. If you do stick 2 seconds in
> there, please comment it. Maybe it's time we started keeping track of
> worst case Q->C time? That could be used to tell worst case latency,
> and adjust magic timeouts like this one.
>
> I'm still thinking about how we might solve this in a cleaner way.

The way things stand today, you can do a complete end run around the I/O
scheduler by queueing up enough I/O. To address that, I think we need
to move to a request list per io_context, as Jan had suggested. That
way, we can keep the logic about who gets to submit I/O, and when, in
one place.
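
Purely as a strawman (made-up names, this is nowhere near a patch), I'm
picturing something along these lines, so that each submitter gets
throttled against its own share of the queue instead of against one big
shared sync pool:

/*
 * Hypothetical per-io_context request accounting: the "who may
 * allocate next" decision lives with the submitter rather than in a
 * single per-queue sync/async pool.
 */
struct toy_request_list {
        int count;   /* requests this context currently has in flight */
        int limit;   /* this context's share of q->nr_requests */
};

struct toy_io_context {
        struct toy_request_list rl;   /* one request list per io_context */
        /* batching state, wait queue, etc. would also live here */
};

/*
 * The allocator would then throttle each context against its own
 * limit, so a heavy buffered writer can no longer drain the shared
 * pool and starve a reader's sync requests.
 */
static int ioc_may_allocate(struct toy_io_context *ioc)
{
        return ioc->rl.count < ioc->rl.limit;
}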

Jens, what do you think?

Jan, for now, try bumping nr_requests up really high. ;-)
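For reference, that's the /sys/block/<dev>/queue/nr_requests knob.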

Cheers,
Jeff