Re: [PATCH v1 2/6] fs: use on-stack-bio if backing device has BDI_CAP_SYNC capability

From: Jens Axboe
Date: Mon Aug 14 2017 - 11:14:13 EST


On 08/14/2017 09:06 AM, Minchan Kim wrote:
> On Mon, Aug 14, 2017 at 08:36:00AM -0600, Jens Axboe wrote:
>> On 08/14/2017 02:50 AM, Minchan Kim wrote:
>>> Hi Jens,
>>>
>>> On Fri, Aug 11, 2017 at 08:26:59AM -0600, Jens Axboe wrote:
>>>> On 08/11/2017 04:46 AM, Christoph Hellwig wrote:
>>>>> On Wed, Aug 09, 2017 at 08:06:24PM -0700, Dan Williams wrote:
>>>>>> I like it, but do you think we should switch to sbvec[<constant>] to
>>>>>> preclude pathological cases where nr_pages is large?
>>>>>
>>>>> Yes, please.
>>>>>
>>>>> Then I'd like to see that the on-stack bio even matters for
>>>>> mpage_readpage / mpage_writepage. Compared to all the buffer head
>>>>> overhead the bio allocation should not actually matter in practice.
>>>>
>>>> I'm skeptical for that path, too. I also wonder how far we could go
>>>> with just doing a per-cpu bio recycling facility, to reduce the cost
>>>> of having to allocate a bio. The on-stack bio parts are fine for
>>>> the simple use case, where simple means that the patch just special
>>>> cases the allocation, and doesn't have to change much else.
>>>>
>>>> I had a patch for bio recycling and batched freeing a year or two
>>>> ago, I'll see if I can find and resurrect it.
>>>
>>> So, you want to go with the per-cpu bio recycling approach to
>>> remove rw_page?
>>>
>>> So, do you want me to hold this patchset?
>>
>> I don't want to hold this series up, but I do think the recycling is
>> a cleaner approach since we don't need to special case anything. I
>> hope I'll get some time to dust it off, retest, and post soon.
>
> I don't know how your bio recycling works. But my worry when I first
> heard about per-cpu bio recycling is that, if it's not a reserved pool
> for BDI_CAP_SYNCHRONOUS (IOW, if it is shared by several storage
> devices), BIOs can be consumed by a slow device (e.g., eMMC) so that a
> bio for the fastest device in the system (e.g., zram in an embedded
> system) can be stuck waiting for a bio until IO for the slow device
> completes.
>
> I guess it would not be a rare case for a swap device under severe
> memory pressure, because lots of page cache is already reclaimed when
> anonymous pages start to be reclaimed, so many BIOs can be consumed
> by eMMC to fetch code while swap IO to fetch heap data gets stuck,
> although zram-swap is much faster than eMMC.
> Also, time spent waiting to get a BIO, even among the fastest devices,
> is simply wasted, I guess.

I don't think that's a valid concern. First of all, for the recycling,
it's not like you get to wait on someone else using a recycled bio;
if it's not there, you simply go to the regular bio allocator. There
is no waiting for a free bio. The idea is to have allocation be faster since
we can avoid going to the memory allocator for most cases, and speed
up freeing as well, since we can do that in batches too.
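
A minimal user-space sketch of that idea, not the actual patch: a
single-threaded free list stands in for one CPU's cache, and malloc()
stands in for the regular bio allocator. All names here (fake_bio,
bio_cache, cache_get, cache_put, cache_prune, CACHE_MAX) are made up
for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_MAX 8	/* hypothetical cap on cached objects */

struct fake_bio {
	struct fake_bio *next;	/* free-list link while cached */
	int payload;
};

struct bio_cache {
	struct fake_bio *free_list;
	unsigned int nr;	/* objects currently cached */
	unsigned long hits;	/* served from the cache */
	unsigned long misses;	/* fell back to malloc() */
};

/*
 * Take a recycled object if one is cached, otherwise fall back to the
 * regular allocator. There is no waiting in either case.
 */
static struct fake_bio *cache_get(struct bio_cache *c)
{
	struct fake_bio *b = c->free_list;

	if (b) {
		c->free_list = b->next;
		c->nr--;
		c->hits++;
	} else {
		c->misses++;
		b = malloc(sizeof(*b));
	}
	if (b)
		b->next = NULL;
	return b;
}

/* Batched freeing: hand up to 'count' cached objects back at once. */
static void cache_prune(struct bio_cache *c, unsigned int count)
{
	while (count-- && c->free_list) {
		struct fake_bio *b = c->free_list;

		c->free_list = b->next;
		c->nr--;
		free(b);
	}
}

/* Recycle an object; when the cache overflows, free half in a batch. */
static void cache_put(struct bio_cache *c, struct fake_bio *b)
{
	assert(b != NULL);
	b->next = c->free_list;
	c->free_list = b;
	if (++c->nr > CACHE_MAX)
		cache_prune(c, CACHE_MAX / 2);
}
```

A real per-cpu version would also need to deal with preemption and
with completions running in interrupt context, which this sketch
ignores.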

Secondly, generally you don't have slow devices and fast devices
intermingled when running workloads. That's the rare case.

> To me, the bio approach suggested by Christoph Hellwig doesn't diverge
> from the current path a lot and is simple enough to change.

It doesn't diverge it a lot, but it does split it up.

> Anyway, I'm okay with either way if we can remove rw_page without
> any regression, because the maintenance of both rw_page and
> make_request is rather a burden for zram, too.

Agree, the ultimate goal of both is to eliminate the need for the
rw_page hack.

--
Jens Axboe