Re: [PATCH v1 2/6] fs: use on-stack-bio if backing device has BDI_CAP_SYNC capability

From: Minchan Kim
Date: Mon Aug 14 2017 - 11:06:40 EST


On Mon, Aug 14, 2017 at 08:36:00AM -0600, Jens Axboe wrote:
> On 08/14/2017 02:50 AM, Minchan Kim wrote:
> > Hi Jens,
> >
> > On Fri, Aug 11, 2017 at 08:26:59AM -0600, Jens Axboe wrote:
> >> On 08/11/2017 04:46 AM, Christoph Hellwig wrote:
> >>> On Wed, Aug 09, 2017 at 08:06:24PM -0700, Dan Williams wrote:
> >>>> I like it, but do you think we should switch to sbvec[<constant>] to
> >>>> preclude pathological cases where nr_pages is large?
> >>>
> >>> Yes, please.
> >>>
> >>> Then I'd like to see that the on-stack bio even matters for
> >>> mpage_readpage / mpage_writepage. Compared to all the buffer head
> >>> overhead the bio allocation should not actually matter in practice.
> >>
> >> I'm skeptical for that path, too. I also wonder how far we could go
> >> with just doing a per-cpu bio recycling facility, to reduce the cost
> >> of having to allocate a bio. The on-stack bio parts are fine for
> >> simple use case, where simple means that the patch just special
> >> cases the allocation, and doesn't have to change much else.
> >>
> >> I had a patch for bio recycling and batched freeing a year or two
> >> ago, I'll see if I can find and resurrect it.
> >
> > So, you want to go with per-cpu bio recycling approach to
> > remove rw_page?
> >
> > So, do you want me to hold this patchset?
>
> I don't want to hold this series up, but I do think the recycling is
> a cleaner approach since we don't need to special case anything. I
> hope I'll get some time to dust it off, retest, and post soon.

I don't know how your bio recycling works. But my worry when I first
heard about per-cpu bio recycling is that if it's not a pool reserved
for BDI_CAP_SYNCHRONOUS devices (IOW, if it is shared by several
storage devices), BIOs can be consumed by a slow device (e.g., eMMC)
so that a bio for the fastest device in the system (e.g., zram in an
embedded system) can be stuck waiting for a bio until IO for the slow
device is completed.

I guess it would not be a rare case for a swap device under severe
memory pressure, because lots of page cache has already been reclaimed
by the time anonymous pages start to be reclaimed. So many BIOs can be
consumed by eMMC to fetch code while swap IO to fetch heap data would
be stuck, although zram-swap is much faster than eMMC.
Also, time spent waiting to get a BIO, even among the fastest devices,
is simply a waste, I guess.
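
To make the worry concrete, here is a toy userspace model (not kernel
code, and not how any real recycling patch works). All names and
parameters are hypothetical. It assumes FIFO waiters and that the slow
device completes its IO in batches the size of the shared pool.

```c
#include <assert.h>

/*
 * Toy userspace model, not kernel code. All names are hypothetical.
 *
 * shared_bios:   bios in the pool that any device may take
 * reserved_bios: bios set aside for the synchronous (fast) device only
 * slow_reqs:     slow-device requests queued ahead of one fast request
 * slow_latency:  ticks the slow device holds each bio
 *
 * Returns how many ticks the fast request waits before it gets a bio,
 * assuming FIFO waiters and slow IO completing in batches of shared_bios.
 */
static unsigned int fast_wait_ticks(unsigned int shared_bios,
				    unsigned int reserved_bios,
				    unsigned int slow_reqs,
				    unsigned int slow_latency)
{
	assert(shared_bios > 0);

	/* A reserved bio is never held by the slow device. */
	if (reserved_bios > 0)
		return 0;

	/* Each full batch of slow requests ahead costs one slow_latency. */
	return (slow_reqs / shared_bios) * slow_latency;
}
```

With 4 shared bios, 8 queued eMMC requests and 100 ticks per eMMC IO,
the zram request waits 200 ticks in this model. With even one reserved
bio it does not wait at all.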

To me, the on-stack bio suggested by Christoph Hellwig doesn't diverge
from the current path a lot and is simple enough to change.

Anyway, I'm okay with either way if we can remove rw_page without
any regression, because maintaining both rw_page and make_request
is rather a burden for zram, too.