Re: bcache on XFS: metadata I/O (dirent I/O?) not getting cached at all?

From: Coly Li
Date: Thu Feb 07 2019 - 03:16:59 EST


On 2019/2/7 6:11, Nix wrote:
> So I just upgraded to 4.20 and revived my long-turned-off bcache now
> that the metadata corruption leading to mount failure on dirty close may
> have been identified (applying Tang Junhui's patch to do so)... and I
> spotted something a bit disturbing. It appears that XFS directory and
> metadata I/O is going more or less entirely uncached.
>
> Here's some bcache stats before and after a git status of a *huge*
> uncached tree (Chromium) on my no-writeback readaround cache. It takes
> many minutes and pounds the disk with massively seeky metadata I/O in
> the process:
>
> Before:
>
> stats_total/bypassed: 48.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 861045
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16286
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411575
> stats_total/cache_readaheads: 0
>
> After:
>
> stats_total/bypassed: 49.3G
> stats_total/cache_bypass_hits: 7942
> stats_total/cache_bypass_misses: 1154887
> stats_total/cache_hit_ratio: 3
> stats_total/cache_hits: 16291
> stats_total/cache_miss_collisions: 25
> stats_total/cache_misses: 411625
> stats_total/cache_readaheads: 0
>
> Huge increase in bypassed reads, essentially no new cached reads. This
> is... basically the optimum case for bcache, and it's not caching it!
>
> From my reading of xfs_dir2_leaf_readbuf(), it looks like essentially
> all directory reads in XFS appear to bcache as a single non-readahead
> read followed by a pile of readahead I/O: bcache bypasses readahead
> bios, so all directory reads (or perhaps all directory reads larger
> than a single block) are going to be bypassed out of hand.
>
> This seems... suboptimal, but so does filling up the cache with
> read-ahead blocks (particularly for non-metadata) that are never used.
> Anyone got any ideas, 'cos I'm currently at a loss: XFS doesn't appear
> to let us distinguish between "read-ahead just in case but almost
> certain to be accessed" (like directory blocks) and "read ahead on the
> offchance because someone did a single-block file read and what the hell
> let's suck in a bunch more".
>
> As it is, this seems to render bcache more or less useless with XFS,
> since bcache's primary raison d'etre is precisely to cache seeky stuff
> like metadata. :(
>
Hi Nix,

Could you please try whether the attached patch makes things better?
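
For background: from my reading of fs/xfs/xfs_buf.c in 4.20, XFS tags
all of its buffer cache I/O with REQ_META, and readahead additionally
gets REQ_RAHEAD. Paraphrased (not the verbatim source), the flag
selection looks roughly like this,

    /* paraphrased from _xfs_buf_ioapply() in fs/xfs/xfs_buf.c */
    if (bp->b_flags & XBF_WRITE) {
            op = REQ_OP_WRITE;
    } else if (bp->b_flags & XBF_READ_AHEAD) {
            op = REQ_OP_READ;
            op_flags = REQ_RAHEAD;
    } else {
            op = REQ_OP_READ;
    }

    /* XFS uses the buffer cache only for metadata */
    op_flags |= REQ_META;

So the directory readahead bios you observed reach bcache with
(REQ_RAHEAD|REQ_META) set and REQ_PRIO cleared, which is exactly the
combination that the current check_should_bypass() skips.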

Thanks in advance for your help.

--

Coly Li
From 7c27e26017f6297a6bc6a8075732f69d3edcc52e Mon Sep 17 00:00:00 2001
From: Coly Li <colyli@xxxxxxx>
Date: Thu, 7 Feb 2019 15:54:24 +0800
Subject: [PATCH] bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata

In commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata"), REQ_META was replaced by REQ_PRIO to indicate a metadata
bio. This assumption is not always correct: e.g. XFS uses REQ_META, not
REQ_PRIO, to mark metadata bios. This is why Nix reports a regression
where bcache no longer caches metadata for XFS after the above commit.

Thanks to Dave Chinner for explaining the difference between REQ_META
and REQ_PRIO from a file system developer's point of view. Here I quote
part of his explanation from the mailing list:
    REQ_META is used for metadata. REQ_PRIO is used to communicate to
    the lower layers that the submitter considers this IO to be more
    important than non-REQ_PRIO IO, and so dispatch should be expedited.

    IOWs, if the filesystem considers metadata IO to be more important
    than user data IO, then it will use REQ_PRIO | REQ_META rather than
    just REQ_META.
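
For illustration, this is how the two flag combinations would look at
submission time (hypothetical code, not taken from any particular file
system),

    /* metadata IO at normal priority */
    bio->bi_opf = REQ_OP_READ | REQ_META;

    /* metadata IO the file system wants dispatched ahead of data IO */
    bio->bi_opf = REQ_OP_READ | REQ_META | REQ_PRIO;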

It therefore seems that bios with either REQ_META or REQ_PRIO should be
cached for performance optimization, because the upper layer (e.g. the
file system) most likely wants low I/O latency for all of them.

So in this patch, when we check whether a bio is metadata related, both
REQ_META and REQ_PRIO are tested, so that both metadata and
high-priority I/O requests are handled properly.
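
Written out as a standalone predicate, the test this patch applies in
both places is simply the following (a sketch for clarity only; bcache
open codes it, and the helper name here is made up),

    /* sketch: treat a bio as cacheable metadata / high-priority IO? */
    static inline bool bio_is_meta_or_prio(struct bio *bio)
    {
            return bio->bi_opf & (REQ_META | REQ_PRIO);
    }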

Reported-by: Nix <nix@xxxxxxxxxxxxx>
Signed-off-by: Coly Li <colyli@xxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Andre Noll <maan@xxxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
---
drivers/md/bcache/request.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3bf35914bb57..62bda90a38dc 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -395,7 +395,7 @@ static bool check_should_bypass(struct cached_dev *dc, struct bio *bio)
* unless the read-ahead request is for metadata (eg, for gfs2 or xfs).
*/
if (bio->bi_opf & (REQ_RAHEAD|REQ_BACKGROUND) &&
- !(bio->bi_opf & REQ_PRIO))
+ !(bio->bi_opf & (REQ_META|REQ_PRIO)))
goto skip;

if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
@@ -877,7 +877,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
}

if (!(bio->bi_opf & REQ_RAHEAD) &&
- !(bio->bi_opf & REQ_PRIO) &&
+ !(bio->bi_opf & (REQ_META|REQ_PRIO)) &&
s->iop.c->gc_stats.in_use < CUTOFF_CACHE_READA)
reada = min_t(sector_t, dc->readahead >> 9,
get_capacity(bio->bi_disk) - bio_end_sector(bio));
--
2.16.4