Re: [PATCH 03/21] bcachefs: btree write buffer knows how to accumulate bch_accounting keys

From: Kent Overstreet
Date: Thu Feb 29 2024 - 15:25:24 EST


On Thu, Feb 29, 2024 at 01:44:07PM -0500, Brian Foster wrote:
> On Wed, Feb 28, 2024 at 05:42:39PM -0500, Kent Overstreet wrote:
> > Shouldn't be any actual risk. It's just new accounting updates that the
> > write buffer can't flush, and those are only going to be generated by
> > interior btree node updates as journal replay has to split/rewrite nodes
> > to make room for its updates.
> >
> > And for those new accounting updates, updates to the same counters get
> > accumulated as they're flushed from the journal to the write buffer -
> > see the eytzinger tree accumulation in the patch. So we could only
> > overflow if the number of distinct counters touched was somehow very
> > large.
> >
> > And the number of distinct counters will be growing significantly, but
> > the new counters will all be for user data, not metadata.
> >
> > (Except: that reminds me, we do want to add per-btree counters, so
> > users can see "I have x amount of extents, x amount of dirents,
> > etc.")
> >
>
> Heh, Ok. This all does sound a little open ended to me. Maybe the better
> question is: suppose this hypothetically does happen after adding a
> bunch of new counters, what would the expected side effect be in the
> recovery scenario where the write buffer can't be flushed?

The btree write buffer is allowed to grow - we try to keep it bounded in
normal operation, but letting it grow is one of the ways we deal with
the unpredictability of the amount of write buffer keys in the journal.

So it'll grow until that kvrealloc fails. It won't show up as a
deadlock, it'll show up as an allocation failure; and for that to
happen, the number of accounting keys being updated - not the number of
accounting updates, just the number of distinct keys being updated -
would have to no longer fit in the write buffer.
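To make the accumulation point concrete, here's a minimal userspace
sketch of the idea, with hypothetical names - these are not the actual
bcachefs structures or API, and the real code searches in eytzinger
order rather than doing a linear scan. Updates to the same counter id
are summed in place, so the buffer only grows when a new distinct
counter appears, and the only failure mode is the (re)allocation
failing (kvrealloc in the kernel, plain realloc here):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct acct_entry {		/* hypothetical */
	uint64_t counter;	/* distinct counter id */
	int64_t  delta;		/* accumulated delta */
};

struct acct_buf {
	struct acct_entry *entries;
	size_t nr, size;
};

/* Returns 0 on success, -1 if the (re)allocation fails. */
static int acct_buf_add(struct acct_buf *buf, uint64_t counter, int64_t delta)
{
	/* Updates to an already-present counter accumulate in place: */
	for (size_t i = 0; i < buf->nr; i++)
		if (buf->entries[i].counter == counter) {
			buf->entries[i].delta += delta;
			return 0;
		}

	/* Only a new distinct counter grows the buffer: */
	if (buf->nr == buf->size) {
		size_t new_size = buf->size ? buf->size * 2 : 8;
		void *p = realloc(buf->entries,
				  new_size * sizeof(*buf->entries));
		if (!p)
			return -1;	/* the failure mode discussed above */
		buf->entries = p;
		buf->size = new_size;
	}

	buf->entries[buf->nr++] = (struct acct_entry) { counter, delta };
	return 0;
}
```

So no matter how many accounting *updates* journal replay generates, the
buffer's size is bounded by the number of *distinct* counters touched.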