Re: [PATCH 01/21] bcachefs: KEY_TYPE_accounting

From: Kent Overstreet
Date: Thu Feb 29 2024 - 16:25:16 EST


On Thu, Feb 29, 2024 at 01:43:15PM -0500, Brian Foster wrote:
> Hmm.. I think the connection I missed on first look is basically
> disk_accounting_key_to_bpos(). I think what is confusing is that calling
> this a key makes me think of bkey, which I understand to contain a bpos,
> so then overlaying it with a bpos didn't really make a lot of sense to
> me conceptually.
>
> So when I look at disk_accounting_key_to_bpos(), I see we are actually
> using the bpos _pad field, and this structure basically _is_ the bpos
> for a disk accounting btree bkey. So that kind of makes me wonder why
> this isn't called something like disk_accounting_pos instead of _key,
> but maybe that is wrong for other reasons.

hmm, I didn't consider calling it disk_accounting_pos. I'll let that
roll around in my brain.

'key' is more standard terminology to me outside bcachefs, but 'pos'
does make more sense within bcachefs.

> Either way, what I'm trying to get at is that I think this documentation
> would be better if it explained conceptually how disk_accounting_key
> relates to bkey/bpos, and why it exists separately from bkey vs. other
> key types, rather than (or at least before) getting into the lower level
> side effects of a union with bpos.

Well, that gets into some fun territory - ideally bpos would not be a
fixed thing that every btree was forced to use, we'd be able to define
different types per btree.

And we're actually going to need to be able to do that in order to do
configurationless autotiering - i.e. tracking how hot/cold data is on an
inode:offset basis, because LRU btree backreferences need to go in the
key (bpos), not the value, in order to avoid collisions, and bpos isn't
big enough for that.

disk_accounting_(key|pos) is an even trickier situation, because of
endianness issues. The trick we do with bpos of defining the field order
differently based on endianness so that byte order matches word order -
that really wouldn't work here, so there is at present no practical way
that I know of to avoid the byte swabbing when going back and forth
between bpos and disk_accounting_pos on big endian.

But gcc does have an attribute now that lets you specify that an integer
struct member is big or little endian... I if we could get them to go
one step further and give us an attribute to control whether members are
laid out in ascending or descending order...