Re: API break, sysfs "capability" file

From: Lennart Poettering
Date: Tue Apr 09 2024 - 04:23:27 EST


On Mo, 08.04.24 16:41, Keith Busch (kbusch@xxxxxxxxxx) wrote:

> On Mon, Apr 08, 2024 at 10:23:49PM +0200, Lennart Poettering wrote:
> > Not sure how this is salvageable. This is just seriously fucked
> > up. What now?
> >
> > It has been proposed to use the "ext_range" sysfs attr instead as a
> > hint if partition scanning is available or not. But it's entirely
> > undocumented. Is this something that will remain stable? (I mean,
> > whether something is documented or not apparently has no effect on the
> > stability of an API anyway, so I guess it's equally shaky as the
> > capability sysattr? Is any of the block device sysfs interfaces
> > actually stable or can they change any time?)
>
> The "ext_range" attribute does look like an appropriate proxy for the
> attribute, but indeed, it's not well documented.
>
> Looking at the history of the documentation you had been relying on, it
> appears that was submitted with good intentions (9243c6f3e012a92d), but
> it itself changed values, acknowledging the instability of this
> interface.
>
> So what to do? If documentation is all that's preventing "ext_range"
> from replacing your previous usage, then let's add it in the
> Documentation/ABI/stable/sysfs-block. It's been there since 2008, so
> that seems like a reliable attribute to put there.

Well, history so far is telling us that this doesn't stop the block
layer from changing it anyway...

AFAICS "ext_range" is kinda messy to use for this since it changed
behaviour – only since
https://github.com/torvalds/linux/commit/1ebe2e5f9d68e94c524aba876f27b945669a7879
it actually directly exposes GENHD_FL_NO_PART; before that it did some
more complex stuff which did *not* take GENHD_FL_NO_PART into
consideration. It's nasty to hack against that from userspace, since
we never know which kernel we are on, and how it has been patched.

Also "ext_range" is only available on whole block devices afaics. Partition
block devices do not have it at all, which makes the check userspace
has to do even more complex.
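
To illustrate the extra handling, here is a rough Python sketch of the
"ext_range" lookup. The function names are made up for this example,
and it assumes the caller passes the device's sysfs directory (e.g.
something like /sys/class/block/<name>). A missing attribute, as on
partition block devices, yields None. Note that "ext_range is 1" only
means "no partition scanning" on kernels that carry the commit above.

```python
from pathlib import Path
from typing import Optional


def read_ext_range(sysfs_dir: str) -> Optional[int]:
    """Return the value of the "ext_range" attribute, or None if absent.

    sysfs_dir is the device's sysfs directory (caller-supplied; the
    exact path layout is an assumption of this sketch).
    """
    try:
        text = Path(sysfs_dir, "ext_range").read_text()
    except (FileNotFoundError, NotADirectoryError):
        # Partition block devices do not have this attribute at all.
        return None
    return int(text.strip())


def rules_out_partition_scanning(ext_range: Optional[int]) -> bool:
    """True only if ext_range positively says "no partition scanning".

    None (attribute missing) and values > 1 are treated as "unknown",
    not as "scanning is on".
    """
    return ext_range == 1
```

The caller still has to treat "attribute missing" and "value larger
than 1" as inconclusive, which is exactly what makes this check
awkward.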

All I am looking for is a very simple test that returns me a boolean:
is there kernel-level partition scanning enabled on this device or
not. At this point it's not clear to me if I can write this at all in
a way that works reasonably correctly on any kernel since let's say
4.15 (which is systemd's "recommended baseline" right now).

I am really not sure how to salvage this mess at all. AFAICS there's
currently no way to write such a test correctly.

1. "ext_range" does not work on older kernels, and not on partition
block devices
2. "capability" does not work on newer kernels, because it changed
meaning and was then amputated to always be zero.
3. There's no way to know if we are on an old or new kernel, as
apparently various distros backported the amputation.

So, what now?

I think it would be nice if the "capability" thing were brought
back in a limited form. For example, if it were changed to return
0x200|0x1000 when partition scanning is off, and 0x1000 when it is on.

That would then mean we return to compatibility with Linux <= 5.15,
but the new 0x1000 bit would tell us that the information is
reliable. i.e. if userspace sees 0x1000 being set we know that the
0x200 bit is definitely correct. That would then just mean that
kernels >= 5.16 until today are left in the cold...
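
A minimal Python sketch of how userspace could decode the value under
this proposal. The 0x1000 bit is purely hypothetical (it is what is
being proposed here, no kernel sets it), the constant and function
names are invented for the example, and parsing the file as hex is an
assumption about the attribute's format.

```python
from typing import Optional

CAP_NO_PART_SCAN = 0x200   # "partition scanning off" bit, as on Linux <= 5.15
CAP_RELIABLE = 0x1000      # hypothetical: proposed "0x200 is trustworthy" bit


def parse_capability(text: str) -> int:
    """Parse the attribute contents (assumed to be hexadecimal)."""
    return int(text.strip(), 16)


def decode_capability(value: int) -> Optional[bool]:
    """Return True/False if scanning is known on/off, None if unknown."""
    if value & CAP_RELIABLE:
        # New-style kernel: the 0x200 bit is definitely correct.
        return not (value & CAP_NO_PART_SCAN)
    if value & CAP_NO_PART_SCAN:
        # Old-style kernel that reports "no partition scanning".
        return False
    # 0x200 clear and no reliability bit: could be an old kernel with
    # scanning on, or a kernel where the field was zeroed. Unknown.
    return None
```

With this encoding, decode_capability(0x1200) and
decode_capability(0x200) both mean "off", decode_capability(0x1000)
means "on", and a plain 0 stays inconclusive.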

That would then allow userspace to implement:

1. if "capability" has 0x200 set → definitely no partition scanning
2. if "capability" has 0x1000 set → bit 0x200 reliably tells us
whether partition scanning is on or off
3. if DEVTYPE=partition → definitely no partition scanning
4. if "ext_range" is 1 → definitely no partition scanning
5. if LOOP_GET_STATUS64 works, then .lo_flags' LO_FLAGS_PARTSCAN flag
indicates partition scanning on or off.
6. otherwise: ??? (probably we should assume partition scanning is on?)
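
The six steps above, as an untested Python sketch. It is a pure
function over values the caller has already collected, so all
parameter and function names are made up for illustration. The 0x1000
bit is the hypothetical one proposed in this mail. The lo_flags offset
(52) and LO_FLAGS_PARTSCAN (8) are taken from my reading of struct
loop_info64 in <linux/loop.h>; treat them as assumptions and verify
against the header.

```python
import struct
from typing import Optional

CAP_NO_PART_SCAN = 0x200   # legacy "no partition scanning" bit
CAP_RELIABLE = 0x1000      # hypothetical bit proposed in this mail
LO_FLAGS_PARTSCAN = 8      # assumed value from <linux/loop.h>
LO_FLAGS_OFFSET = 52       # assumed offset of lo_flags in loop_info64


def loop_flags_from_info64(buf: bytes) -> int:
    """Extract lo_flags from a raw struct loop_info64 buffer."""
    return struct.unpack_from("=I", buf, LO_FLAGS_OFFSET)[0]


def partition_scanning_enabled(
    capability: Optional[int],   # parsed "capability" attr, None if unreadable
    devtype: Optional[str],      # DEVTYPE uevent property
    ext_range: Optional[int],    # parsed "ext_range" attr, None if missing
    lo_flags: Optional[int],     # lo_flags if LOOP_GET_STATUS64 worked
) -> bool:
    if capability is not None:
        if capability & CAP_NO_PART_SCAN:
            return False                        # step 1
        if capability & CAP_RELIABLE:
            return True                         # step 2 (0x200 is clear here)
    if devtype == "partition":
        return False                            # step 3
    if ext_range == 1:
        return False                            # step 4
    if lo_flags is not None:
        return bool(lo_flags & LO_FLAGS_PARTSCAN)   # step 5
    return True                                 # step 6: assume it is on
```

For step 5 the buffer would come from a LOOP_GET_STATUS64 ioctl on the
loop device; if that ioctl fails (not a loop device), the caller
passes lo_flags=None and the function falls through to step 6.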

Lennart

--
Lennart Poettering, Berlin