Re: API break, sysfs "capability" file

From: Lennart Poettering
Date: Mon Apr 08 2024 - 16:24:03 EST


On Mo, 08.04.24 12:41, Keith Busch (kbusch@xxxxxxxxxx) wrote:

> On Mon, Apr 08, 2024 at 07:43:04PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [adding the culprit's author to the loop; also CCing everyone else in
> > the Signed-off-by chain and a few lists that should be in the loop, too]
> >
> > On 08.04.24 17:13, Lennart Poettering wrote:
> > >
> > > So this broke systemd:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e81cd5a983bb35dabd38ee472cf3fea1c63e0f23
> >>
> > > We use the "capability" sysfs attr to figure out if a block device has
> > > part scanning enabled or not. There seems to be no other API for
> > > this. (We also use it in our test suite to see if devices match are
> > > expectations, and older systemd/udev versions used to match agains it
> > > from udev rules.)
> > >
> > > The interface was part of sysfs, and documented:
> > >
> > > https://www.kernel.org/doc/html/v5.5/block/capability.html

I linked to the wrong docs btw: here is the right one:

https://www.kernel.org/doc/html/v5.15/block/capability.html

Quoting:

This file documents the sysfs file block/<disk>/capability.

capability is a bitfield, printed in hexadecimal, indicating
which capabilities a specific block device supports:



GENHD_FL_NO_PART_SCAN (0x0200): partition scanning is disabled. Used
for loop devices in their default settings and some MMC devices.

We used the flag 0x200 in systemd. i.e. followed the docs, and it
worked back when we added this.

> > > While it doesn't list the partscan bit it actually does document that
> > > one is supposed to look into include/linux/genhd.h for the various
> > > bits and their meanings. I'd argue that makes them API to some level.
> > >
> > > Could this please be reverted? Just keeping the relevant bits (i.e. at
> > > least the media change feature bit, and the part scanning bit) is
> > > enough for retaining userspace compat.
> > >
> > > (Please consider googling or a github code search or so before removing
> > > a public API like this. This compat breakage was very much avoidable
> > > with a tiny bit of googling.)
>
> That is unfortunate wording in the sysfs description.
>
> How were keeping in sync with the changing values before? The setting
> you seem to care about is now defined in a different file, with a
> different name, and with a different value. Or are you suggesting all
> those things should have been stable API?

Ah, so this ws already broken once
before... 430cc5d3ab4d0ba0bd011cfbb0035e46ba92920c

So the documented API was not just broken completely once but twice?

Good to know.

1. GENHD_FL_NO_PART(_SCAN) was originally documented as 0x200.

2. GENHD_FL_MEDIA_CHANGE_NOTIFY was originally documented as 4.

3. And then GENHD_FL_NO_PART(_SCAN) got redefined to 4 in
430cc5d3ab4d0ba0bd011cfbb0035e46ba92920c.

All that even though this is documented API which is still up on
kernel.org?

Not sure how this is salvageable. This is just seriously fucked
up. What now?

It has been proposed to use the "range_ext" sysfs attr instead as a
hint if partition scanning is available or not. But it's entirely
undocumented. Is this something that will remain stable? (I mean,
whether something is documented or not apparently has no effect on the
stability of an API anyway, so I guess it's equally shaky as the
capability sysattr? Is any of the block device sysfs interfaces
actually stable or can they change any time?)

Lennart

--
Lennart Poettering, Berlin