Re: [PATCH] Revert "bcachefs: Add asserts to bch2_dev_btree_bitmap_marked_sectors()"
From: John Stoffel
Date: Fri Oct 25 2024 - 17:26:39 EST
>>>>> "Kent" == Kent Overstreet <kent.overstreet@xxxxxxxxx> writes:
> On Thu, Oct 24, 2024 at 02:30:34PM -0400, John Stoffel wrote:
>> >>>>> "Kent" == Kent Overstreet <kent.overstreet@xxxxxxxxx> writes:
>>
>> > On Mon, Oct 21, 2024 at 10:18:57PM +0530, Manas via B4 Relay wrote:
>> >> From: Manas <manas18244@xxxxxxxxxxx>
>> >>
>> >> This reverts commit 60f2b1bcf519416dbffee219132aa949d0c39d0e.
>> >>
>> >> This syzbot bug[1] is triggered due to the BUG_ON assertions added in
>> >> __bch2_dev_btree_bitmap_mark. During runtime, m->btree_bitmap_shift is
>> >> 63 '?'. This triggers both the assertions.
>>
>> > The BUG_ON() doesn't need to be deleted; we just need to fix the
>> > validation so it doesn't fire (it doesn't particularly matter if it's
>> > removed or not, ubsan would catch it without the BUG_ON()).
>>
>> Shouldn't the BUG_ON() be replaced with making the filesystem readonly
>> instead if you're going to keep it? I'd rather have the filesystem
>> still be mounted and able to be read, but not writeable, instead of
>> having my system crash before I can do anything.
> Not in this case. In general, if there's a chance of the BUG_ON()
> hitting in the wild after the code has passed testing then it
> shouldn't be a BUG_ON(), but this is low level validation that we're
> relying on.
So I'm having a hard time parsing this reply. And I don't think you
make a good case for leaving or even having a BUG_ON() here at all.
If there's a chance of it hitting in the wild, it should be removed.
But you don't want to remove it because it will never hit? That's
just lazy... :-) I just don't see why a filesystem should have the
opportunity to kill an entire system because that one filesystem has
problems.
> In general higher level code absolutely requires that the low level
> validation is correct, because if it's not that will trigger all sorts
> of undefined behaviour in the upper layers.
> By "low level validation" I mean _only_ the validate functions that
> check simple invariants within a single data type that is read or
> written atomically to disk - "is data type garbage or not".
Sure.
> If it's an invariant that involves multiple objects - "does this
> extent refer to a valid device/snapshot ID" - that's not something
> we can check in validate, because then an extent or what have you
> would become invalid depending on what happens in the rest of the
> filesystem. Those sorts of checks do in fact need proper error
> paths.
Right.