Re: [PATCH v4 2/3] Adds Rust Bitmap API.

From: Alice Ryhl
Date: Wed Mar 19 2025 - 12:04:21 EST


On Wed, Mar 19, 2025 at 10:31:12AM -0400, Yury Norov wrote:
> On Wed, Mar 19, 2025 at 10:39:57AM +0000, Alice Ryhl wrote:
> > On Tue, Mar 18, 2025 at 04:17:18PM -0400, Yury Norov wrote:
> > > On Tue, Mar 18, 2025 at 05:51:53PM +0000, Burak Emir wrote:
>
> [...]
>
> > > Are those 'drop' and 'new' something like a special rust words? If not -
> > > can you use alloc and free wording? Would be nice to have rust part
> > > looking similar to C. Nobody wants to keep two sets of names in mind.
> >
> > Rewording `new` to `alloc` seems reasonable.
> >
> > As for "drop", that word is special. It's the destructor that is called
> > automatically when the bitmap goes out of scope - you can't call it
> > directly. It must be called "drop".
>
> OK, then drop and new.
>
> > > > + if let Ok(nbits_u32) = u32::try_from(nbits) {
> > > > + // SAFETY: `nbits == 0` is permitted and `nbits <= u32::MAX`.
> > > > + let ptr = unsafe { bindings::bitmap_zalloc(nbits_u32, flags.as_raw()) };
> > > > + // Zero-size allocation is ok and yields a dangling pointer.
> > >
> > > Maybe it's OK, but I'm not aware of any a correct algorithm that needs
> > > 0-sized bitmaps. I already asked for it on previous iteration, right?
> > > So unless you can show me such an example and explain what for you need
> > > 0-sized bitmaps, please drop this wording.
> >
> > Thinking about it, I actually think there is a good reason to support
> > zero-sized bitmaps for the Binder use-case. Right now, we always
> > allocate at least one long worth of bits even if they're all 0. But we
> > can improve the memory usage of this code by not allocating any memory
> > for the bitmap until the first time we use it.
> >
> > The reason that dbitmap.h doesn't do this is historical. Originally, the
> > bitmap started out having BIT(0) set to 1, so we needed an allocation to
> > hold that from the very beginning. But that was changed in commit
> > 11512c197d38 ("binder: fix descriptor lookup for context manager"), so
> > the bitmap now starts out empty.
>
> Empty bitmap is not a 0-length bitmap, right?
>
> If it was me inventing dynamic bitmaps, and if I was concerned about
> usability and performance, I would think carefully what exactly the
> request for 0-bits Bitmap object means.
>
> I would probably consider it as a caller error, which makes total
> sense.
>
> Or I would consider it as a special hint, like 'give me something to
> begin with'.
>
> If I decided to go with the 2nd option, I'd probably avoid memory
> allocation at all, and re-use the pointer as the bitmap of the length
> BITS_PER_LONG. That would save _a lot_ on useless small kmalloc()
> calls.

Okay that's clever. We would probably avoid the need for allocations in
the vast vast majority of cases with that approach.

> The whole business of dynamic arrays is about potentially infinite
> number of elements. User of your array doesn't even care about the
> exact length, because it may get changed anytime, right?
>
> In that case, the comment would sound like:
>
> // Zero-size Bitmap request yields a bitmap of an arbitrary
> // non-zero length.
>
> [...]
>
> > The clippy linter emits a warning if you have a `len` method but don't
> > have an `is_empty` method, since Rust containers usually have both.
> >
> > https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty
> >
> > But the confusion with "no set bits" seems like a good reason to silence
> > that warning for bitmap.
>
> So the comment at your link says:
>
> It is good custom to have both methods, because for some data structures,
> asking about the length will be a costly operation, whereas .is_empty()
> can usually answer in constant time.
>
> In this case both len() and is_empty() are O(1), but the last one
> is totally misleading.
>
> You guys would better teach your linter to understand when the len()
> is cheap, so is_empty() is useless.

Heh, I'm not sure I agree with the lint's explanation. But all stdlib
Rust data structures have both `len` and `is_empty` even if computing
the length is O(1). In my eyes, it's just a convention, similar to
calling the constructor "new".

> > > > + /// Sets bit with index `index`.
> > > > + ///
> > > > + /// # Panics
> > > > + ///
> > > > + /// Panics if `index` is greater than or equal to `self.nbits`.
> > > > + #[inline]
> > > > + pub fn set_bit(&mut self, index: usize) {
> > >
> > > set_bit() is an atomic name, but you wire it to a non-atomic operation.
> > > This is a mess.
> >
> > Hmm. I do generally agree that we should try to match C names, but I'm
> > unsure about this particular case due to the underscores.
> >
> > This method takes "&mut self" rather than "&self", which means that the
> > caller must have exclusive access to the bitmap to call this method.
> > Attempting to call it when the bitmap is shared will result in a
> > compilation failure, so it is impossible to call the non-atomic method
> > in cases where you must use the atomic version.
>
> Does mutual access implies memory barriers? Your code doesn't add
> them. Is rust compiler so smart to realize that you need it?
>
> I mean the following scenario:
>
> CPU1 CPU2
> a.set_bit(1)
> a.test_bit(1) -> 0
> cache_flush()
> a.test_bit(1) -> 1
>
> If the above is impossible, then yes, your set_bit is atomic.

Yes, if a method is "&mut self" then this case is impossible.

Basically, the way it works is that the only APIs that hand out `&mut`
references to a value must somehow enforce that the access is exclusive.
How it enforces that is up to the API.

So for example, if you have a local variable that you know is not
shared, you can just obtain a `&mut` reference to it. No barriers needed
since it's a local variable that is not shared.

On the other hand, if you have a variable protected by a spinlock, then
the spinlock API will only hand out `&mut` references when the spinlock
is locked. And the borrow-checker enforces that the `&mut` can't be used
after you unlock it again. In this case, the spin_lock / spin_unlock
calls are sufficient barriers to use __set_bit.

And so on for each synchronization method.

Point is, any time you have a `&mut` reference to a value, you know that
it's exclusive, because whichever API created that reference has some
mechanism to ensure that.

> > We could call this method __set_bit, but using underscore prefixes for
> > methods is very very rare in Rust code because prefixing a name with an
> > underscore usually means "do not emit warnings if this thing is unused".
> > For example, when locking a mutex, you might write this:
> >
> > {
> > let _lock = my_mutex.lock();
> > // do stuff ...
> >
> > // _lock goes out of scope here, unlocking the mutex
> > }
> >
> > Here, if you called the variable "lock" you would get an unused variable
> > warning, but prefixing the variable name with an underscore silences
> > that warning.
>
> But underscored method is not a local underscored variable. It doesn't
> have scope.

If a method isn't public, then Rust will perform a similar check to the
entire compilation unit and emit a warning if it's unused. The
underscore silences that.

Though as Miguel points out, this particular method is public, so the
check is not performed - there might be callers in other compilation
units that this rustc invocation doesn't know about.

> > We can still call the method __set_bit if you think that is best, but
>
> No I don't. Nobody likes underscored notation for non-atomic bit ops,
> but it's a historical thing.
>
> > because underscore prefixes usually mean something different in Rust, I
> > wonder if we should use slightly different names in Rust. Thoughts?
>
> This is a new project, and you may decide to change notation. Just
> please be very specific about it.
>
> So I'd suggest the following:
>
> 1. Mirror C interface: __set_bit() and set_bit()
> 2. set_bit_nonatomic() and set_bit()
> 3. set_bit() and set_bit_atomic()
>
> The last one looks the best to me, except that it just the opposite
> to what people got used to in Linux. If you go with #3, please add
> both atomic and non-atomic set_bit()s in the same patch, together with
> a bunch of comments, so people will be aware.

Solution #3 sounds good to me.

Alice