Re: [PATCH 1/1] hfsplus: switch hfs_bmap_alloc() to next-fit allocation using alloc_hint

From: Viacheslav Dubeyko

Date: Tue Apr 07 2026 - 15:08:59 EST


On Mon, 2026-04-06 at 11:00 +0530, Kalpan Jani wrote:
> hfs_bmap_alloc() currently performs a linear scan starting from bit 0 of
> the allocation bitmap on every call. This rescans already-used regions
> at the front of the bitmap, causing unnecessary overhead that grows with
> filesystem size and fragmentation.
>
> Introduce a next-fit allocation strategy by caching the last successful
> allocation position in a new alloc_hint field in struct hfs_btree.
> Allocation begins from alloc_hint instead of zero and wraps around when
> the end of the map chain is reached, so the full bitmap is still
> searched before returning -ENOSPC.
>
> The map chain extension path (hfs_bmap_new_bmap) is preserved: when the
> scan reaches the end of the chain with idx < node_count, a new map node
> is created before any wrap-around, ensuring that nodes reserved by
> hfs_bmap_reserve() are always reachable. Bounds checking against
> tree->node_count prevents allocation beyond valid filesystem boundaries.
>
> The seek phase that advances to the map node containing alloc_hint uses
> the existing hfs_bmap_get_map_page() API for node-type validation and
> offset checking, rather than open-coding hfs_brec_lenoff() calls. As a
> minor API refinement, hfs_bmap_get_map_page() now sets ctx->len to the
> remaining bytes from the requested byte_offset rather than the total
> record length, eliminating the need for callers to adjust it externally.
>
> hfs_bmap_free() pulls alloc_hint back when a node is freed below the
> current hint, so the freed slot is found immediately on the next
> allocation without requiring a full wrap-around scan.
>
> No on-disk format changes.
>
> Link: https://lore.kernel.org/all/7194aa49efdb85c7cfc9578f1460aaa9a1c67095.camel@xxxxxxxxxxxxxxxxxx/
> Co-developed-by: Shardul Bankar <shardul.b@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Shardul Bankar <shardul.b@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Kalpan Jani <kalpan.jani@xxxxxxxxxxxxxxxxxx>
> ---
> fs/hfsplus/btree.c | 114 +++++++++++++++++++++++++++++++++++-----
> fs/hfsplus/hfsplus_fs.h | 1 +
> 2 files changed, 101 insertions(+), 14 deletions(-)
>
> diff --git a/fs/hfsplus/btree.c b/fs/hfsplus/btree.c
> index 04304f952f6b..6f2448e332f3 100644
> --- a/fs/hfsplus/btree.c
> +++ b/fs/hfsplus/btree.c
> @@ -132,7 +132,7 @@ u32 hfsplus_calc_btree_clump_size(u32 block_size, u32 node_size,
> /* Context for iterating b-tree map pages
> * @page_idx: The index of the page within the b-node's page array
> * @off: The byte offset within the mapped page
> - * @len: The remaining length of the map record
> + * @len: The remaining bytes of the map record from the requested offset
> */
> struct hfs_bmap_ctx {
> unsigned int page_idx;
> @@ -178,6 +178,8 @@ static struct page *hfs_bmap_get_map_page(struct hfs_bnode *node, struct hfs_bma
> if (byte_offset >= ctx->len)
> return ERR_PTR(-EINVAL);
>
> + ctx->len -= byte_offset;
> +

It looks really dangerous out of context. Why do we need to modify ctx->len
in hfs_bmap_get_map_page()? Is this the right place for it? It looks like a
really good way to introduce non-trivial bugs.

> page_off = (u32)off16 + node->page_offset + byte_offset;
> ctx->page_idx = page_off >> PAGE_SHIFT;
> ctx->off = page_off & ~PAGE_MASK;
> @@ -525,19 +527,29 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
> struct hfs_bnode *node, *next_node;
> struct hfs_bmap_ctx ctx;
> struct page *page;
> - u32 nidx, idx;
> + u32 start, target, nidx, idx;
> + u32 bit_offset, found;
> u8 *data, byte, m;
> int i, res;
> + bool wrapped = false;

I don't really like this wrapped approach. :) But, please, check my comments
related to alloc_hint. Potentially, we don't need the wrapped variable at all.

>
> res = hfs_bmap_reserve(tree, 1);
> if (res)
> return ERR_PTR(res);
>
> + if (tree->alloc_hint >= tree->node_count)
> + tree->alloc_hint = 0;
> +
> + start = tree->alloc_hint;

Please, see my comment related to alloc_hint below.

> +
> nidx = 0;
> node = hfs_bnode_find(tree, nidx);
> if (IS_ERR(node))
> return node;
>
> + target = start;

Why do we need to introduce the target variable at all?

> +
> + /* Seek to the map node containing the target bit */
> page = hfs_bmap_get_map_page(node, &ctx, 0);
> if (IS_ERR(page)) {
> res = PTR_ERR(page);
> @@ -545,27 +557,82 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
> return ERR_PTR(res);
> }
>
> + while (target >= (u32)ctx.len * BITS_PER_BYTE) {
> + target -= (u32)ctx.len * BITS_PER_BYTE;

Why do we decrement target?

> + nidx = node->next;
> + if (!nidx) {
> + /* target is out of bounds, reset to start of map */
> + target = 0;
> + start = 0;
> + hfs_bnode_put(node);
> + node = hfs_bnode_find(tree, HFSPLUS_TREE_HEAD);
> + if (IS_ERR(node))
> + return node;
> + break;
> + }
> +
> + hfs_bnode_put(node);
> + node = hfs_bnode_find(tree, nidx);
> + if (IS_ERR(node))
> + return node;
> +
> + page = hfs_bmap_get_map_page(node, &ctx, 0);
> + if (IS_ERR(page)) {
> + res = PTR_ERR(page);
> + hfs_bnode_put(node);
> + return ERR_PTR(res);
> + }
> + }

It looks like a candidate for a specialized function.

I don't quite follow the purpose of this loop. Are we trying to find the map
node here? But every map node has a fixed-size bitmap, so we can simply
calculate the node index and then get the node on that basis. Am I right? I
assume this logic exists because we need to follow node->next. But if we know
the index in the map node chain, then we only need to pass over several nodes
by means of node->next, without this logic. However, I think you selected the
simpler logic and, anyway, you need to get the node to access the header. So,
maybe, your approach is not bad. I believe we need to have it as a static
inline function.

> +
> + /* Position within the target node at the exact byte */
> + page = hfs_bmap_get_map_page(node, &ctx, target / BITS_PER_BYTE);
> + if (IS_ERR(page)) {
> + res = PTR_ERR(page);
> + hfs_bnode_put(node);
> + return ERR_PTR(res);
> + }
> +
> data = kmap_local_page(page);
> - idx = 0;
> + idx = (start / 8) * 8;
> + bit_offset = target % 8;

I really dislike the hardcoded constants. Do you mean BITS_PER_BYTE here?

>
> for (;;) {
> - while (ctx.len) {
> + while (ctx.len && idx < tree->node_count) {
> byte = data[ctx.off];
> if (byte != 0xff) {
> - for (m = 0x80, i = 0; i < 8; m >>= 1, i++) {
> + /* After wrapping, only check bits before the start position */
> + int end_bit = (wrapped && idx == (start / 8) * 8) ? (start % 8) : 8;

It's definitely longer than 80 characters per line.

The same comment about hardcoded constants applies.

I think this logic needs to be in a static inline function.

> +
> + for (i = bit_offset, m = 0x80 >> bit_offset;
> + i < end_bit; i++, m >>= 1) {

Could we make this simpler and more understandable?

Why are m = 0x80 >> bit_offset and m >>= 1 not inside the loop's body?

I would prefer to see the first and second code patterns as functions or
macros. It would make the code much easier to follow.
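For example, the open-coded mask scan could be extracted into one small,
self-contained helper along these lines (the function name is made up, and
this is an untested sketch of the same MSB-first scan, not the patch's code):

```c
#include <assert.h>

typedef unsigned char u8;

/*
 * Hypothetical helper: find the first zero bit in byte within
 * [start_bit, end_bit), scanning MSB first like the open-coded loop.
 * Returns the bit index, or -1 if every bit in the range is set.
 */
static inline int hfs_find_zero_bit(u8 byte, int start_bit, int end_bit)
{
	u8 m = 0x80 >> start_bit;
	int i;

	for (i = start_bit; i < end_bit; i++, m >>= 1) {
		if (!(byte & m))
			return i;
	}
	return -1;
}
```

With something like this, both the initial mask value and the shift live in
one place, and the wrapped/end_bit special case stays out of the hot loop.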

> if (!(byte & m)) {
> - idx += i;
> + found = idx + i;
> +
> + /* past valid node range */
> + if (found >= tree->node_count)
> + break;
> +
> data[ctx.off] |= m;
> set_page_dirty(page);
> kunmap_local(data);
> tree->free_nodes--;
> mark_inode_dirty(tree->inode);
> hfs_bnode_put(node);
> - return hfs_bnode_create(tree,
> - idx);
> +
> + tree->alloc_hint = (found + 1) % tree->node_count;

Please, see my comment related to alloc_hint.

> + return hfs_bnode_create(tree, found);
> }
> }
> }
> +
> + /* Terminate precisely if we've checked the start byte after wrapping */
> + if (wrapped && idx == (start / 8) * 8) {

The same comment about wrapped and hardcoded constants.

> + kunmap_local(data);
> + hfs_bnode_put(node);
> + return ERR_PTR(-ENOSPC);
> + }
> +
> + bit_offset = 0;
> if (++ctx.off >= PAGE_SIZE) {
> kunmap_local(data);
> page = node->page[++ctx.page_idx];
> @@ -577,16 +644,32 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
> }
> kunmap_local(data);
> nidx = node->next;
> +
> if (!nidx) {
> - hfs_dbg("create new bmap node\n");
> - next_node = hfs_bmap_new_bmap(node, idx);
> - } else
> + if (idx < tree->node_count) {
> + /* Extend the map chain to cover newly reserved nodes */
> + hfs_dbg("create new bmap node\n");
> + next_node = hfs_bmap_new_bmap(node, idx);
> + } else if (!wrapped) {
> + /* Chain exhausted, wrap to scan [0, start) */
> + wrapped = true;
> + idx = 0;
> + bit_offset = 0;
> + next_node = hfs_bnode_find(tree, HFSPLUS_TREE_HEAD);
> + } else {
> + hfs_bnode_put(node);
> + return ERR_PTR(-ENOSPC);
> + }
> + } else {
> next_node = hfs_bnode_find(tree, nidx);
> + }
> +
> hfs_bnode_put(node);
> +
> if (IS_ERR(next_node))
> return next_node;
> - node = next_node;
>
> + node = next_node;
> page = hfs_bmap_get_map_page(node, &ctx, 0);
> if (IS_ERR(page)) {
> res = PTR_ERR(page);
> @@ -601,13 +684,14 @@ void hfs_bmap_free(struct hfs_bnode *node)
> {
> struct hfs_btree *tree;
> u16 off, len;
> - u32 nidx;
> + u32 nidx, node_id;
> int res;
>
> hfs_dbg("node %u\n", node->this);
> BUG_ON(!node->this);
> tree = node->tree;
> - nidx = node->this;
> + node_id = node->this;
> + nidx = node_id;
> node = hfs_bnode_find(tree, 0);
> if (IS_ERR(node))
> return;
> @@ -647,6 +731,8 @@ void hfs_bmap_free(struct hfs_bnode *node)
> } else if (!res) {
> tree->free_nodes++;
> mark_inode_dirty(tree->inode);
> + if (node_id < tree->alloc_hint)
> + tree->alloc_hint = node_id;
> }
>
> hfs_bnode_put(node);
> diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
> index 5f891b73a646..170ef2644204 100644
> --- a/fs/hfsplus/hfsplus_fs.h
> +++ b/fs/hfsplus/hfsplus_fs.h
> @@ -49,6 +49,7 @@ struct hfs_btree {
> u32 leaf_tail;
> u32 node_count;
> u32 free_nodes;
> + u32 alloc_hint;

Do we really need to introduce alloc_hint? Could we use node_count instead?
Even if we've freed some nodes in the range below node_count, does it really
make sense to search in the lower region? If we have a lot of free_nodes,
then probably yes. But if we have a low number of free_nodes, then searching
the lower region will still be inefficient. If we have many free_nodes, then
we can simply start the search from the beginning. Otherwise, it doesn't make
sense to try.
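To illustrate the heuristic I have in mind, an untested sketch (both the
helper name and the 1/8 threshold are invented for the example):

```c
#include <assert.h>

typedef unsigned int u32;

/*
 * Hypothetical heuristic: restart the scan from bit 0 only when a
 * noticeable fraction of the nodes is free; otherwise scanning the
 * lower region is unlikely to pay off, so keep going forward from
 * the hint. The 1/8 threshold is arbitrary, just for illustration.
 */
static inline u32 hfs_bmap_scan_start(u32 free_nodes, u32 node_count,
				      u32 hint)
{
	if (free_nodes >= node_count / 8)
		return 0;	/* plenty of free nodes: scan from start */
	return hint;		/* scarce: don't bother with lower region */
}
```

Something like this would avoid both the unconditional wrap-around scan and
the need to pull the hint back in hfs_bmap_free().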

Thanks,
Slava.

> u32 attributes;
>
> unsigned int node_size;