Re: [PATCH] binder: use augmented rb-tree for faster descriptor lookup

From: Carlos Llamas
Date: Thu Oct 10 2024 - 00:22:30 EST


On Wed, Oct 09, 2024 at 11:08:42PM +0800, Shu Han wrote:
> On Wed, Oct 9, 2024 at 4:58 AM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote:
>
> Thank you for your patient reply.
>
> > I honestly don't remember. It might have been that we required to expand
> > the 'struct binder_ref' size. This issue starts to be a problem when we
> > have thousands of references e.g. 30,000. Adding 4 bytes to each one of
> > them might not be worth it. But let me check...
>
> I didn't consider memory overhead before...
> That's indeed an important point.

Yeah, I actually started with a solution that kept a list of references
that had "gaps" behind them. This also required expanding struct
binder_ref and I eventually abandoned this idea.

>
> Fortunately, after checking, I found that this patch does not occupy any
> additional memory for `struct binder_ref`.

This is good news. I hadn't actually checked this. I'll keep this in
mind.

>
> In a modern 64-bit platform, the size of `struct binder_ref` is 104 bytes.
> If we add a 4-byte field into it, the size will be 112 bytes(aligned by
> long). And both of them occupy a 128-byte slab(aligned by kmalloc_index).
> So the memory cost won't be increased, if there isn't a pending change
> that needs exactly 24 bytes in `struct binder_ref`.
>
> In Rust, a rbtree::Node costs 40 bytes, 4 of 40 are used for alignment,
> adding a 4-byte field won't change the size of the struct.
>
> In a 32-bit platform, the number is from 60 bytes to 64 bytes, a 64-byte
> slab, exactly 4 bytes.(very close to the slab bound)
>
> Perhaps it's caused by introducing augmented rbtree into Rust which
> requires considerable effort. But personally, I think it's not bad to just
> introduce augmented rbtree into C (Rust and C are already quite different
> here).
>
> > Yeah, I think it would look cleaner if we do a revert of the previous
> > patches though. This way we can remove the noise and see the actual
> > changes. I'll do this locally for now, no need to send a v2 just yet.
>
> Great point. thanks.
>
> Best regards.

I ran a benchmark test that creates multiple references on a Pixel
device and the numbers between the augmented rbtree implementation and
the current dbitmap are pretty much the same:

augmented rbtree:
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_collectProxies/0/10 696251 ns 334494 ns 2363 kernel
BM_collectProxies/0/100 4047417 ns 1886942 ns 390 kernel
BM_collectProxies/0/1000 29510599 ns 14827312 ns 51 kernel
BM_collectProxies/0/5000 136774414 ns 70482303 ns 9 kernel
BM_collectProxies/0/10000 248333277 ns 125898564 ns 5 kernel
BM_collectProxies/0/20000 485156508 ns 245891699 ns 3 kernel

dbitmap:
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_collectProxies/0/10 646427 ns 291657 ns 1935 kernel
BM_collectProxies/0/100 4203230 ns 2005648 ns 484 kernel
BM_collectProxies/0/1000 30859725 ns 15369365 ns 42 kernel
BM_collectProxies/0/5000 147910844 ns 75179017 ns 9 kernel
BM_collectProxies/0/10000 239196875 ns 121801066 ns 5 kernel
BM_collectProxies/0/20000 481154229 ns 247742285 ns 3 kernel

You should have this test avialable as 'binderRpcBenchmark' I ran with
the following parameters if you want to play with it:
$ binderRpcBenchmark --benchmark_filter="BM_collectProxies/0/*"

Even if the numbers are too close it seems the bump in memory might be a
problem. Particularly in 32bit as these devices are more sensitive to
increases in allocations.

Regards,
Carlos Llamas