Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()

From: Etienne Carriere
Date: Fri Dec 10 2021 - 05:39:34 EST


On Fri, 10 Dec 2021 at 11:29, Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
>
> On Fri, 10 Dec 2021 at 15:08, Etienne Carriere
> <etienne.carriere@xxxxxxxxxx> wrote:
> >
> > Hello all,
> >
> > On Fri, 10 Dec 2021 at 09:10, Jerome Forissier <jerome@xxxxxxxxxxxxx> wrote:
> > >
> > > +CC Jens, Etienne
> > >
> > > On 12/10/21 06:00, Sumit Garg wrote:
> > > > On Fri, 10 Dec 2021 at 09:42, Wang, Xiaolei <Xiaolei.Wang@xxxxxxxxxxxxx> wrote:
> > > >>
> > > >> -----Original Message-----
> > > >> From: Sumit Garg <sumit.garg@xxxxxxxxxx>
> > > >> Sent: Thursday, December 9, 2021 7:41 PM
> > > >> To: Wang, Xiaolei <Xiaolei.Wang@xxxxxxxxxxxxx>
> > > >> Cc: jens.wiklander@xxxxxxxxxx; op-tee@xxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > >> Subject: Re: [PATCH] optee: Suppress false positive kmemleak report in optee_handle_rpc()
> > > >>
> > > >> On Mon, 6 Dec 2021 at 17:35, Xiaolei Wang <xiaolei.wang@xxxxxxxxxxxxx> wrote:
> > > >>>
> > > >>> We observed the following kmemleak report:
> > > >>> unreferenced object 0xffff000007904500 (size 128):
> > > >>> comm "swapper/0", pid 1, jiffies 4294892671 (age 44.036s)
> > > >>> hex dump (first 32 bytes):
> > > >>> 00 47 90 07 00 00 ff ff 60 00 c0 ff 00 00 00 00 .G......`.......
> > > >>> 60 00 80 13 00 80 ff ff a0 00 00 00 00 00 00 00 `...............
> > > >>> backtrace:
> > > >>> [<000000004c12b1c7>] kmem_cache_alloc+0x1ac/0x2f4
> > > >>> [<000000005d23eb4f>] tee_shm_alloc+0x78/0x230
> > > >>> [<00000000794dd22c>] optee_handle_rpc+0x60/0x6f0
> > > >>> [<00000000d9f7c52d>] optee_do_call_with_arg+0x17c/0x1dc
> > > >>> [<00000000c35884da>] optee_open_session+0x128/0x1ec
> > > >>> [<000000001748f2ff>] tee_client_open_session+0x28/0x40
> > > >>> [<00000000aecb5389>] optee_enumerate_devices+0x84/0x2a0
> > > >>> [<000000003df18bf1>] optee_probe+0x674/0x6cc
> > > >>> [<000000003a4a534a>] platform_drv_probe+0x54/0xb0
> > > >>> [<000000000c51ce7d>] really_probe+0xe4/0x4d0
> > > >>> [<000000002f04c865>] driver_probe_device+0x58/0xc0
> > > >>> [<00000000b485397d>] device_driver_attach+0xc0/0xd0
> > > >>> [<00000000c835f0df>] __driver_attach+0x84/0x124
> > > >>> [<000000008e5a429c>] bus_for_each_dev+0x70/0xc0
> > > >>> [<000000001735e8a8>] driver_attach+0x24/0x30
> > > >>> [<000000006d94b04f>] bus_add_driver+0x104/0x1ec
> > > >>>
> > > > >>> This is not a memory leak because we pass the shared memory pointer to
> > > > >>> the secure world and get it back from the secure world before releasing it.
> > > >>
> > > >>> How about if it's actually a memory leak caused by the secure world?
> > > >>> An example being secure world just allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and doesn't free it via OPTEE_SMC_RPC_FUNC_FREE.
> > > >>
> > > >>> IMO, we need to cross-check optee-os if it's responsible for leaking kernel memory.
> > > >>
> > > > >> Hi Sumit,
> > > > >>
> > > > >> You mean we need to check whether there is a real memory leak?
> > > > >> If the secure world allocates kernel memory via OPTEE_SMC_RPC_FUNC_ALLOC and never frees
> > > > >> it via OPTEE_SMC_RPC_FUNC_FREE, then we should treat it as a memory leak, and we need to determine whether it is caused by the secure OS?
> > > >
> > > > Yes. AFAICT, optee-os should allocate shared memory to communicate
> > > > with tee-supplicant. So once the communication is done, the underlying
> > > > > shared memory should be freed. I can't think of any scenario where
> > > > > optee-os should hold on to shared memory indefinitely.
> > >
> > > I believe it can happen when OP-TEE's CFG_PREALLOC_RPC_CACHE is y. See
> > > the config file [1] and the commit which introduced this config [2].
> > >
> > > [1] https://github.com/OP-TEE/optee_os/blob/3.15.0/mk/config.mk#L709
> > > [2] https://github.com/OP-TEE/optee_os/commit/8887663248ad
> > >
> >
> > OP-TEE has cached some shm buffers for a while now, to avoid
> > re-allocating them over and over.
> > OP-TEE does so for 1 shm buffer per "tee thread" that OP-TEE has provisioned.
> > Each thread can cache a shm reference.
> > Note that the RPCs used from OP-TEE to linux/u-boot/ree do not require such a
> > message buffer (IMO).
> >
> > The main issue is that the shm buffers are allocated per OP-TEE thread
> > (the thread context assigned to a client invocation request when entering
> > OP-TEE).
> > Therefore, if an OP-TEE thread caches a shm buffer, the caller's
> > tee session keeps a shm reference with a refcount held until the OP-TEE
> > thread releases its cached shm reference.
> >
> > There are ugly side effects. Linux must disable the cache to release
> > all resources.
> > We recently saw that some tee sessions may be left open because of such a
> > held shm refcount.
> > It can lead to some misbehaviour of the TA service (restarting a
> > service, releasing a resource).
> >
> > Config switch CFG_PREALLOC_RPC_CACHE was introduced [pr4896] to
> > disable the feature at boot time.
> > There are means to not use it, or to explicitly enable/disable it at
> > run time (existing OP-TEE SMC services already do that). Disabling it
> > would maybe be a better default config.
> > Note the discussion thread ending at this comment [issue1918].
> >
>
> Thanks Etienne for the detailed description and references. We could
> set CFG_PREALLOC_RPC_CACHE=n by default, but it feels like we
> would miss a valuable optimization.
>
> How about we just allocate a shared memory page during the OP-TEE
> driver probe and share it with optee-os to use for RPC arguments? And
> later it can be freed during OP-TEE driver removal. This would avoid
> any refcounting of this special memory being associated with TA
> sessions.

True. The driver currently invokes OPTEE_SMC_ENABLE_SHM_CACHE
to start caching some shm allocations. The optee_os part of that command
could be changed to preallocate the required small rpc message buffer per
provisioned tee thread.

Existing OPTEE_SMC_DISABLE_SHM_CACHE should behave accordingly.
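
To make the idea concrete, here is a rough userspace model of it, not actual
optee_os code. All names (rpc_cache_enable(), rpc_cache_disable(),
rpc_cache_get(), NUM_TEE_THREADS, RPC_MSG_SIZE) are hypothetical and only
stand in for what the handlers of OPTEE_SMC_ENABLE_SHM_CACHE and
OPTEE_SMC_DISABLE_SHM_CACHE might do; calloc()/free() stand in for the real
shm allocation RPCs. The point is that the buffers are owned by the cache
itself, not by whichever TA session happened to trigger the first RPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical values, chosen only for illustration. */
#define NUM_TEE_THREADS 4   /* number of provisioned tee threads */
#define RPC_MSG_SIZE    128 /* size of the small RPC message buffer */

/* One preallocated RPC message buffer per provisioned thread. */
struct rpc_msg_cache {
	void *buf[NUM_TEE_THREADS];
	bool enabled;
};

/* Models the "enable cache" command: preallocate every per-thread buffer. */
int rpc_cache_enable(struct rpc_msg_cache *c)
{
	if (c->enabled)
		return 0;

	for (size_t i = 0; i < NUM_TEE_THREADS; i++) {
		c->buf[i] = calloc(1, RPC_MSG_SIZE);
		if (!c->buf[i]) {
			/* Roll back the buffers allocated so far. */
			while (i--) {
				free(c->buf[i]);
				c->buf[i] = NULL;
			}
			return -1;
		}
	}
	c->enabled = true;
	return 0;
}

/* Models the "disable cache" command: release every per-thread buffer. */
void rpc_cache_disable(struct rpc_msg_cache *c)
{
	for (size_t i = 0; i < NUM_TEE_THREADS; i++) {
		free(c->buf[i]);
		c->buf[i] = NULL;
	}
	c->enabled = false;
}

/* Returns the buffer of a given thread, or NULL when the cache is off. */
void *rpc_cache_get(const struct rpc_msg_cache *c, size_t thread_id)
{
	assert(thread_id < NUM_TEE_THREADS);
	return c->enabled ? c->buf[thread_id] : NULL;
}
```

With such a scheme no buffer lifetime is tied to a session: the buffers exist
from enable to disable, whatever sessions are opened or closed in between.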

etienne

>
> -Sumit
>
> > Comments are welcome. I may have missed something in the description
> > (or understanding :).
> >
> > [pr4896] https://github.com/OP-TEE/optee_os/pull/4896
> > [issue1918] https://github.com/OP-TEE/optee_os/issues/1918#issuecomment-968747738
> >
> > Best regards,
> > etienne
> >
> >
> >
> > > --
> > > Jerome