Re: [PATCH v9 00/13] firmware: qcom: qseecom: convert to using the TZ allocator

From: Maximilian Luz
Date: Thu Mar 28 2024 - 12:52:24 EST


On 3/25/24 11:03 AM, Bartosz Golaszewski wrote:
From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>

SCM calls that take memory buffers as arguments require that they be
page-aligned, physically continuous and non-cachable. The same
requirements apply to the buffer used to pass additional arguments to SCM
calls that take more than 4.

To that end drivers typically use dma_alloc_coherent() to allocate memory
of suitable format which is slow and inefficient space-wise.

SHM Bridge is a safety mechanism that - once enabled - will only allow
passing buffers to the TrustZone that have been explicitly marked as
shared. It improves the overall system safety with SCM calls and is
required by the upcoming scminvoke functionality.

The end goal of this series is to enable SHM bridge support for those
architectures that support it but to that end we first need to unify the
way memory for SCM calls is allocated. This in itself is beneficial as
the current approach of using dma_alloc_coherent() in most places is quite
slow.

First let's add a new TZ Memory allocator that allows users to create
dynamic memory pools of format suitable for sharing with the TrustZone.
Make it ready for implementing multiple build-time modes.

Convert all relevant drivers to using it. Add separate pools for SCM core
and for qseecom.

Finally add support for SHM bridge and make it the default mode of
operation with the generic allocator as fallback for the platforms that
don't support SHM bridge.

Tested on db410c, RB5, sm8550-qrd. Previous iteration tested also on
sa8775p-ride and lenovo X13s (please do retest on those platforms if you
can).

Not sure in which version things changed (I haven't really kept up with
that, sorry), but this version (with the generic allocator selected in
the config) fails reading EFI vars on my Surface Pro X (sc8180x):

[ 2.381020] BUG: scheduling while atomic: mount/258/0x00000002
[ 2.383356] Modules linked in:
[ 2.385669] Preemption disabled at:
[ 2.385672] [<ffff800080f7868c>] qcom_tzmem_alloc+0x7c/0x224
[ 2.390325] CPU: 1 PID: 258 Comm: mount Not tainted 6.8.0-1-surface-dev #2
[ 2.392632] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023
[ 2.394955] Call trace:
[ 2.397269] dump_backtrace+0x94/0x114
[ 2.399583] show_stack+0x18/0x24
[ 2.401883] dump_stack_lvl+0x48/0x60
[ 2.404181] dump_stack+0x18/0x24
[ 2.406476] __schedule_bug+0x84/0xa0
[ 2.408770] __schedule+0x6f4/0x7fc
[ 2.411051] schedule+0x30/0xf0
[ 2.413323] synchronize_rcu_expedited+0x158/0x1ec
[ 2.415594] lru_cache_disable+0x28/0x74
[ 2.417853] __alloc_contig_migrate_range+0x68/0x210
[ 2.420122] alloc_contig_range+0x140/0x280
[ 2.422384] cma_alloc+0x128/0x404
[ 2.424643] cma_alloc_aligned+0x44/0x6c
[ 2.426881] dma_alloc_contiguous+0x30/0x44
[ 2.429111] __dma_direct_alloc_pages.isra.0+0x60/0x20c
[ 2.431343] dma_direct_alloc+0x6c/0x2ec
[ 2.433569] dma_alloc_attrs+0x80/0xf4
[ 2.435786] qcom_tzmem_pool_add_memory+0x8c/0x178
[ 2.438008] qcom_tzmem_alloc+0xe8/0x224
[ 2.440232] qsee_uefi_get_next_variable+0x78/0x2cc
[ 2.442443] qcuefi_get_next_variable+0x50/0x94
[ 2.444643] efivar_get_next_variable+0x20/0x2c
[ 2.446832] efivar_init+0x8c/0x29c
[ 2.449009] efivarfs_fill_super+0xd4/0xec
[ 2.451182] get_tree_single+0x74/0xbc
[ 2.453349] efivarfs_get_tree+0x18/0x24
[ 2.455513] vfs_get_tree+0x28/0xe8
[ 2.457680] vfs_cmd_create+0x5c/0xf4
[ 2.459840] __do_sys_fsconfig+0x458/0x598
[ 2.461993] __arm64_sys_fsconfig+0x24/0x30
[ 2.464143] invoke_syscall+0x48/0x110
[ 2.466281] el0_svc_common.constprop.0+0x40/0xe0
[ 2.468415] do_el0_svc+0x1c/0x28
[ 2.470546] el0_svc+0x34/0xb4
[ 2.472669] el0t_64_sync_handler+0x120/0x12c
[ 2.474793] el0t_64_sync+0x190/0x194

and subsequently

[ 2.477613] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[ 2.477618] WARNING: CPU: 4 PID: 258 at kernel/sched/core.c:5889 preempt_count_sub+0x90/0xd4
[ 2.478682] Modules linked in:
[ 2.479214] CPU: 4 PID: 258 Comm: mount Tainted: G W 6.8.0-1-surface-dev #2
[ 2.479752] Hardware name: Microsoft Corporation Surface Pro X/Surface Pro X, BIOS 7.620.140 08/11/2023
[ 2.480296] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 2.480839] pc : preempt_count_sub+0x90/0xd4
[ 2.481380] lr : preempt_count_sub+0x90/0xd4
[ 2.481917] sp : ffff8000857cbb00
[ 2.482450] x29: ffff8000857cbb00 x28: ffff8000806b759c x27: 8000000000000005
[ 2.482988] x26: 0000000000000000 x25: ffff0000802cbaa0 x24: ffff0000809228b0
[ 2.483525] x23: 0000000000000000 x22: ffff800082b462f0 x21: 0000000000001000
[ 2.484062] x20: ffff80008363d000 x19: ffff000080922880 x18: fffffffffffc9660
[ 2.484600] x17: 0000000000000020 x16: 0000000000000000 x15: 0000000000000038
[ 2.485137] x14: 0000000000000000 x13: ffff800082649258 x12: 00000000000006db
[ 2.485674] x11: 0000000000000249 x10: ffff8000826fc930 x9 : ffff800082649258
[ 2.486207] x8 : 00000000ffffdfff x7 : ffff8000826f9258 x6 : 0000000000000249
[ 2.486738] x5 : 0000000000000000 x4 : 40000000ffffe249 x3 : 0000000000000000
[ 2.487267] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000841fa300
[ 2.487792] Call trace:
[ 2.488311] preempt_count_sub+0x90/0xd4
[ 2.488831] _raw_spin_unlock_irqrestore+0x1c/0x44
[ 2.489352] qcom_tzmem_alloc+0x1cc/0x224
[ 2.489873] qsee_uefi_get_next_variable+0x78/0x2cc
[ 2.490390] qcuefi_get_next_variable+0x50/0x94
[ 2.490907] efivar_get_next_variable+0x20/0x2c
[ 2.491420] efivar_init+0x8c/0x29c
[ 2.491931] efivarfs_fill_super+0xd4/0xec
[ 2.492440] get_tree_single+0x74/0xbc
[ 2.492948] efivarfs_get_tree+0x18/0x24
[ 2.493453] vfs_get_tree+0x28/0xe8
[ 2.493957] vfs_cmd_create+0x5c/0xf4
[ 2.494459] __do_sys_fsconfig+0x458/0x598
[ 2.494963] __arm64_sys_fsconfig+0x24/0x30
[ 2.495468] invoke_syscall+0x48/0x110
[ 2.495972] el0_svc_common.constprop.0+0x40/0xe0
[ 2.496475] do_el0_svc+0x1c/0x28
[ 2.496976] el0_svc+0x34/0xb4
[ 2.497475] el0t_64_sync_handler+0x120/0x12c
[ 2.497975] el0t_64_sync+0x190/0x194
[ 2.498466] ---[ end trace 0000000000000000 ]---
[ 2.507347] qcom_scm firmware:scm: qseecom: scm call failed with error -22
[ 2.507813] efivars: get_next_variable: status=8000000000000007

If I understand correctly, it enters an atomic section in
qcom_tzmem_alloc() and then tries to schedule somewhere down the line.
So this shouldn't be qseecom specific.

I should probably also say that I'm currently testing this on a patched
v6.8 kernel, so there's a chance that it's my fault. However, as far as
I understand, it enters an atomic section in qcom_tzmem_alloc() and then
later tries to expand the pool memory with dma_alloc_coherent(). Which
AFAIK is allowed to sleep with GFP_KERNEL (and I guess that that's the
issue here).

I've also tried the shmem allocator option, but that seems to get stuck
quite early at boot, before I even have usb-serial access to get any
logs. If I can find some more time, I'll try to see if I can get some
useful output for that.

Best regards,
Max