Re: [PATCH v4 3/3] KVM: arm64: Release the ownership of the hyp rx buffer to Trustzone
From: Quentin Perret
Date: Fri Mar 28 2025 - 07:40:10 EST
On Thursday 27 Mar 2025 at 09:37:31 (+0000), Sebastian Ene wrote:
> On Wed, Mar 26, 2025 at 04:48:33PM +0000, Quentin Perret wrote:
> > On Wednesday 26 Mar 2025 at 11:39:01 (+0000), Sebastian Ene wrote:
> > > Introduce the release FF-A call to notify Trustzone that the hypervisor
> > > has finished copying the data from the buffer shared with Trustzone to
> > > the non-secure partition.
> > >
> > > Reported-by: Andrei Homescu <ahomescu@xxxxxxxxxx>
> > > Signed-off-by: Sebastian Ene <sebastianene@xxxxxxxxxx>
> > > ---
> > > arch/arm64/kvm/hyp/nvhe/ffa.c | 9 ++++++---
> > > 1 file changed, 6 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
> > > index 6df6131f1107..ac898ea6274a 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/ffa.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
> > > @@ -749,6 +749,7 @@ static void do_ffa_part_get(struct arm_smccc_res *res,
> > > DECLARE_REG(u32, uuid3, ctxt, 4);
> > > DECLARE_REG(u32, flags, ctxt, 5);
> > > u32 count, partition_sz, copy_sz;
> > > + struct arm_smccc_res _res;
> > >
> > > hyp_spin_lock(&host_buffers.lock);
> > > if (!host_buffers.rx) {
> > > @@ -765,11 +766,11 @@ static void do_ffa_part_get(struct arm_smccc_res *res,
> > >
> > > count = res->a2;
> > > if (!count)
> > > - goto out_unlock;
> > > + goto release_rx;
> > >
> > > if (hyp_ffa_version > FFA_VERSION_1_0) {
> > > /* Get the number of partitions deployed in the system */
> > > - if (flags & 0x1)
> > > + if (flags & PARTITION_INFO_GET_RETURN_COUNT_ONLY)
> > > goto out_unlock;
> > >
> > > partition_sz = res->a3;
> > > @@ -781,10 +782,12 @@ static void do_ffa_part_get(struct arm_smccc_res *res,
> > > copy_sz = partition_sz * count;
> > > if (copy_sz > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE) {
> > > ffa_to_smccc_res(res, FFA_RET_ABORTED);
> > > - goto out_unlock;
> > > + goto release_rx;
> > > }
> > >
> > > memcpy(host_buffers.rx, hyp_buffers.rx, copy_sz);
> > > +release_rx:
> > > + ffa_rx_release(&_res);
>
> Hi,
>
> >
> > I'm a bit confused about this release call here. In the pKVM FF-A proxy
> > model, the hypervisor is essentially 'transparent', so do we not expect
> > EL1 to issue that instead?
>
> I think the EL1 should also issue this call irrespective of what the
> hypervisor is doing. Sudeep can correct me here if I am wrong, but this
> is my take on this.
Agreed, but with the code as it is implemented in this patch, I think
that from the host's perspective there is a difference in semantics for
the release call. W/o pKVM the buffer is essentially 'locked' until
the host issues the release call. With pKVM, the buffer is effectively
unlocked immediately upon return from the PARTITION_INFO_GET call
because the hypervisor happened to have issued the release call
behind our back. And there is no way for the host to know the difference.

I understand that we can argue the hypervisor-issued call is for the
EL2-TZ buffers while the EL1-issued call is for the EL1-EL2 buffers,
but it doesn't quite work that way since pKVM just blindly forwards
the release calls coming from EL1 w/o implementing the expected
semantics.
> I am looking at this as a way of signaling the availability of the rx
> buffer across partitions. There are some calls that when invoked, they
> place the buffer in a 'locked state'.
>
>
> > How is EL1 supposed to know that the
> > hypervisor has already sent the release call?
>
> It doesn't need to know, it issues the call as there is no hypervisor
> in-between, why would it need to know ?
As per the comment above, there is a host-visible difference in semantics
with or without pKVM which IMO is problematic.

For example, if the host issues two PARTITION_INFO_GET calls back to
back w/o a release call in between, IIUC the expectation from the
FF-A spec is for the second one to fail. With this patch applied, the
second call would succeed thanks to the implicit release call issued by
pKVM. But it would fail, as it is supposed to, w/o pKVM.

I'm not entirely sure if that's gonna cause a real-world problem, but it
does feel unnecessary at best. Are we trying to fix an EL1 bug in the
hypervisor here?
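To make the back-to-back example concrete, here is a minimal user-space
sketch of the RX buffer ownership rule as described above. It is a toy
model, not kernel code: every name in it (struct rx_model, the tz_*
helpers, the MODEL_* return codes) is hypothetical, and it assumes that
a second PARTITION_INFO_GET fails while the receiver still owns the
buffer. It also only models the success path; the patch additionally
releases on some error paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes for this model only, not the kernel's FFA_RET_* values. */
enum model_ret { MODEL_OK = 0, MODEL_BUSY = -1, MODEL_DENIED = -2 };

/* Toy view of one RX buffer as seen by the producer (TZ in the thread). */
struct rx_model {
	bool held_by_receiver;	/* true between PARTITION_INFO_GET and RX_RELEASE */
};

/* Producer side: fill the buffer, or fail if the receiver still owns it. */
static enum model_ret tz_partition_info_get(struct rx_model *rx)
{
	if (rx->held_by_receiver)
		return MODEL_BUSY;
	rx->held_by_receiver = true;
	return MODEL_OK;
}

/* Producer side: take the buffer back once the receiver is done with it. */
static enum model_ret tz_rx_release(struct rx_model *rx)
{
	if (!rx->held_by_receiver)
		return MODEL_DENIED;
	rx->held_by_receiver = false;
	return MODEL_OK;
}

/*
 * Host call path. With proxy_releases set, the proxy issues the release
 * itself after a successful call, which is the behaviour questioned above.
 */
static enum model_ret host_partition_info_get(struct rx_model *rx,
					      bool proxy_releases)
{
	enum model_ret ret = tz_partition_info_get(rx);

	if (ret == MODEL_OK && proxy_releases)
		tz_rx_release(rx);
	return ret;
}

/* Two calls back to back, with no host-issued release in between. */
static void demo_back_to_back(void)
{
	struct rx_model bare = { false }, proxied = { false };

	/* No proxy release: the second call is rejected. */
	assert(host_partition_info_get(&bare, false) == MODEL_OK);
	assert(host_partition_info_get(&bare, false) == MODEL_BUSY);

	/* Implicit proxy release: the second call unexpectedly succeeds. */
	assert(host_partition_info_get(&proxied, true) == MODEL_OK);
	assert(host_partition_info_get(&proxied, true) == MODEL_OK);
}
```

Calling demo_back_to_back() exercises both paths; the only difference
between them is whether the proxy released the buffer on the host's
behalf, which is exactly what the host cannot observe.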
> > And isn't EL1 going to be
> > confused if the content of the buffer is overwritten before it has issued
> > the release call itself?
>
> The hypervisor should prevent changes to the buffer mapped between the
> host and itself until the release_rx call is issued from the host.
> If another call that wants to make use of the rx buffer sneaks in, we
> would have to revoke it with BUSY until rx_release is sent.
Right, exactly, but that's not implemented at the moment. IMO it is much
simpler to rely on the host to issue the release call and just not do it
from the PARTITION_INFO_GET path in pKVM. And if we're scared about a
release call racing with PARTITION_INFO_GET at pKVM level, all we should
need to do is forward the release call with the host_buffers.lock held I
think. Wdyt?
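As a rough illustration of that alternative, the sketch below models a
proxy that never releases on its own and simply forwards the host's
release call while holding the same lock as the PARTITION_INFO_GET
path. It is a user-space toy under stated assumptions: the toy_* names
are hypothetical, the lock is a plain flag standing in for
host_buffers.lock, and a single ownership flag stands in for the state
that TZ would track behind the forwarded call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes for this sketch only. */
enum toy_ret { TOY_OK = 0, TOY_BUSY = -1, TOY_DENIED = -2 };

/* Stand-in for a spinlock; the asserts catch unbalanced use in the model. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_buffers {
	struct toy_lock lock;	/* plays the role of host_buffers.lock */
	bool rx_owned_by_host;	/* stands in for the state kept by TZ */
};

/* PARTITION_INFO_GET path: takes ownership, never releases implicitly. */
static enum toy_ret toy_part_get(struct toy_buffers *b)
{
	enum toy_ret ret = TOY_OK;

	toy_lock_acquire(&b->lock);
	if (b->rx_owned_by_host)
		ret = TOY_BUSY;
	else
		b->rx_owned_by_host = true;	/* the copy to the host would go here */
	toy_lock_release(&b->lock);
	return ret;
}

/* Host-issued release, forwarded under the same lock as toy_part_get(). */
static enum toy_ret toy_forward_rx_release(struct toy_buffers *b)
{
	enum toy_ret ret = TOY_OK;

	toy_lock_acquire(&b->lock);
	if (!b->rx_owned_by_host)
		ret = TOY_DENIED;
	else
		b->rx_owned_by_host = false;
	toy_lock_release(&b->lock);
	return ret;
}

/* The host drives the release, so behaviour matches the no-proxy case. */
static void toy_demo_host_driven_release(void)
{
	struct toy_buffers b = { { false }, false };

	assert(toy_part_get(&b) == TOY_OK);
	assert(toy_part_get(&b) == TOY_BUSY);
	assert(toy_forward_rx_release(&b) == TOY_OK);
	assert(toy_part_get(&b) == TOY_OK);
}
```

Because both paths take the same lock, a release can never interleave
with the copy in the PARTITION_INFO_GET path, and the buffer stays
owned by the host until the host itself lets go of it.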
Thanks,
Quentin