Re: [PATCH] virt: Prevent AES-GCM IV reuse in SNP guest driver

From: Peter Gonda
Date: Thu Oct 20 2022 - 10:47:10 EST


On Thu, Oct 20, 2022 at 8:02 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
>
> On 10/19/22 16:47, Peter Gonda wrote:
> > On Wed, Oct 19, 2022 at 2:58 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >> On 10/19/22 15:39, Peter Gonda wrote:
> >>> On Wed, Oct 19, 2022 at 1:56 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >>>> On 10/19/22 14:17, Dionna Amalie Glaze wrote:
> >>>>> On Wed, Oct 19, 2022 at 11:44 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >>>>>> On 10/19/22 12:40, Peter Gonda wrote:
> >>>>>>> On Wed, Oct 19, 2022 at 11:03 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >>>>>>>> On 10/19/22 10:03, Peter Gonda wrote:
> >>>>>>>>> The ASP and an SNP guest use a series of AES-GCM keys called VMPCKs to
> >>>>>>>>> communicate securely with each other. The IV in this scheme is a
> >>>>>>>>> sequence number that both the ASP and the guest track. Currently the
> >>>>>>>>> sequence number in a guest request must exactly match the sequence
> >>>>>>>>> number tracked by the ASP. This means that if the guest sees an error
> >>>>>>>>> from the host during a request, it can only retry that exact request or
> >>>>>>>>> disable the VMPCK to prevent IV reuse. AES-GCM cannot tolerate IV
> >>>>>>>>> reuse; see:
> >>>>>>>>> https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/comments/800-38-series-drafts/gcm/joux_comments.pdf
>
> >>> OK, so the guest retries with the same request when it gets an
> >>> SNP_GUEST_REQ_INVALID_LEN error. It expands its internal buffer to
> >>
> >> It would just use the pre-allocated snp_dev->certs_data buffer with npages
> >> set to the full size of that buffer.
> >
> > Actually, we allocate that buffer with size SEV_FW_BLOB_MAX_SIZE.
> > Maybe we should just allocate this buffer at a size we think is
> > sufficient and never increase the allocation?
> >
> > I see that
> > https://developer.amd.com/wp-content/resources/ask_ark_milan.cert is
> > 3200 bytes, and assuming the VCEK cert is comparable in size (it
> > should be, since that .cert is 2 certificates), 16K seems to leave
> > enough room even for some vendor certificates?
>
> I think just using the 16K buffer (4 pages) as it is allocated today is
> ok. If we get a SNP_GUEST_REQ_INVALID_LEN error requesting more than 4
> pages, then we won't ever be able to pull the certs given how the
> driver is coded today. In that case, disabling the VMPCK is in order.
>
> A separate patch could be submitted later to improve this overall aspect
> of the certs buffer if needed.

If that sounds OK, I'd prefer that. This keeps the driver's current limit:

static int get_ext_report(struct snp_guest_dev *snp_dev,
			  struct snp_guest_request_ioctl *arg)
	...
	if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
	    !IS_ALIGNED(req.certs_len, PAGE_SIZE))
		return -EINVAL;

I'd prefer not to add extra features during the bug fix, but I'm happy
to make this work with buffers greater than SEV_FW_BLOB_MAX_SIZE as a
follow-up if you want.

>
> Thanks,
> Tom
>
> >
> >>
> >>> hold the certificates. When it finally gets a successful request w/
> >>> certs, do we want to return the attestation bits to userspace but
> >>> leave out the certificate data, or just error out of the ioctl
> >>> completely?
> >>
> >> We need to be able to return the attestation bits that came back with the
> >> extra certs. So just error out of the ioctl with the length error and let
> >> user-space retry with the recommended number of pages.
> >
> > That sounded simpler to me. Will do.
> >
> >>
> >>>
> >>> I can do that in this series.
> >>
> >> Thanks!
> >>
> >>>
> >>>>
> >>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> For the rate-limiting patch series [1], the rate-limiting will have to be
> >>>>>>>> performed within the kernel, while the mutex is held, and then retry the
> >>>>>>>> exact request again. Otherwise, that error will require disabling the
> >>>>>>>> VMPCK. Either that, or the hypervisor must provide the rate limiting.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> [1] https://lore.kernel.org/lkml/20221013160040.2858732-1-dionnaglaze@xxxxxxxxxx/
> >>>>>>>
> >>>>>>> Yes, I think if the host rate-limits the guest, the guest kernel
> >>>>>>> should retry the exact message. Which mutex are you referring to?
> >>>>>>
> >>>>>> Or the host waits and then submits the request and the guest kernel
> >>>>>> doesn't have to do anything. The mutex I'm referring to is the
> >>>>>> snp_cmd_mutex that is taken in snp_guest_ioctl().
> >>>>>
> >>>>> I think that either the host kernel or guest kernel waiting can lead
> >>>>> to unacceptable delays.
> >>>>> I would recommend that we add a zero argument ioctl to /dev/sev-guest
> >>>>> specifically for retrying the last request.
> >>>>>
> >>>>> We can know what the last request is due to the snp_cmd_mutex serialization.
> >>>>> The driver will just keep a scratch buffer for this. Any other request
> >>>>> that comes in without resolving the retry will get an -EBUSY error
> >>>>> code.
> >>>>
> >>>> And the first caller will have received an -EAGAIN in order to
> >>>> differentiate between the two situations?
> >>>>
> >>>>>
> >>>>> Calling the retry ioctl without a pending command will result in -EINVAL.
> >>>>>
> >>>>> Let me know what you think.
> >>>>
> >>>> I think that sounds reasonable, but there are some catches. You will need
> >>>> to ensure that the caller that is supposed to retry does actually retry
> >>>> and that a caller that does retry is the same caller that was told to retry.
> >>>
> >>> What's the issue with the guest driver taking some time?
> >>>
> >>> This sounds complex because there may be many users of the driver. How
> >>> do multiple users coordinate when they need to use the retry ioctl?
> >>>
> >>>>
> >>>> Thanks,
> >>>> Tom
> >>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Tom