Re: [RFC PATCH] x86/sev: Disallow userspace access to BIOS region for SEV-SNP guests
From: Kees Cook
Date: Thu Apr 10 2025 - 12:36:26 EST
On Thu, Apr 10, 2025 at 03:03:55PM +0300, Nikolay Borisov wrote:
>
>
> On 9.04.25 г. 21:39 ч., Dan Williams wrote:
> > Kees Cook wrote:
> > > On Tue, Apr 08, 2025 at 04:55:08PM -0700, Dan Williams wrote:
> > > > Dave Hansen wrote:
> > > > > On 4/8/25 06:43, Tom Lendacky wrote:
> > > > > > > Tom/Boris, do you see a problem blocking access to /dev/mem for SEV
> > > > > > > guests?
> > > > > > Not sure why we would suddenly not allow that.
> > > > >
> > > > > Both TDX and SEV-SNP have issues with allowing access to /dev/mem.
> > > > > Disallowing access to the individually troublesome regions can fix
> > > > > _part_ of the problem. But suddenly blocking access is guaranteed to fix
> > > > > *ALL* the problems forever.
> > > >
> > > > ...or at least solicits practical use cases for why the kernel needs to
> > > > poke holes in the policy.
> > > >
> > > > > Or, maybe we just start returning 0's for all reads and throw away all
> > > > > writes. That is probably less likely to break userspace that doesn't
> > > > > know what it's doing in the first place.
> > > >
> > > > Yes, and a bulk of the regression risk has already been pipe-cleaned by
> > > > KERNEL_LOCKDOWN that shuts down /dev/mem and PCI resource file mmap in
> > > > many scenarios.
> > > >
> > > > Here is an updated patch that includes some consideration for mapping
> > > > zeros for known legacy compatibility use cases.
> > [..]
> > > > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > > > index bfa444a7dbb0..c8679ae1bc8b 100644
> > > > --- a/arch/x86/mm/init.c
> > > > +++ b/arch/x86/mm/init.c
> > > > @@ -867,6 +867,8 @@ void __init poking_init(void)
> > > > */
> > > > int devmem_is_allowed(unsigned long pagenr)
> > > > {
> > > > + bool platform_allowed = x86_platform.devmem_is_allowed(pagenr);
> > > > +
> > > > if (region_intersects(PFN_PHYS(pagenr), PAGE_SIZE,
> > > > IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
> > > > != REGION_DISJOINT) {
> > > > @@ -885,14 +887,20 @@ int devmem_is_allowed(unsigned long pagenr)
> > > > * restricted resource under CONFIG_STRICT_DEVMEM.
> > > > */
> > > > if (iomem_is_exclusive(pagenr << PAGE_SHIFT)) {
> > > > - /* Low 1MB bypasses iomem restrictions. */
> > > > - if (pagenr < 256)
> > > > + /*
> > > > + * Low 1MB bypasses iomem restrictions unless the
> > > > + * platform says "no", in which case map zeroes
> > > > + */
> > > > + if (pagenr < 256) {
> > > > + if (!platform_allowed)
> > > > + return 2;
> > > > return 1;
> > > > + }
> > > > return 0;
> > > > }
> > > > - return 1;
> > > > + return platform_allowed;
> > > > }
> > > > void free_init_pages(const char *what, unsigned long begin, unsigned long end)
> > >
> > > I am reminded of this discussion:
> > > https://lore.kernel.org/all/CAPcyv4iVt=peUAk1qx_EfKn7aGJM=XwRUpJftBhkUgQEti2bJA@xxxxxxxxxxxxxx/
> > >
> > > As in, mmap will bypass this restriction, so if you really want the low
> > > 1MiB to be unreadable, a solution for mmap is still needed...
> >
> > Glad you remembered that!
> >
> > This needs a self-test to verify the assumptions here. I can circle back
> > next week or so take a look at turning this into a bigger series. If
> > someone has cycles to take this on before that I would not say no to
> > some help.
>
> Can't we simply treat return value of 2 for range_is_allowed the same way as
> if 0 was returned in mmap_mem and simply fail the call with -EPERM?
The historical concern was that EPERM would break old tools. I don't
have any current evidence either way, though.
--
Kees Cook