Re: [PATCH] selftests/x86: add "ffff8" -- kernel memory scanner

From: Alexey Dobriyan
Date: Sat Oct 29 2022 - 05:55:40 EST


On Fri, Oct 28, 2022 at 03:14:31PM -0700, H. Peter Anvin wrote:
> On October 28, 2022 12:33:49 PM PDT, Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
> >During the Meltdown drama Microsoft managed to screw up page tables and give
> >full kernel memory access to userspace:
> >
> > https://blog.frizk.net/2018/03/total-meltdown.html
> >
> >We don't want _any_ of that.
> >
> >This utility, named ffff8, tries to read the upper half of the virtual
> >address space and reports any access that goes through (excluding the
> >vsyscall page, if present).
> >
> >It works by performing the access and rewriting RDI in the SIGSEGV handler.
> >
> >I've tested it with a kernel patch which installs a rogue page, and the page
> >was found.
> >
> > $ ./a.out -h
> > usage: ./a.out [-f] [-r] [-n N] [-s S]
> > -f: sequential scan
> > -r: random scan (default)
> > -n: use N threads (default: $(nproc))
> > -s: lowest address shift (default: 47)
> > -t: time to run (default: 256 seconds)
> >
> >Intended usages are:
> >
> > $ ./a.out -f # full scan on all cores
> >or
> > $ ./a.out -r -t ... # time limited random scan for QA test
> >
> >Features include:
> >* multithreading
> >* auto spreads over CPUs given by taskset
> >* full sequential scan / random scan
> >* auto split work for full scan
> >* smaller than 47-bit scanning (for benchmarking)
> >* time limit
> >
> >Note 1:
> >HT appears to make scanning slower. If this is the case use taskset(1)
> >to exclude HT siblings.
> >
> >Note 2:
> >Full 47-bit window scan takes a long time. My 16c/32t potato can do it
> >in ~8 hours. Benchmark with smaller shifts first.
>
> Good initiative!

Thanks!
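
For anyone reading this without the patch at hand, the "do the access,
rewrite RDI in the SIGSEGV handler" trick quoted above can be sketched
roughly as follows. This is a simplified, single-threaded illustration for
x86-64 Linux, not the code from the patch: the names safe_byte, faulted,
install_segv_handler() and probe() are made up here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for REG_RDI in <ucontext.h> */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* Illustrative names only; none of this is taken from the patch. */
static volatile char safe_byte;		/* always-readable fallback target */
static volatile sig_atomic_t faulted;	/* set by the handler on a fault */
static int handler_installed;

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
	ucontext_t *uc = ctx;

	(void)sig;
	(void)si;
	faulted = 1;
	/* Repoint RDI; the faulting load is retried with it on return. */
	uc->uc_mcontext.gregs[REG_RDI] = (greg_t)(uintptr_t)&safe_byte;
}

static int install_segv_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		return -1;
	handler_installed = 1;
	return 0;
}

/* Returns 1 if the byte at addr was readable, 0 if the access faulted. */
static int probe(uintptr_t addr)
{
	assert(handler_installed);	/* without the handler we would just die */
	faulted = 0;
	/* The load must use RDI as its base so the handler can repoint it. */
	__asm__ volatile("movb (%%rdi), %%al"
			 : "+D"(addr)
			 :
			 : "rax", "memory");
	return !faulted;
}
```

After install_segv_handler() succeeds, probe() on an ordinary userspace
address should return 1, and on a healthy kernel probe() on any address
in the upper half should return 0. A multithreaded scanner would need
per-thread state instead of the single global flag.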

> Only complaint I have is the name and the limit to LA48. LA57 (5-level
> page tables) has the same potential issue.

Yes. It would take only about half a year to scan the 57-bit space if my
system had one. :-)
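
The estimate is plain scaling from the ~8 hours quoted above. The sketch
below assumes scan time is linear in the window size and that the upper
half of a 57-bit address space is a 56-bit window; scan_hours() is a
made-up helper, not part of the patch.

```c
#include <assert.h>

/*
 * Hours needed to scan a 2^target_shift byte window, given that a
 * 2^base_shift byte window takes base_hours. Assumes scan time grows
 * linearly with the window size.
 */
static double scan_hours(double base_hours, unsigned base_shift,
			 unsigned target_shift)
{
	assert(target_shift >= base_shift);
	assert(target_shift - base_shift < 64);
	return base_hours * (double)(1ULL << (target_shift - base_shift));
}
```

scan_hours(8.0, 47, 56) comes out at 4096 hours, which is about 171 days.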

> You may want to consider doing a breadth-first sweep scanning
> by decreasing powers of 2 as that will more quickly catch errors caused
> by problems in the upper levels of the page table hierarchy.

It can scan from top to bottom so that fixmap space is easily covered.
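
For comparison, the breadth-first order suggested above could look roughly
like the sketch below. This is only an illustration of the idea, not
something the patch does: sweep_order() and its parameters are made up.
It emits the window base first, then the midpoint, then the quarter
points, and so on, so widely spaced addresses (and therefore different
upper-level page table entries) are probed early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill out[] with up to max probe addresses from the window
 * [base, base + 2^window_shift), coarsest stride first.
 *
 * The first address is base itself. Each following level halves the
 * stride and emits only the odd multiples of it, so no address is
 * produced twice. The finest stride is 2^min_shift.
 *
 * Returns the number of addresses written.
 */
static size_t sweep_order(uint64_t base, unsigned window_shift,
			  unsigned min_shift, uint64_t *out, size_t max)
{
	size_t n = 0;

	assert(window_shift < 64 && min_shift <= window_shift);

	if (n < max)
		out[n++] = base;

	/* s runs from window_shift - 1 down to min_shift. */
	for (unsigned s = window_shift; s-- > min_shift; ) {
		uint64_t stride = 1ULL << s;
		/* Number of odd multiples of stride inside the window. */
		uint64_t count = 1ULL << (window_shift - 1 - s);

		for (uint64_t k = 0; k < count; k++) {
			if (n == max)
				return n;
			out[n++] = base + (2 * k + 1) * stride;
		}
	}
	return n;
}
```

With base 0xffff800000000000, window_shift 47 and min_shift 12 this would
visit every 4 KiB page of the 47-bit window exactly once, just in a
different order than a linear scan.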