Re: RFC: adding Linux vsyscall-disable and similar backwards-incompatibility flags to ELF headers?
From: Andy Lutomirski
Date: Wed Sep 02 2015 - 10:08:33 EST
On Sep 2, 2015 6:57 AM, "Brian Gerst" <brgerst@xxxxxxxxx> wrote:
> On Tue, Sep 1, 2015 at 10:21 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > On Sep 1, 2015 6:53 PM, "Brian Gerst" <brgerst@xxxxxxxxx> wrote:
> >> On Tue, Sep 1, 2015 at 8:51 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >> > Hi all-
> >> >
> >> > Linux has a handful of weird features that are only supported for
> >> > backwards compatibility. The big one is the x86_64 vsyscall page, but
> >> > uselib probably belongs on the list, too, and we might end up with
> >> > more at some point.
> >> >
> >> > I'd like to add a way that new programs can turn these features off.
> >> > In particular, I want the vsyscall page to be completely gone from the
> >> > perspective of any new enough program. This is straightforward if we
> >> > add a system call to ask for the vsyscall page to be disabled, but I'm
> >> > wondering if we can come up with a non-syscall way to do it.
> >> >
> >> > I think that the ideal behavior would be that anything linked against
> >> > a sufficiently new libc would be detected, but I don't see a good way
> >> > to do that using existing toolchain features.
> >> >
> >> > Ideas? We could add a new phdr for this, but then we'd need to play
> >> > linker script games, and I'm not sure that could be done in a clean,
> >> > extensible way.
> >> The vsyscall page is mapped in the fixmap region, which is shared
> >> between all processes. You can't turn it off for an individual
> >> process.
> > Why not?
> > We already emulate all attempts to execute it, and that's trivial to
> > turn off per process. Project Zero pointed out that read access is a
> > problem, too, but we can flip the U/S bit in the pgd once we evict
> > pvclock from the fixmap.
> > And we definitely need to evict pvclock from the fixmap regardless.
> Sure, you can turn off emulation per-process. But the page mapping
> will be the same for every process because it is in the kernel part of
> the page tables which is shared by all processes.
True, but I don't think that means that the mapping has to be readable
in all processes. Once it's the only user-readable mapping in the top
512 GB, we can turn off user access to the whole top 512 GB.
The only other user-accessible thing in the top 512 GB (and the only
other user-accessible thing at a kernel address at all) is the KVM
pvclock mapping. We should turn that off, too, because it's
exploitable in more or less the same way as the vsyscall page.