RE: [PATCH v9 00/17] Enable FSGSBASE instructions

From: Metzger, Markus T
Date: Mon Dec 02 2019 - 03:23:57 EST


> On Fri, Nov 29, 2019 at 6:56 AM Metzger, Markus T
> <markus.t.metzger@xxxxxxxxx> wrote:
> >
> > > On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote:
> > > > On Fri, 4 Oct 2019, Chang S. Bae wrote:
> > > > >
> > > > > Updates from v8 [10]:
> > > > > * Internalized the interrupt check in the helper functions (Andy L.)
> > > > > * Simplified GS base helper functions (Tony L.)
> > > > > * Changed the patch order to put the paranoid path changes before the
> > > > > context switch changes (Tony L.)
> > > > > * Fixed typos (Randy D.) and massaged a few sentences in the
> documentation
> > > > > * Massaged the FSGSBASE enablement message
> > > >
> > > > That still lacks what Andy requested quite some time ago in the V8 thread:
> > > >
> > > > https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd-
> > > 068e1905b774@xxxxxxxxxx/
> > > >
> > > > "I also think that, before this series can have my ack, it needs an
> > > > actual gdb maintainer to chime in, publicly, and state that they have
> > > > thought about and tested the ABI changes and that gdb still works on
> > > > patched kernels with and without FSGSBASE enabled. I realize that there
> > > > were all kinds of discussions, but they were all quite theoretical, and
> > > > I think that the actual patches need to be considered by people who
> > > > understand the concerns. Specific test cases would be nice, too."
> > > >
> > > > What's the state of this?
> >
> > On branch users/mmetzger/fsgs in sourceware.org/git/binutils-gdb.git,
> > there's a GDB test covering the behavior discussed theoretically back then.
> >
> > It covers modifying the selector as well as the base from GDB and using
> > the modified values for inferior calls as well as for resuming the inferior.
> >
> > Current kernels allow changing the selector and provide the resulting
> > base back to the ptracer. They also allow changing the base as long as
> > the selector is zero. That's the behavior we wanted to preserve IIRC.
>
> The general kernel rule is that we don't break working applications.
> Other than that, we're allowed to change the ABI if existing working
> applications don't break. I can't tell whether you wrote a test that
> detects a behavior change or whether you wrote a test that tests
> behavior that gdb or other programs actually rely on.

Well, that's a tough question. The test covers GDB's behavior on today's
systems. GDB itself does not actually rely on that behavior. That is, GDB
itself wouldn't break. You couldn't do all that you could do with it before,
though.

It would be GDB's users that are affected. How do you tell if anyone is
actually relying on it?


> Certainly, with a 32-bit *gdb*, writing a nonzero value to FS or GS
> using ptrace should change the base accordingly. I think the current
> patches get this wrong.
>
> With a 64-bit gdb and a 32-bit inferior, in an ideal world, everything
> would work just like full 64-bit, since that's how the hardware works.

Not sure what you mean. The h/w runs in compatibility mode and the
inferior cannot set the base directly, can it?


> But we don't necessary live in an ideal world.
>
> With a 64-bit gdb and a 64-bit inferior, the inferior can set FS to
> some nonzero value and then set FSBASE to an arbitrary 64-bit number,
> and FS will retain its value. ptrace needs to give gdb some way to
> read, save, and restore this state.

With Chang's patch series, that actually works. You can set FS and then
set FSBASE without setting FS to zero previously. The tests do not cover
that since on current system that leads to the inferior crashing in read_fs().


> I think the ideal behavior is that 64-bit ptrace callers should
> control FS and FSBASE independently. The question is: will that break
> things? If it will, then we'll need to make sure that there is an API
> by which a debugger can independently control FS and FSBASE, and we'll
> also need to make sure that whatever existing API debuggers use to
> change FS and expect FSBASE to magically change as well continue to
> have that effect.

We had discussed this some time ago and proposed the following behavior: "
https://lore.kernel.org/lkml/1521481767-22113-15-git-send-email-chang.seok.bae@xxxxxxxxx/

In a summary, ptracer's update on FS/GS selector and base
yields such results on tracee's base:
- When FS/GS selector only changed (to nonzero), fetch base
from GDT/LDT (legacy behavior)
- When FS/GS base (regardless of selector) changed, tracee
will have the base
"

The ptracer would need to read registers back after changing the selector
to get the updated base.

The only time when both change at the same time, then, is when registers
are restored after returning from an inferior call. And then, it's the base
we want to take priority since we previously ensured that the base is always
up-to-date.


> > The patch series on branch fsgs_tip_5.4-rc1_100319 at
> > github.com/changbae/Linux-kernel.git breaks tests that modify the
> > selector and expect that to change the base.
> >
> > That kernel allows changing the base via ptrace but ignores changes
> > to the selector.
> >
>
> I don't really understand your test, but I'm pretty sure I found a
> couple bugs in the test:

Thanks for your review.


> 88 int
> 89 switch_fs_read (unsigned int fs)
> 90 {
> 91 __asm__ volatile ("mov %0, %%fs" :: "rm"(fs) : "memory");
> 92
> 93 return read_fs ();
> 94 }
>
> This has fundamentally inconsistent behavior on Intel vs AMD CPUs.
> Intel CPUs will clear FSBASE when you write 0 to FS. Older AMD CPUs
> do *not* clear FSBASE when you write 0 to FS. Very very new AMD CPUs
> behave more like Intel CPUs, I believe.

Thanks for pointing this out but I don't think that this is actually an issue for
this test. This function is only ever used with fs==0xa7 to switch to the LDT
entry that the test program has setup before.

The test sets FS/GS to zero via ptrace from GDB.


> 40 struct user_desc ud;
> 41 int errcode;
> 42
> 43 memset (&ud, 0, sizeof (ud));
> 44 ud.entry_number = entry;
> 45 ud.base_addr = (unsigned long) base;
> 46 ud.limit = (unsigned int) size;
> 47
> 48 /* Some 64-bit systems declare ud.base_addr 'unsigned int' instead of
> 49 'unsigned long'.
> 50
> 51 Combined with address space layout randomization, this might
> 52 truncate our base address and result in a crash when we try to read
> 53 segment-relative.
> 54
> 55 Checking the field size would exclude too many systems so we settle
> 56 for checking whether we actually truncated the address. */
> 57
> 58 if (ud.base_addr != (unsigned long) base)
> 59 return 0u;
>
> The base of a segment in a descriptor table is 32 bits, full stop.
> This is a hardware limitation and has nothing to do with the kernel.
> base_addr is correctly unsigned int in the kernel headers. If you
> actually find a system where base_addr is unsigned long and unsigned
> long is 64 bits, then your test will malfunction.

The modify_ldt(2) man page says: "
The user_desc structure is defined in <asm/ldt.h> as:

struct user_desc {
unsigned int entry_number;
unsigned long base_addr;
unsigned int limit;
unsigned int seg_32bit:1;
unsigned int contents:2;
unsigned int read_exec_only:1;
unsigned int limit_in_pages:1;
unsigned int seg_not_present:1;
unsigned int useable:1;
};
"

The declaration in asm/ldt.h actually defines base_addr as unsigned int.

So my comment about 'some 64-bit systems' is wrong and should actually
say 'all systems'. Will fix.

That by itself is not an issue as long as the main executable is not loaded at
a high address. I only ran into problems with that on some ubuntu system
in our test pool.

Markus.


Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928