Re: [PATCH v2] x86/lib: Do not use local symbols with SYM_CODE_START_LOCAL()

From: Nadav Amit
Date: Sat May 27 2023 - 05:18:07 EST

> On May 27, 2023, at 12:23 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Fri, May 26, 2023 at 02:55:21PM -0700, Nadav Amit wrote:
>> So my tool takes a branch trace and then simulates the code execution.
>> As a preparatory step I need to disassemble the code, yet as I do not
>> know where the symbol starts and its size, I can only disassemble one
>> instruction at a time. [ I prefer to disassemble the whole symbol at once
> So in this particular case, the exception handling ends up being part of
> __get_user_nocheck_8, see the end of this mail.

That’s not according to the symbol table - that’s in your mind.

Anyhow, the argument that __get_user_nocheck_8 and bad_get_user_clac are
related makes no sense even conceptually.

__get_user_nocheck_8 jumps in the case of an exception to bad_get_user_8_clac,
not to bad_get_user_clac. So even conceptually this notion that these two
symbols are connected makes no sense.

> However, the symbol table says it is of size 19:
> 123630: ffffffff81b89310 19 FUNC GLOBAL DEFAULT 1 __get_user_nocheck_8
> which means the trailing exception handling is not part of that symbol
> when looking at the size. And that's where your tool fails.
> Close?

Some people would even say “elementary”. I was sure it was already clear.

> And if so, your tool could simply look at the next symbol's address and
> attribute the trailing bytes to the previous symbol.
> If you look at the disassembly at the end, some other option has added
> INT3 padding to the end of the symbol so that the next one is aligned.
> But you can simply skip over those 0xcc insn bytes.
> And skipping over those 0xcc bytes is something your tool needs to
> handle anyway because they're not part of the symbol either.

I appreciate your help, but I have reasonable workarounds for my use-case
(and for the record, no, I don’t think that this solution that you
propose is reasonable).

>> These are only 2 things that break to one extent or another. I can
>> have workarounds for them (I already do). I just see no reason to
>> treat these two symbols differently.
> Right, the kernel should not dance just because some tool says so. And
> every time a new tool pops up, there are patches to "fix" the kernel.

It is not “a new tool". You screw up every tool that tries to understand
the context of an instruction pointer - gdb (that people use to debug
VMs) and probably perf, crash, drgn and many others.

>> I seriously see no downside
> The downside is polluting the symbol table unnecessarily. If it doesn't
> have to be there then it shouldn't be.

That’s a tautology, not a downside. And it is not “unnecessarily” if it
helps debugging and profiling.

> And yeah yeah, this particular case can be fixed easily but the bigger
> issue remains: we have a lot of local symbols which get discarded around
> the tree. Does that mean that because your tool cannot handle that, we
> have to stop using local symbols?

All the other local symbols are irrelevant to the discussion as they fall
within some other symbol's range.

> What happens if we do something else weird in the future and your tool
> breaks again?
> Also, why can't your tool handle such cases gracefully instead of having
> to "fix" the kernel each time?

As I stated multiple times (which are even quoted in this email): I
have a workaround.

You are not (not) helping me. I am trying to help you (and other users).
The kernel right now messes up with people's debugging and profiling

So allow me to rehash, since I thought that we have already agreed on
the details of the problem, but I see again that various arguments are
although they are either incorrect or not relevant for this discussion:

1. It is *not* about global symbols. It is just about exposing symbols.

2. It is *not* about symbols that fall within other symbols. Therefore,
all the other local symbols you grep’d are irrelevant for this

3. There are exactly 2 symbols we discuss (SYM_CODE_START_LOCAL + .L).

4. These 2 cases are the only ones that I know of in which code address
does not fall into any symbol.

5. It is not about “my tool”. It is about gdb (think about people debug
their VMs), perf and most likely crash, drgn, and others.

Now, as for your question “what happens if we do something else weird”:

If a tool relies on internal kernel data structures, it’s its own problem.
But if you decide to do “something else weird” with standard executable
data structures - such as DWARF or symbol table - you mess up with
debuggers and profilers.

So just don’t do such weird things.