Re: [PATCH v2] x86/lib: Do not use local symbols with SYM_CODE_START_LOCAL()

From: Nadav Amit
Date: Fri May 26 2023 - 17:58:05 EST

> On May 26, 2023, at 2:17 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/26/23 14:10, Nadav Amit wrote:
>>>> I did not ask to make them global. Just to keep them as local after
>>>> linkage in the executable, like all other functions in the kernel.
>>> Ok, not global. But local and present in the symbol table:
>>>
>>> 105185: ffffffff81b89330 17 NOTYPE LOCAL DEFAULT 1 bad_get_user_clac
>>>
>>> And again, this helps how exactly?
>> Allowing debuggers, tracers, disassemblers and instrumentation tools to
>> work the same way they work as they work with any other piece of code in
>> the kernel.
>>
>> I personally work on code instrumentation and this makes my life hard for
>> no good reason.
>>
>> [ Perhaps the question should go the other way around: why addresses of
>> code in these functions should not be mapped to any symbol? ]
>
> Nadav, is there a chance you could give us a real-life example of how
> this affects you as an end user? What's a specific tool that you were
> using or a specific problem that you were trying to solve where these
> local symbols caused a problem? How would the global symbol have helped?
>
> I can certainly _imagine_ some, but I'm curious what you saw that
> prompted you to send this patch.

My tool takes a branch trace and then simulates the code execution.
As a preparatory step I need to disassemble the code, but because I do
not know where the symbol starts or how large it is, I can only
disassemble one instruction at a time. [ I prefer to disassemble the
whole symbol at once, not just for performance, but also to figure out
whether it includes instructions that my simulator does not know how to
simulate correctly. ]

In addition, as I read the code from kcore and the binary keeps changing,
I want to assume that if I do not find an address in the symbol table [*]
then it means this is some dynamically generated code that is no longer
available through kcore (eBPF, ftrace, etc.).
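
To make that assumption concrete, here is a rough sketch of the kind of
lookup I mean. It is not the actual tool: the Symbol type, find_symbol()
and classify() are names made up for this mail, and the one-entry table
only reuses the bad_get_user_clac address and size quoted earlier in the
thread.

```python
import bisect
from typing import List, NamedTuple, Optional


class Symbol(NamedTuple):
    """One symbol-table entry (hypothetical type, for illustration only)."""
    start: int
    size: int
    name: str


def find_symbol(table: List[Symbol], addr: int) -> Optional[Symbol]:
    """Return the symbol whose [start, start + size) range covers addr.

    The table must be sorted by start address.
    """
    starts = [sym.start for sym in table]
    # Index of the last symbol that starts at or before addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    sym = table[i]
    if addr < sym.start + sym.size:
        return sym
    return None


def classify(table: List[Symbol], addr: int) -> str:
    """Assumption from the text: an address with no symbol is dynamic code."""
    sym = find_symbol(table, addr)
    return sym.name if sym is not None else "<dynamic code>"


# Address and size taken from the readelf line quoted in this thread.
TABLE = sorted([Symbol(0xFFFFFFFF81B89330, 17, "bad_get_user_clac")])

print(classify(TABLE, 0xFFFFFFFF81B89335))  # inside the symbol
print(classify(TABLE, 0xFFFFFFFF81B89341))  # one byte past its end
```

With the symbol present, the whole 17-byte range can be handed to the
disassembler in one go. Without it, every address in that range falls
into the "dynamic code" bucket, which is exactly the misclassification I
would like to avoid.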

These are only two of the things that break to one extent or another.
I can work around them (I already do). I just see no reason to treat
these two symbols differently.

I would also note that I can think of many additional reasons to have
each piece of code mapped back to a symbol (besides debuggers, tracers,
etc.). For instance, security monitoring tools should be able to check
what code is running in the kernel.

I seriously see no downside here and only benefit in consistency and
usability. I have no hidden agenda if for some reason you suspect that
I do. I don’t want to start talking too much about the tool I work on,
as I am afraid it is off-topic, but I hope to open source it soon.

--

[*] I know kallsyms does not give sizes, but I make some reasonable
assumptions and augment kallsyms with the symbols from the binary.
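
One such assumption, sketched roughly below, is that a text symbol
extends up to the next higher text-symbol address. This is only an
illustration, not the tool's code: parse_kallsyms() and infer_sizes()
are made-up names, and the sample_* entries and their addresses are
invented so that the bad_get_user_clac line from this thread has
neighbours.

```python
from typing import List, Optional, Tuple

# kallsyms-style lines: "<hex address> <type> <name>".
# Only bad_get_user_clac comes from this thread; the rest is invented.
SAMPLE = """\
ffffffff81b89320 t sample_prev_func
ffffffff81b89330 t bad_get_user_clac
ffffffff81b89341 T sample_next_func
ffffffff81c00000 D sample_data
"""


def parse_kallsyms(text: str) -> List[Tuple[int, str]]:
    """Return (address, name) pairs for text symbols, sorted by address."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        addr, sym_type, name = int(fields[0], 16), fields[1], fields[2]
        if sym_type in ("t", "T"):  # keep local and global text symbols
            syms.append((addr, name))
    syms.sort()
    return syms


def infer_sizes(
    syms: List[Tuple[int, str]],
) -> List[Tuple[int, Optional[int], str]]:
    """Guess each size as the distance to the next higher address.

    The last symbol gets None, since nothing bounds it from above.
    """
    out = []
    for i, (addr, name) in enumerate(syms):
        # Skip aliases that share the same address.
        nxt = next((a for a, _ in syms[i + 1:] if a > addr), None)
        size = nxt - addr if nxt is not None else None
        out.append((addr, size, name))
    return out


for addr, size, name in infer_sizes(parse_kallsyms(SAMPLE)):
    print(f"{addr:016x} {size} {name}")
```

The guess overshoots whenever there is padding between symbols, which is
why I also take the sizes from the binary's symbol table where they are
available.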